Industry Article

A machine learning pipeline for document extraction

Each year the geoscience industry generates huge volumes of documents containing a wealth of knowledge that cannot easily be queried or extracted. Successful extraction and transformation of data depends on understanding the nature of the data within a corpus of files. For large datasets, manually opening and reviewing each document in turn is time-consuming. In this article, we therefore discuss how CGG uses machine learning to classify documents within our automated pipeline, significantly reducing project times.

Publications

First Break

Authors

Chin Hang Lun, Thomas Hewitt, Song Hou

Month

February