Machine Learning to Support Technical Document Indexing, a Case Study on Seismic Acquisition Reports
H. Blondelle, P. Neri and J. Micaelli
Event name: 80th EAGE Conference and Exhibition 2018
Session: Poster: Data information management and High-performance computing
Publication date: 11 June 2018
Info: Extended abstract, PDF (932.33 KB)
From the drill floor to the top floor, all exploration decisions are based on data. Today, industry-standard formats proposed by the SEG or Energistics are structured and facilitate the transfer and archiving of measurements together with their associated metadata. XML formats proposed by Energistics, such as WITSML™, also make it possible to stream information in support of real-time decisions. Nevertheless, to fully understand the context of a survey, geoscientists still have to consult the acquisition reports. These reports are available as unstructured PDF or TIFF files, which are very difficult to index automatically at large scale. Various attempts to apply deterministic data-mining approaches have been disappointing because of the high variability of report formats and layout styles. To illustrate the potential of machine learning systems for automatically indexing subsurface-related documents, we built learning models that detect 20 metadata items in seismic acquisition, QAQC, HSE and navigation reports. This confirmed the capacity of ML to index large volumes of documents on demand. It also opens the possibility of extracting data from unstructured documents before applying classical modelling or data analytics.
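The abstract does not describe the model architecture. As a hedged illustration of a learning-based alternative to deterministic rules, the sketch below swaps in a multinomial Naive Bayes classifier (our assumption, not necessarily the authors' method) that assigns a text snippet to one of the four report types named in the abstract. The class `NaiveBayesReportClassifier`, the `tokenize` helper and the training snippets are hypothetical toy examples, not data from the study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesReportClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()             # documents per label
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen during training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            total = sum(self.token_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for tok in tokens:
                # Smoothed log likelihood of each token given the label.
                score += math.log((self.token_counts[label][tok] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy snippets, one label per report type from the abstract.
TRAIN = [
    ("Streamer configuration and source array, shot point interval, acquisition parameters", "acquisition"),
    ("Daily production summary, line acquired, shots fired, streamer depth", "acquisition"),
    ("Noise analysis, QC of shot gathers, brute stack quality control", "qaqc"),
    ("Quality control of seismic traces, signal to noise checks", "qaqc"),
    ("Incident report, safety drill, lost time injury, environment", "hse"),
    ("HSE meeting, safety observations, near miss reported", "hse"),
    ("Positioning, GPS, gyro, datum WGS84, navigation processing", "navigation"),
    ("Navigation report, vessel position, coordinates and heading", "navigation"),
]

texts, labels = zip(*TRAIN)
clf = NaiveBayesReportClassifier().fit(texts, labels)
print(clf.predict("Near miss and safety incident on deck"))  # hse
```

Unlike a hand-written rule per report layout, the classifier learns its cues from labelled examples, so supporting a new layout means adding examples rather than editing rules. Detecting the 20 metadata items themselves would need a finer-grained model (for example, per-line or per-token classification), which this sketch does not attempt.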