Data extraction

Extract data contained in a PDF

Sometimes data are contained in a PDF, which makes it difficult to extract them into a CSV file. Luckily, some tools exist to ease this process.

Plain text

Tables

  • Tabula: it requires you to select each table “by hand”, but so far it is the most accurate tool I’ve used. Well suited if you have only a small number of tables to extract and/or a lot of time 🙂
  • Tabulizer: this R package provides bindings to the Tabula Java library, so the whole extraction can be scripted from R, but the extractions I got were really messy
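
Whichever tool does the extraction, the result usually needs some tidying before it becomes a usable CSV. Here is a minimal sketch of that step in Python, using only the standard library. It assumes the extracted table arrives as a list of rows of strings; the helper names `clean_rows` and `rows_to_csv` and the sample rows are invented for illustration and do not come from Tabula or Tabulizer.

```python
import csv
import io


def clean_rows(rows):
    """Strip whitespace from every cell and drop rows that are entirely empty."""
    cleaned = []
    for row in rows:
        # Treat missing cells (None) as empty strings before stripping.
        cells = [(cell or "").strip() for cell in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned


def rows_to_csv(rows):
    """Serialise rows to CSV text; the csv module quotes cells containing commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Invented sample standing in for a messy extraction (not real data).
raw = [
    [" Region ", "Count "],
    ["", ""],
    ["Region A", " 1,200"],
    [None, "   "],
    ["Region B ", "350"],
]

csv_text = rows_to_csv(clean_rows(raw))
print(csv_text)
```

This only handles stray whitespace and blank rows. Cells merged or split in the wrong place still have to be fixed by hand or with rules specific to the table.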
