DocExtract

Public PDF data extraction

Extract structured data from public PDFs with source-linked evidence

DocExtract helps analysts, researchers, journalists, and public-data users turn public PDF reports into structured data while keeping every answer tied to source evidence.

Use public, non-confidential documents only. Do not upload private citizen records, sensitive government files, personal data, confidential case files, or regulated documents.

Built for the documents you actually work with.

Government reports

FOI or FOIA document releases

Public policy PDFs

Public audit reports

Council or agency meeting packs

Public consultation documents

Open-data reports trapped in PDF form

Define the columns once, then review every value.

These are typical fields people define for this kind of document. Define your own in plain language, approve the instruction pack, and verify each value against its source page.

Document title

Agency or publisher

Publication date

Program name

Region

Metric

Value

Finding

Recommendation

Page number

Source quote
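
For readers who think in data structures, here is a minimal sketch of what a source-linked value could look like once exported. It is an illustration only, not DocExtract's actual export format or API: the `ExtractedValue` class, its attribute names, the `is_supported` helper, and the sample values are all hypothetical, chosen to mirror the Page number and Source quote fields listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedValue:
    """One cell of the output table, kept together with its evidence.

    Hypothetical structure for illustration; not DocExtract's real schema.
    """
    field: str          # column name, e.g. "Publication date"
    value: str          # the extracted value as text
    page_number: int    # page in the source PDF where the value was found
    source_quote: str   # verbatim passage the value was taken from


def is_supported(cell: ExtractedValue) -> bool:
    """Rough sanity check: a non-empty value should appear in its own quote."""
    needle = cell.value.strip().lower()
    return bool(needle) and needle in cell.source_quote.lower()


# Made-up sample row, not taken from any real document.
cell = ExtractedValue(
    field="Publication date",
    value="12 March 2021",
    page_number=1,
    source_quote="Published by the agency on 12 March 2021.",
)
print(is_supported(cell))
```

A literal substring check like this only catches values copied verbatim from the page. Summarised fields such as Finding or Recommendation still need a person to read the quote against the value.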

From upload to source review in one continuous flow.

Keep setup, approval, and evidence review inside one product instead of spreading the workflow across separate tools.

01

Upload the document set

Start with one PDF or a clean batch for the same job.

02

Define the fields that matter

Set columns, hints, and the output mode in plain language.

03

Approve the instruction pack

Review and edit the generated rules before the run starts.

04

Review evidence, not just rows

Jump from a result cell to the supporting PDF evidence.

Why use DocExtract for public PDFs?

Copy-paste loses context; DocExtract keeps source evidence attached.

Spreadsheet cleanup is slow; DocExtract starts with the fields you care about.

PDF chat gives answers; DocExtract creates structured tables.

Source-linked review helps public-data users verify claims before publishing or sharing.

Common questions before you start.

What kinds of public PDFs work best?

Public reports, policy documents, published tables, agency PDFs, FOI releases, and open-data PDFs.

Can I use this for confidential public-sector work?

No. Use DocExtract only for public, non-confidential, low-sensitivity documents.

Can I review the extracted results?

Yes. Every extracted value links back to its source page, quote, and highlight inside DocExtract so you can verify each one before using it elsewhere.

Public, non-confidential PDFs only

See our upload policy for details.

Start extracting structured data from your PDFs today.

Subscribe to any plan and get 14 days free to test live extraction with your own documents. No charge until the trial ends.