Question 1

Which workflow should I use?

Accepted Answer

Use Extraction when you want named fields back in a table from one or many PDFs, such as findings, dates, metrics, or conclusions.

Use Gap analysis when you want differences between two whole documents, such as future-vs-current requirements or version-to-version changes.

Question 2

How should I set up my first extraction?

Accepted Answer

Keep the first run small. Start with 2-3 fields you can verify quickly, then expand once the output looks right.

Use representative PDFs for the first run rather than the biggest batch. The goal is to validate the setup, not maximize volume on the first attempt.

Question 3

How do I describe my document set?

Accepted Answer

Data type tells the extractor what kind of records the PDFs represent, such as safety reports, endpoint tables, study summaries, or requirement documents.

Write it the way you would describe the set to another person. This helps instruction generation and relational template inference stay on the right track.

Question 4

How should I use Document profile?

Accepted Answer

Leave Document profile on Auto for the first run. It is the safest default when you are still learning how the job behaves.

Use Structured Numeric for row-heavy or amount-heavy PDFs like invoices, contracts, or estimate tables. Use Research for journals, studies, and narrative evidence extraction.

Question 5

How should I choose columns for the first run?

Accepted Answer

Columns define the exact fields you want returned in the output table.

For the first run, choose fields that are easy to spot and easy to verify. Add interpretation hints only when a field name could be read more than one way.

Question 6

When should I use Relational mode?

Accepted Answer

Use Relational mode when one PDF contains repeated linked entities, such as multiple devices, endpoints, cohorts, or adverse events.

Relational output can contain multiple records per PDF, and each record carries the same required field keys.

Relational mode is currently in Beta. The record-template inference is still being refined, so please flag any unexpected groupings to help us tune it.
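
If you copy relational results out of the table and want to confirm that every record carries the same required field keys, a small check can help. The following is a minimal Python sketch, not part of the product: the list-of-dicts shape, the function name `find_incomplete_records`, and the sample values are assumptions for illustration.

```python
def find_incomplete_records(records, required_keys):
    """Return (index, missing_keys) for each record lacking a required key."""
    required = set(required_keys)
    problems = []
    for index, record in enumerate(records):
        missing = sorted(required - set(record))
        if missing:
            problems.append((index, missing))
    return problems


# Sample values for illustration only (hypothetical export shape).
REQUIRED = ["Device name", "Manufacturer", "Risk class"]
records = [
    {"Device name": "CardioSense Monitor", "Manufacturer": "ACME Med", "Risk class": "Class IIb"},
    {"Device name": "NeuroTrack Sensor", "Manufacturer": "ACME Med"},  # "Risk class" is missing
]

print(find_incomplete_records(records, REQUIRED))  # prints [(1, ['Risk class'])]
```

An empty list means every record has all required keys. Any entry in the list points to a grouping worth flagging.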

Question 7

When should I use Flat mode?

Accepted Answer

Use Flat mode when you need one consolidated row per PDF. This is best for document-level summaries where repeated entities do not need separate rows.

Question 8

When should I care about the Record template (DSL)?

Accepted Answer

Most users can rely on the auto-generated "What we’ll extract" summary shown in the new-job form. It describes the same template in plain English.

The DSL only matters when you are doing relational extraction and need precise control over how repeated entities are grouped. Open the advanced editor in the form to access it. `record` is the entity anchor, `fields` lists the required field keys, and `cardinality` sets whether you expect one record (`single`) or many (`multi`).
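
To make the three parts concrete, here is a hypothetical Python model of a record template. It is not the DSL syntax itself, which you will see in the advanced editor. The class name `RecordTemplate` and the sample field keys are illustrative assumptions, and the sketch only shows how `record`, `fields`, and `cardinality` relate to each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordTemplate:
    """Illustrative model only; the real template is written in the DSL."""
    record: str                  # entity anchor, e.g. "device"
    fields: tuple                # required field keys carried by every record
    cardinality: str = "multi"   # "single" (one record) or "multi" (many records)

    def __post_init__(self):
        if self.cardinality not in ("single", "multi"):
            raise ValueError("cardinality must be 'single' or 'multi'")
        if not self.fields:
            raise ValueError("fields must list at least one required key")


# One PDF describing several devices: many records, each with the same keys.
template = RecordTemplate(
    record="device",
    fields=("Device name", "Manufacturer", "Risk class"),
    cardinality="multi",
)
print(template.record, template.cardinality)  # prints: device multi
```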

Question 9

When should I add Rules?

Accepted Answer

Rules are optional setup instructions that influence how values are interpreted and normalized.

Keep them concrete. For example: "Use the endpoint table first," "Normalize percentages," or "Return Not found when unsure."
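
To show what a rule such as "normalize percentages" combined with "return Not found when unsure" amounts to, here is a minimal Python sketch. It does not describe how the extractor applies rules internally. The function name `normalize_percentage` and the choice to treat more than one percentage as "unsure" are assumptions made for the example.

```python
import re

NOT_FOUND = "Not found"

# A number followed by "%" or the word "percent", e.g. "12.50 percent" or "40 %".
PERCENT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(?:%|percent\b)", re.IGNORECASE)


def normalize_percentage(raw):
    """Return a value like '12.5%' for one clear percentage, else 'Not found'."""
    if raw is None:
        return NOT_FOUND
    matches = PERCENT_PATTERN.findall(raw)
    if len(matches) != 1:
        # No percentage, or several candidates: treat as unsure (assumption).
        return NOT_FOUND
    number = matches[0]
    if "." in number:
        number = number.rstrip("0").rstrip(".")  # "12.50" -> "12.5", "40.0" -> "40"
    return number + "%"


print(normalize_percentage("Response rate was 12.50 percent"))  # prints 12.5%
print(normalize_percentage("12% vs 15%"))                       # prints Not found
```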

Question 10

What am I checking during instruction approval?

Accepted Answer

Check three things: the objective matches the document set, the field plans match the outputs you expect, and the missing-value behavior is strict enough.

You do not need to rewrite everything. If something is off, revise with one narrow sentence such as "Prefer endpoint table wording" or "Treat score as numeric plus unit when present."

Question 11

How do I verify a result?

Accepted Answer

Click a value in the results table to open the evidence view. Review the quote, the page number, and the highlight before trusting the value.

If the document does not support the result, use the feedback actions to flag it or suggest a correction.

Question 12

What files are allowed?

Accepted Answer

Upload PDF files only. Extraction jobs accept up to 10 PDFs per run, and per-file size limits are enforced at upload. Do not upload confidential or sensitive documents.
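
If you assemble batches with a script, a quick local check before uploading can catch the two limits stated above. This is a minimal Python sketch with a hypothetical function name, `check_upload_batch`. It checks only the file extension and the count, and it leaves out the per-file size limit because the exact value is enforced at upload and not given here.

```python
from pathlib import Path

MAX_PDFS_PER_RUN = 10  # per-run limit stated in this FAQ


def check_upload_batch(paths):
    """Return a list of problems; an empty list means the batch passes both checks."""
    problems = []
    non_pdf = [str(p) for p in paths if Path(p).suffix.lower() != ".pdf"]
    if non_pdf:
        problems.append("Not PDF files: " + ", ".join(non_pdf))
    if len(paths) > MAX_PDFS_PER_RUN:
        problems.append(
            f"{len(paths)} files selected; the limit is {MAX_PDFS_PER_RUN} per run"
        )
    return problems


print(check_upload_batch(["trial.pdf", "notes.docx"]))  # prints ['Not PDF files: notes.docx']
```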

Example: Relational output (several records from one PDF)

PDF	Record	Device name	Manufacturer	Risk class
trial.pdf	device-1	CardioSense Monitor	ACME Med	Class IIb
trial.pdf	device-2	NeuroTrack Sensor	ACME Med	Class III

Example: one row per PDF

PDF	Invoice #	Total	Due date
invoice_apr.pdf	INV-1042	$4,820.00	2026-06-01
invoice_may.pdf	INV-1078	$1,200.00	2026-07-01
invoice_jun.pdf	INV-1095	$9,150.50	2026-07-15
