Tabula PDF

What is this tool?

Tabula PDF is a tool for extracting tables from PDFs and converting them into reusable formats. It is useful when you want to reuse, as data, the tables in PDFs such as government documents, survey reports, and handouts.

You can specify the region of the page that contains the table and export it as CSV, JSON, or Excel. This gives you a starting point for data processing when "you have the PDF but not the original data."

Features

Table extraction from PDF (page-by-page and multi-page support)
Manual selection of table regions (drag) with extraction preview
Switching extraction modes (ruled-line-based / whitespace-based)
Batch output of multiple tables
Download in CSV / JSON / Excel formats
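
To give a feel for what a whitespace-based mode does, the sketch below splits each line of text into cells wherever a wide run of spaces appears. It illustrates the general idea only and is not Tabula PDF's actual algorithm; the function name `split_columns` and the two-space threshold `min_gap` are assumptions made for this example.

```python
import re


def split_columns(line: str, min_gap: int = 2) -> list[str]:
    """Split one line of text into cells at runs of at least min_gap spaces.

    Hypothetical helper: a single space is treated as part of a cell,
    while a wider gap is treated as a column boundary.
    """
    pattern = r" {%d,}" % min_gap
    return [cell.strip() for cell in re.split(pattern, line.strip())]


# Illustrative text, as it might appear in a table without ruled lines.
sample = """Region    Population  Area km2
North     1200        340
South     950         410"""

rows = [split_columns(line) for line in sample.splitlines() if line.strip()]
for row in rows:
    print(row)
```

A table drawn with ruled lines does not need this kind of guess, because the cell boundaries are already on the page. That is why reviewing the preview in both modes is worthwhile.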

How to use

1. Upload a PDF file
2. Open the target page and select the table area to extract
3. Choose the extraction mode and review the preview
4. If everything looks correct, export as CSV / JSON / Excel

Data formats

Input: PDF
Output: CSV, JSON, Excel
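
Once a table has been exported, it can be loaded with ordinary tools. The following is a minimal sketch using only Python's standard library. It assumes the exported CSV has a header row; the sample text and the function name `csv_to_records` are illustrative and do not come from the tool.

```python
import csv
import io
import json


def csv_to_records(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text with a header row into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Illustrative stand-in for the contents of an exported CSV file.
sample_csv = "region,population\nNorth,1200\nSouth,950\n"

records = csv_to_records(sample_csv)

# Every value is still a string at this point; convert types as needed.
print(json.dumps(records, indent=2))
```

For a real export, read the downloaded file with `open(path, newline="", encoding="utf-8")` and pass the file object to `csv.DictReader` directly. The encoding of the export is an assumption here and should be checked against the actual file.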

Notes

This tool is intended for PDFs that contain embedded text data.
For PDFs that consist mainly of scanned images, run OCR beforehand to improve extraction accuracy.