What is this tool?
OpenRefine is a tool for cleaning (deduplication, inconsistency correction, formatting) and transforming tabular data in the browser. It can process large volumes of data in batch, significantly reducing the need for manual corrections.
It keeps the state of your cleaning work on the server side, and it records the cleaning steps you applied so that they can be reapplied to similar files.
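The idea of recording cleaning steps and replaying them on a similar file can be illustrated with a minimal Python sketch. This is not OpenRefine's own history format or API: the step names (`trim`, `collapse_spaces`, `upper`), the `STEPS` registry, and the `apply_recipe` helper are all hypothetical and exist only to show the concept.

```python
import json

# Hypothetical registry of named cleaning steps (illustrative, not OpenRefine's).
STEPS = {
    "trim": lambda v: v.strip(),
    "collapse_spaces": lambda v: " ".join(v.split()),
    "upper": lambda v: v.upper(),
}

def apply_recipe(rows, recipe):
    """Apply each recorded step, in order, to the named column of every row."""
    cleaned = [dict(row) for row in rows]  # copy so the input rows stay untouched
    for step in recipe:
        func = STEPS[step["op"]]
        for row in cleaned:
            row[step["column"]] = func(row[step["column"]])
    return cleaned

# A recipe recorded while cleaning one file, stored as JSON text.
recipe_json = json.dumps([
    {"op": "trim", "column": "city"},
    {"op": "collapse_spaces", "column": "city"},
])

# Replaying the same recipe on a second, similar file.
other_rows = [{"city": "  New   York "}, {"city": "Osaka"}]
replayed = apply_recipe(other_rows, json.loads(recipe_json))
print(replayed)
```

Storing the steps as data rather than as code is what makes them reusable: the same JSON text can be saved next to one file and applied to the next.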
Features
- Filtering and faceting for narrowing down and reviewing values
- Column splitting/merging, value replacement, whitespace and symbol normalization
- Duplicate detection and clustering for inconsistency correction
- Batch transformation using expressions (GREL)
- Reconciliation with external data sources
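To make the clustering feature more concrete, here is a rough Python sketch of key-collision clustering: values are reduced to a normalized key, and values that share a key are grouped as likely variants of the same thing. It is a simplified approximation written for illustration, not OpenRefine's actual implementation, and the `fingerprint` and `cluster` function names are assumptions.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Build a normalized key so that spelling variants collide on the same key."""
    # Trim, lowercase, and decompose accented characters (e.g. "é" -> "e" + accent).
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop the combining accent marks left over from the decomposition.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove punctuation, keeping word characters and whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    # Split on whitespace, remove duplicate tokens, and sort them.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)

def cluster(values):
    """Group values by fingerprint; keep only groups with more than one spelling."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]

names = ["Acme Inc.", "acme inc", "Inc. Acme", "Café Bleu", "Cafe  Bleu", "Globex"]
clusters = cluster(names)
for group in clusters:
    print(group)
```

Each printed group is a candidate for merging into one canonical value. A reviewer would still confirm each merge, since a shared key only suggests that two values mean the same thing.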
How to use
- Load data in CSV / TSV / Excel / JSON or other formats
- Use facets and filters to identify problematic values
- Clean up data using transformations, replacements, and clustering
- Export in the desired format
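The four steps above can be mirrored in a short Python sketch using only the standard library. It is meant as an analogy for the load, facet, clean, export workflow, not as a description of how OpenRefine works internally; the sample data and the `country` column are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample data standing in for an uploaded CSV file.
raw = "name,country\nAlice,Japan\nBob, japan \nCarol,JAPAN\nDave,France\n"

# 1. Load the data.
rows = list(csv.DictReader(io.StringIO(raw)))

# 2. Facet: count each distinct value to spot inconsistent spellings.
facet = Counter(row["country"] for row in rows)
print(facet)

# 3. Clean: trim whitespace and normalize capitalization.
for row in rows:
    row["country"] = row["country"].strip().title()

# 4. Export the cleaned rows as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "country"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
exported = out.getvalue()
print(exported)
```

The facet in step 2 shows three different spellings of the same country, which is exactly the kind of problem the cleaning step then resolves.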
Data formats
- Input: CSV, TSV, Excel (xls/xlsx), Google Sheets, JSON, XML, OpenDocument, etc.
- Output: CSV, TSV, Excel, JSON, etc.


