New Feature: Automatic Table Detection
Our latest update introduces intelligent table detection, allowing you to extract structured tabular data from complex PDFs with a single click. Let's dive into how it works.
John Smith
Co-Founder / CTO
For months, one of the most requested features has been the ability to reliably extract data from tables. Invoices, financial reports, and inventory lists often contain critical information locked away in tabular format. Today, we're thrilled to announce this is now possible.
The Challenge with Tables
Extracting tables from PDFs is notoriously difficult. Unlike a spreadsheet, a PDF has no built-in structure of rows and columns: it's just a collection of text fragments and lines placed at specific coordinates. Our challenge was to build an AI model that could understand these layouts visually, the way a human does.
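To make "text placed at specific coordinates" concrete, here is a small illustration of roughly what a parser gets from a page. The `TextRun` type, the `naive_text` helper and the sample coordinates are hypothetical, invented for this example. Real PDF libraries expose similar positioned-text data under their own names. Drawn lines are omitted for brevity. Reading the fragments in order recovers the words but not the cells.

```python
from dataclasses import dataclass

# Hypothetical type for illustration: one positioned fragment of text on a page.
@dataclass(frozen=True)
class TextRun:
    x: float   # left edge, in points
    y: float   # vertical position, in points (grows downward in this sketch)
    text: str

# Invented sample: a two-column table as it might come out of a PDF,
# in arbitrary drawing order and with no row or column information.
RUNS = [
    TextRun(200, 100, "Amount"), TextRun(50, 100, "Item"),
    TextRun(50, 120, "Widget"), TextRun(200, 120, "9.50"),
    TextRun(200, 140, "12.00"), TextRun(50, 140, "Gadget"),
]

def naive_text(runs):
    """Sort fragments top-to-bottom, then left-to-right, and join them.

    This restores reading order, but the result is one flat string:
    nothing says where one cell, row or column ends and the next begins.
    """
    ordered = sorted(runs, key=lambda r: (r.y, r.x))
    return " ".join(r.text for r in ordered)

print(naive_text(RUNS))  # Item Amount Widget 9.50 Gadget 12.00
```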
Our goal was to make table extraction feel like magic. The user shouldn't have to worry about the complexity behind the scenes.
How It Works
Our new model uses a combination of computer vision and natural language processing. Here’s a simplified breakdown:
- Visual Analysis: The model first identifies visual cues like lines, borders, and whitespace to detect the grid structure of a table.
- Text Clustering: It then groups nearby text into cells and determines their row and column relationships.
- Header Identification: Finally, it analyzes the content to identify header rows, ensuring the data is correctly labeled when exported.
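The three steps above can be loosely sketched with a rule-based stand-in. This is not the AI model described here. It swaps in simple heuristics: ruling lines define the grid, coordinates assign text to cells, and a first row with no numbers is treated as the header. The `TextRun` and `Rule` types, the function names and the sample table are all hypothetical. The sketch also assumes a fully ruled table, which sidesteps the harder borderless and merged-cell cases.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical types for illustration; not part of any real PDF library.
@dataclass(frozen=True)
class TextRun:
    x: float   # left edge of the text, in points
    y: float   # vertical position, in points (grows downward here)
    text: str

@dataclass(frozen=True)
class Rule:    # a drawn line segment from (x0, y0) to (x1, y1)
    x0: float
    y0: float
    x1: float
    y1: float

def grid_from_rules(rules):
    """Visual-analysis stand-in: turn ruling lines into row and column edges."""
    ys = sorted({r.y0 for r in rules if r.y0 == r.y1})  # horizontal rules give row edges
    xs = sorted({r.x0 for r in rules if r.x0 == r.x1})  # vertical rules give column edges
    return ys, xs

def assign_cells(runs, ys, xs):
    """Text-clustering stand-in: put each run in the cell whose borders enclose it."""
    n_rows, n_cols = len(ys) - 1, len(xs) - 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for run in sorted(runs, key=lambda r: (r.y, r.x)):
        i = bisect_right(ys, run.y) - 1  # row band containing the run
        j = bisect_right(xs, run.x) - 1  # column band containing the run
        if 0 <= i < n_rows and 0 <= j < n_cols:  # ignore text outside the grid
            grid[i][j] = (grid[i][j] + " " + run.text).strip()
    return grid

def _is_number(s):
    try:
        float(s.replace(",", ""))
        return True
    except ValueError:
        return False

def split_header(grid):
    """Header-identification stand-in: the first row is a header if it holds
    no numbers while some later row does. Crude and easily fooled."""
    if len(grid) < 2:
        return None, grid
    first_is_text = not any(_is_number(c) for c in grid[0] if c)
    body_has_numbers = any(_is_number(c) for row in grid[1:] for c in row if c)
    if first_is_text and body_has_numbers:
        return grid[0], grid[1:]
    return None, grid

# Invented sample: a fully ruled table with three rows and two columns.
RULES = (
    [Rule(40, y, 260, y) for y in (90, 110, 130, 150)]
    + [Rule(x, 90, x, 150) for x in (40, 150, 260)]
)
RUNS = [
    TextRun(50, 100, "Item"), TextRun(160, 100, "Amount"),
    TextRun(50, 120, "Widget"), TextRun(160, 120, "9.50"),
    TextRun(50, 140, "Gadget"), TextRun(160, 140, "12.00"),
]

ys, xs = grid_from_rules(RULES)
header, body = split_header(assign_cells(RUNS, ys, xs))
print(header)  # ['Item', 'Amount']
print(body)    # [['Widget', '9.50'], ['Gadget', '12.00']]
```

A learned model earns its keep exactly where these rules break down, for example when borders are missing or a header row happens to contain years or other numbers.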
The result is a clean, structured output (such as CSV or JSON) that you can use immediately, without any manual re-typing.
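As a hedged illustration of that last step, here is how a recovered header and rows might be serialized with Python's standard `csv` and `json` modules. The function names and sample values are hypothetical. The product's actual export layout may differ, for instance in whether numeric cells are kept as strings.

```python
import csv
import io
import json

def to_csv(header, rows):
    """Serialize a table to CSV text; csv.writer quotes cells containing commas or quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    if header:
        writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

def to_json(header, rows):
    """Serialize a table to JSON: one object per row keyed by header cell,
    or a plain list of lists when no header was found.
    Note: zip() silently drops cells beyond the header's length."""
    if header is None:
        return json.dumps(rows, indent=2)
    return json.dumps([dict(zip(header, row)) for row in rows], indent=2)

header = ["Item", "Amount"]
rows = [["Widget", "9.50"], ["Gadget, large", "12.00"]]
print(to_csv(header, rows))
print(to_json(header, rows))
```

Using the standard library writers rather than joining strings by hand matters for real documents: a cell like "Gadget, large" must be quoted in CSV, and `csv.writer` does that automatically.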
What's Next?
This is just the beginning. We're already working on improving support for merged cells, multi-page tables, and even tables without any visible borders. Your feedback is crucial, so please give the new feature a try and let us know what you think at support@rask.co.rw!