Aspose.OCR Table to Text for .NET

Aspose.OCR Table to Text for .NET is a plugin that lets developers extract text from scanned or photographed tables. Using neural-network-based recognition, it detects table structure, extracts the text of each cell, and organizes the results into searchable, editable spreadsheets or tabular data structures.

Installation and Setup

To get started, install the Aspose.OCR for .NET package from NuGet or download the assembly from the Aspose website; the Installation guide describes both options in detail. To enable full functionality, configure metered licensing as described in the Metered Licensing documentation.

Features and Functionalities

Table Detection and Structure Recognition

  • Automatically detects table boundaries in scanned or photographed images, even if cells are skewed, rotated, or unevenly lit.
  • Supports multi-row and multi-column layouts, nested tables, and varying cell sizes.
  • Provides a hierarchical representation of rows and cells to simplify post-processing.
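
The types behind that row and cell representation are not shown on this page, so the following is a hypothetical sketch only. It assumes recognition yields a flat list of cells tagged with row and column indices (the tuples below stand in for the real cell type) and arranges them into a row-by-column grid, filling missing positions with empty strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat cell output: (row index, column index, recognized text).
// These tuples stand in for whatever cell type the OCR engine actually returns.
var cells = new List<(int Row, int Col, string Text)>
{
    (1, 1, "42.50"),
    (0, 0, "Item"),
    (1, 0, "Widget"),
    (0, 1, "Price"),
};

// Place cells into a grid ordered top to bottom and left to right;
// positions with no recognized cell are filled with empty strings.
string[][] BuildGrid(IEnumerable<(int Row, int Col, string Text)> source)
{
    var list = source.ToList();
    if (list.Count == 0) return Array.Empty<string[]>();

    int rowCount = list.Max(c => c.Row) + 1;
    int colCount = list.Max(c => c.Col) + 1;
    var grid = new string[rowCount][];
    for (int r = 0; r < rowCount; r++)
    {
        grid[r] = Enumerable.Repeat(string.Empty, colCount).ToArray();
    }
    foreach (var cell in list)
    {
        grid[cell.Row][cell.Col] = cell.Text;
    }
    return grid;
}

string[][] grid = BuildGrid(cells);
foreach (string[] row in grid)
{
    Console.WriteLine(string.Join(" | ", row));
}
// Prints:
// Item | Price
// Widget | 42.50
```

Working from a dense grid rather than the flat list keeps later steps (export, validation) independent of the order in which cells were recognized.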

Cell Text Extraction

  • Recognizes text within each cell using advanced OCR algorithms, preserving line breaks, capitalization, and numeric formatting.
  • Handles multiple languages in a single table with configurable language priorities.
  • Corrects distortions such as skew, low contrast, or image noise to boost accuracy.

Table Reconstruction and Export

  • Reconstructs detected tables into .NET data structures (e.g., DataTable) or exports them to CSV or TSV.
  • Generates editable spreadsheet files (XLSX) that can be opened in Excel or other tools.
  • Retains basic cell formatting (alignment, borders) and exports coordinates for advanced workflows.
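
The example later on this page does not show a CSV or DataTable export call, so the following is a library-independent sketch under stated assumptions: the reconstructed table is a hypothetical `string[][]` whose first row holds the headers. It loads the grid into a `System.Data.DataTable` and writes delimited text, quoting fields as RFC 4180 describes.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Text;

// Hypothetical reconstructed table; the first row holds the column headers.
string[][] table =
{
    new[] { "Item", "Price", "Note" },
    new[] { "Widget", "1,200.00", "says \"new\"" },
};

// Load the grid into a DataTable, using the first row as column names.
// Assumes at least one row and unique header texts.
DataTable ToDataTable(string[][] rows)
{
    var dt = new DataTable();
    foreach (string header in rows[0])
    {
        dt.Columns.Add(header, typeof(string));
    }
    foreach (string[] row in rows.Skip(1))
    {
        dt.Rows.Add(row.Cast<object>().ToArray());
    }
    return dt;
}

// Quote a field if it contains the separator, a double quote, or a line break,
// doubling embedded quotes as RFC 4180 specifies.
string Escape(string field, char separator)
{
    bool needsQuotes = field.IndexOfAny(new[] { separator, '"', '\r', '\n' }) >= 0;
    return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
}

// Write one CRLF-terminated line per row; pass '\t' for tab-separated output
// using the same quoting rules.
string ToDelimited(string[][] rows, char separator)
{
    var sb = new StringBuilder();
    foreach (string[] row in rows)
    {
        sb.Append(string.Join(separator.ToString(), row.Select(f => Escape(f, separator))));
        sb.Append("\r\n");
    }
    return sb.ToString();
}

DataTable dataTable = ToDataTable(table);
string csv = ToDelimited(table, ',');
Console.Write(csv);
```

Quoting matters for OCR output in particular: recognized numbers often contain thousands separators, which would otherwise split a single cell into two CSV columns.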

Searchable and Editable Output

  • Produces searchable text layers in PDF exports, making table content indexable.
  • Can be combined with Aspose.Cells for further spreadsheet operations such as formulas and charting.
  • Lets you load extracted content into databases or pass it to downstream processing pipelines.

Performance and Scalability

  • Optimized for batch processing of large datasets with configurable threading and memory management.
  • Streams image data directly to the OCR engine, minimizing disk I/O.
  • Provides progress callbacks and cancellation tokens for long-running operations.
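
The plugin's own threading and callback settings are not shown on this page, so this is a generic .NET sketch of the pattern these bullets describe: a batch processed with a bounded degree of parallelism, an `IProgress<int>` progress callback, and a `CancellationToken`. `RecognizeImage` is a hypothetical placeholder for the real OCR call, and whether a single recognition engine instance may be shared across threads should be checked against the Aspose documentation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real OCR call; here it just echoes the path.
string RecognizeImage(string path) => $"text from {path}";

// Process a batch of images in parallel, reporting how many have completed
// and stopping early if the token is cancelled.
IDictionary<string, string> RecognizeBatch(
    IReadOnlyList<string> paths,
    int maxThreads,
    IProgress<int>? progress,
    CancellationToken token)
{
    var results = new ConcurrentDictionary<string, string>();
    int completed = 0;
    var options = new ParallelOptions
    {
        MaxDegreeOfParallelism = maxThreads,
        CancellationToken = token,
    };
    // Throws OperationCanceledException if the token is cancelled.
    Parallel.ForEach(paths, options, path =>
    {
        results[path] = RecognizeImage(path);
        progress?.Report(Interlocked.Increment(ref completed));
    });
    return results;
}

var images = new List<string> { "page1.png", "page2.png", "page3.png" };
using var cts = new CancellationTokenSource();
var texts = RecognizeBatch(images, maxThreads: 2, progress: null, cts.Token);
Console.WriteLine(texts.Count);
```

To display progress, pass an `IProgress<int>` such as `new Progress<int>(n => Console.WriteLine(n))`; `Progress<T>` invokes its handler asynchronously, so in a console app some reports may arrive after the batch has finished.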

Advanced Customization

  • Region-of-interest (ROI) support to limit detection to specific areas for faster processing.
  • Configurable confidence thresholds to filter low-confidence results.
  • Hooks for pre- and post-processing (custom filters, deskew algorithms, or validators).
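
Whether the recognizer exposes a per-cell confidence value, and under what name, is not shown here. Assuming each cell carries a hypothetical `Confidence` score between 0 and 1, a threshold filter can be sketched as a partition that routes low-confidence cells to manual review rather than silently discarding them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical recognized cells with a confidence score between 0 and 1.
var cells = new List<(int Row, int Col, string Text, double Confidence)>
{
    (0, 0, "Invoice", 0.98),
    (0, 1, "Tota1", 0.41),
    (1, 0, "2024-03-01", 0.93),
};

// Partition items into those at or above the threshold and those below it,
// so low-confidence cells can be sent to review instead of being dropped.
(List<T> Accepted, List<T> Rejected) SplitByConfidence<T>(
    IEnumerable<T> items, Func<T, double> confidence, double threshold)
{
    var above = new List<T>();
    var below = new List<T>();
    foreach (T item in items)
    {
        (confidence(item) >= threshold ? above : below).Add(item);
    }
    return (above, below);
}

// 0.80 is an arbitrary example threshold; tune it against your own data.
var (accepted, rejected) = SplitByConfidence(cells, c => c.Confidence, threshold: 0.80);
foreach (var cell in rejected)
{
    Console.WriteLine($"Review cell ({cell.Row}, {cell.Col}): \"{cell.Text}\" (confidence {cell.Confidence})");
}
```

Raising the threshold trades recall for precision: fewer misreads such as "Tota1" reach the output, but more correct cells are sent to review.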

Example: Extract Text from Scanned or Photographed Tables

using System;
using System.Collections.Generic;

// Apply the metered license (replace with your own public and private keys)
Aspose.OCR.Metered metered = new Aspose.OCR.Metered();
metered.SetMeteredKey("PublicKey", "PrivateKey");

// Initialize recognition engine
Aspose.OCR.AsposeOcr recognitionEngine = new Aspose.OCR.AsposeOcr();

// Add images to OcrInput object
Aspose.OCR.OcrInput input = new Aspose.OCR.OcrInput(Aspose.OCR.InputType.SingleImage);
input.Add("source1.png");
input.Add("source2.jpg");

// Configure recognition settings for tables
Aspose.OCR.RecognitionSettings recognitionSettings = new Aspose.OCR.RecognitionSettings();
recognitionSettings.DetectAreasMode = Aspose.OCR.DetectAreasMode.TABLE;

// Recognize table text (one RecognitionResult per input image)
List<Aspose.OCR.RecognitionResult> results = recognitionEngine.Recognize(input, recognitionSettings);
foreach (Aspose.OCR.RecognitionResult result in results)
{
    Console.WriteLine(result.RecognitionText);
}

// Save the first image's result as plain text, and all results as one multipage PDF
results[0].Save("result.txt", Aspose.OCR.SaveFormat.Text);
Aspose.OCR.AsposeOcr.SaveMultipageDocument("result.pdf", Aspose.OCR.SaveFormat.Pdf, results);

Common Use Cases

  • Extracting structured data from financial reports.
  • Converting scanned forms and applications into spreadsheets.
  • Automating data entry tasks by transforming table images into editable formats.

Tips and Best Practices

  • Use images with at least 300 DPI and good contrast for best results.
  • Crop images to the table region and deskew them before recognition.
  • Load only necessary language packs to reduce memory use.
  • Tune confidence thresholds to balance precision and recall.
  • Validate reconstructed tables against the expected schema before importing them into a database.
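
The last tip can be sketched as follows, assuming a hypothetical schema that lists the expected header names and which columns must parse as invariant-culture numbers. The check collects every problem instead of stopping at the first one, so the caller can decide whether to import, correct, or reject the table.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical expected schema: column headers, and which columns must be numeric.
string[] expectedHeaders = { "Item", "Qty", "Price" };
var numericColumns = new HashSet<string> { "Qty", "Price" };

// Check a reconstructed grid (first row = headers) against the schema and
// collect readable problem descriptions.
List<string> Validate(string[][] rows)
{
    var problems = new List<string>();
    if (rows.Length == 0)
    {
        problems.Add("table is empty");
        return problems;
    }
    if (!rows[0].SequenceEqual(expectedHeaders))
    {
        problems.Add($"unexpected headers: {string.Join(", ", rows[0])}");
        return problems;
    }
    for (int r = 1; r < rows.Length; r++)
    {
        if (rows[r].Length != expectedHeaders.Length)
        {
            problems.Add($"row {r}: expected {expectedHeaders.Length} cells, found {rows[r].Length}");
            continue;
        }
        for (int c = 0; c < expectedHeaders.Length; c++)
        {
            if (numericColumns.Contains(expectedHeaders[c]) &&
                !decimal.TryParse(rows[r][c], NumberStyles.Number, CultureInfo.InvariantCulture, out _))
            {
                problems.Add($"row {r}, column {expectedHeaders[c]}: \"{rows[r][c]}\" is not a number");
            }
        }
    }
    return problems;
}

// Hypothetical sample data: the second row contains a typical OCR misread.
string[][] table =
{
    new[] { "Item", "Qty", "Price" },
    new[] { "Widget", "3", "1,200.50" },
    new[] { "Gadget", "l0", "9.99" },
};

foreach (string problem in Validate(table))
{
    Console.WriteLine(problem);
}
```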

By following these guidelines and using its table recognition capabilities, developers can rely on Aspose.OCR Table to Text for .NET to convert scanned tables into structured, editable, and searchable text.