Calculating array accuracy

Overview

Accuracy for data extraction is calculated by comparing extracted data against expected data, accounting for both the content and the structure of the data. This is straightforward for scalar values but more complex for arrays, where the extracted rows can differ from the expected rows in both order and count.

Array Accuracy Calculation

For array data (tables, lists, etc.), accuracy is calculated as:

Accuracy = (Number of Correct Cells / max(Total Expected Cells, Total Extracted Cells)) × 100%

The denominator is the greater of:

  • Total number of expected cells
  • Total number of extracted cells

This ensures that both over-extraction and under-extraction are properly penalized in the accuracy calculation.
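The formula can be sketched as a small Python function. The name `array_accuracy` and its parameters are hypothetical, and the handling of two empty arrays is an assumption; the source does not describe that case.

```python
def array_accuracy(correct_cells: int, expected_cells: int, extracted_cells: int) -> float:
    """Return accuracy as a percentage, using the larger cell count as the denominator."""
    denominator = max(expected_cells, extracted_cells)
    if denominator == 0:
        # Assumption: two empty arrays are treated as a perfect match.
        return 100.0
    return correct_cells / denominator * 100


# Under-extraction: only 6 of the 9 expected cells were produced, all of them correct.
under = array_accuracy(correct_cells=6, expected_cells=9, extracted_cells=6)

# Over-extraction: all 9 expected cells are correct, but 3 extra cells were produced.
over = array_accuracy(correct_cells=9, expected_cells=9, extracted_cells=12)

print(round(under, 2), round(over, 2))  # 66.67 75.0
```

In both cases the score falls below 100% even though every matched cell is correct, because the denominator grows to cover the missing or extra cells.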

Handling Row Mismatches

The accuracy calculation becomes more complex when the extracted array differs from the expected array in row order or row count. Consider the following example:

Example: Array Row Order Mismatch

Extracted Array:

Row 1: {a, b, c}
Row 2: {d, e, f}

Expected Array:

Row 1: {d, e, f}
Row 2: {a, b, c}
Row 3: {g, h, i}

In this case:

  1. The algorithm pairs each extracted row with its corresponding expected row, regardless of row position
  2. Missing or extra rows are counted as incorrect cells
  3. The denominator is the greater of the expected and extracted cell counts, which here is the expected count

Row Pairing Process

The algorithm creates the following pairings:

  1. Extracted Row 1 ↔ Expected Row 2
  2. Extracted Row 2 ↔ Expected Row 1
  3. null ↔ Expected Row 3
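The source does not specify how rows are paired. As one possible illustration, the sketch below uses a greedy strategy that repeatedly pairs the two unused rows sharing the most equal cells. The function `pair_rows` and the greedy strategy are assumptions, not the documented algorithm. Indices are 0-based, and `None` plays the role of `null`.

```python
def pair_rows(extracted, expected):
    """Greedily pair rows by number of equal cells (illustrative assumption only)."""

    def overlap(a, b):
        # Count positions where the two rows hold the same value.
        return sum(x == y for x, y in zip(a, b))

    # Score every (extracted, expected) combination, best scores first.
    candidates = sorted(
        (
            (overlap(ext, exp), i, j)
            for i, ext in enumerate(extracted)
            for j, exp in enumerate(expected)
        ),
        key=lambda t: (-t[0], t[1], t[2]),
    )

    used_ext, used_exp, pairs = set(), set(), []
    for _score, i, j in candidates:
        if i in used_ext or j in used_exp:
            continue  # each row can be paired at most once
        used_ext.add(i)
        used_exp.add(j)
        pairs.append((i, j))

    # Rows left without a partner are paired with None (extra or missing rows).
    pairs += [(i, None) for i in range(len(extracted)) if i not in used_ext]
    pairs += [(None, j) for j in range(len(expected)) if j not in used_exp]
    return pairs


extracted = [["a", "b", "c"], ["d", "e", "f"]]
expected = [["d", "e", "f"], ["a", "b", "c"], ["g", "h", "i"]]

print(pair_rows(extracted, expected))  # [(0, 1), (1, 0), (None, 2)]
```

The output corresponds to the three pairings listed above: extracted row 1 with expected row 2, extracted row 2 with expected row 1, and no partner for expected row 3.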

Accuracy Calculation with Mismatches

When there are row count mismatches:

  • The total number of cells used in the denominator is the greater of:
    • Number of expected rows × number of columns
    • Number of extracted rows × number of columns
  • Missing or extra rows count as incorrect cells
  • This penalizes both over-extraction and under-extraction

Example Calculation

Using the previous example:

  • Expected cells: 9 (3 rows × 3 columns)
  • Extracted cells: 6 (2 rows × 3 columns)
  • Denominator: max(9, 6) = 9
  • Correctly extracted cells: 6 (every cell in the two matched rows)
  • Accuracy: (6/9) × 100% = 66.67%
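The same arithmetic can be reproduced in a few lines of Python. This is a sketch only: the pairings are hard-coded from the walkthrough above as 0-based `(extracted index, expected index)` tuples, and the variable names are illustrative.

```python
extracted = [["a", "b", "c"], ["d", "e", "f"]]
expected = [["d", "e", "f"], ["a", "b", "c"], ["g", "h", "i"]]

# Pairings from the walkthrough; None marks a row with no partner.
pairings = [(0, 1), (1, 0), (None, 2)]

correct = 0
for ext_i, exp_i in pairings:
    if ext_i is None or exp_i is None:
        continue  # an unmatched row contributes no correct cells
    correct += sum(x == y for x, y in zip(extracted[ext_i], expected[exp_i]))

expected_cells = sum(len(row) for row in expected)    # 9
extracted_cells = sum(len(row) for row in extracted)  # 6
denominator = max(expected_cells, extracted_cells)    # 9

accuracy = correct / denominator * 100
print(f"{correct}/{denominator} = {accuracy:.2f}%")  # 6/9 = 66.67%
```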

Real-World Example

Consider a case where:

  • 39 rows were extracted
  • 50 rows were expected
  • Each row has 8 columns
  • 3 cells were incorrect in the matched rows

The calculation would be:

  • Expected cells: 50 × 8 = 400
  • Extracted cells: 39 × 8 = 312
  • Denominator: max(400, 312) = 400
  • Correctly extracted cells: 309
  • Accuracy: (309/400) × 100% = 77.25%

This lower accuracy reflects both:

  1. The 3 incorrect cells in matched rows
  2. The 11 missing rows (11 × 8 = 88 cells, all counted as incorrect)
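A short Python sketch makes the breakdown of the 91 incorrect cells explicit. The variable names are illustrative, and the sketch assumes that all 39 extracted rows were paired with an expected row.

```python
rows_expected, rows_extracted, columns = 50, 39, 8
incorrect_in_matched_rows = 3

expected_cells = rows_expected * columns    # 400
extracted_cells = rows_extracted * columns  # 312
denominator = max(expected_cells, extracted_cells)  # 400

# Assumes every extracted row was paired with an expected row.
correct_cells = extracted_cells - incorrect_in_matched_rows  # 309
missing_cells = (rows_expected - rows_extracted) * columns   # 88

accuracy = correct_cells / denominator * 100
print(f"accuracy = {accuracy:.2f}%")  # accuracy = 77.25%

# The shortfall is the wrong cells in matched rows plus the cells of missing rows.
print(denominator - correct_cells, incorrect_in_matched_rows + missing_cells)  # 91 91
```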

Key Points

  • This accuracy calculation method is designed specifically for array data
  • The denominator is always the larger of the expected and extracted cell counts
  • Row order mismatches are handled by pairing extracted rows with expected rows rather than comparing rows by position
  • Missing or extra rows count as incorrect cells, so both under-extraction and over-extraction lower the accuracy