DocumentAnalyzerSchema

Type: object
No Additional Properties

Paragraphs

Type: array

List of detected paragraphs

No Additional Items

Each item of this array must be:

ParagraphSchema

Type: object
No Additional Properties

Box

Type: array of integer

Bounding box of the paragraph in the format [x1, y1, x2, y2]

Must contain a minimum of 4 items

Must contain a maximum of 4 items

No Additional Items

Each item of this array must be:

Type: integer

Contents


Text content of the paragraph

Direction


Text direction, e.g., 'horizontal' or 'vertical'

Order


Order of the paragraph in the document

Role


Role of the paragraph, e.g., 'section_headings', 'page_header', 'page_footer'
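
The Box constraints above (exactly four integers in [x1, y1, x2, y2] order) can be checked with a few lines of Python. This is a minimal sketch, not part of the schema itself: the lowercase keys (`box`, `contents`, `direction`, `order`, `role`) are assumed to be the JSON names behind the titles shown on this page, and the sample values and the helper names `is_valid_box` and `box_size` are invented for illustration.

```python
def is_valid_box(box):
    """Return True if box is a list of exactly four integers [x1, y1, x2, y2]."""
    return (
        isinstance(box, list)
        and len(box) == 4
        # bool is a subclass of int in Python, so exclude it explicitly
        and all(isinstance(v, int) and not isinstance(v, bool) for v in box)
    )


def box_size(box):
    """Return (width, height) of a [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1


# Hypothetical paragraph item; key names are an assumption (see lead-in).
paragraph = {
    "box": [10, 20, 300, 60],
    "contents": "Example footer text",
    "direction": "horizontal",
    "order": 0,
    "role": "page_footer",
}

if is_valid_box(paragraph["box"]):
    width, height = box_size(paragraph["box"])
    print(f"paragraph is {width}x{height} px")
```

The same check applies to every other Box field in this schema, since they all share the four-integer constraint.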

Tables

Type: array

List of detected tables

No Additional Items

Each item of this array must be:

TableStructureRecognizerSchema

Type: object
No Additional Properties

Box

Type: array of integer

Bounding box of the table in the format [x1, y1, x2, y2]

Must contain a minimum of 4 items

Must contain a maximum of 4 items

No Additional Items

Each item of this array must be:

Type: integer

N Row

Type: integer

Number of rows in the table

N Col

Type: integer

Number of columns in the table

Rows

Type: array

List of table lines representing rows

No Additional Items

Each item of this array must be:

TableLineSchema

Type: object
No Additional Properties

Box

Type: array of integer

Bounding box of the table line in the format [x1, y1, x2, y2]

Must contain a minimum of 4 items

Must contain a maximum of 4 items

No Additional Items

Each item of this array must be:

Type: integer

Score

Type: number

Confidence score of the table line detection

Cols

Type: array

List of table lines representing columns

No Additional Items

Each item of this array must be:

TableLineSchema (same definition as the items of Rows)

Spans

Type: array

List of table lines representing spans

No Additional Items

Each item of this array must be:

TableLineSchema (same definition as the items of Rows)

Cells

Type: array

List of table cells

No Additional Items

Each item of this array must be:

TableCellSchema

Type: object
No Additional Properties

Col

Type: integer

Column index of the cell

Row

Type: integer

Row index of the cell

Col Span

Type: integer

Number of columns spanned by the cell

Row Span

Type: integer

Number of rows spanned by the cell

Box

Type: array of integer

Bounding box of the cell in the format [x1, y1, x2, y2]

Must contain a minimum of 4 items

Must contain a maximum of 4 items

No Additional Items

Each item of this array must be:

Type: integer

Order

Type: integer

Order of the table in the document
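
Because each cell carries Row, Col, Row Span and Col Span, the grid positions a cell covers can be derived from those four integers. The sketch below illustrates one way to do that. It assumes the JSON keys are `row`, `col`, `row_span` and `col_span`, which this page does not confirm, and the helper name `cell_coverage` is hypothetical. The sample indices start at 1 only as an arbitrary choice: the schema shown here does not say whether indices are 0-based or 1-based, and the code works with either.

```python
def cell_coverage(cells):
    """Map every (row, col) position covered by a cell to that cell."""
    covered = {}
    for cell in cells:
        for r in range(cell["row"], cell["row"] + cell["row_span"]):
            for c in range(cell["col"], cell["col"] + cell["col_span"]):
                covered[(r, c)] = cell
    return covered


# Hypothetical 2x2 table whose first row is a single merged cell.
cells = [
    {"row": 1, "col": 1, "row_span": 1, "col_span": 2, "box": [0, 0, 200, 40]},
    {"row": 2, "col": 1, "row_span": 1, "col_span": 1, "box": [0, 40, 100, 80]},
    {"row": 2, "col": 2, "row_span": 1, "col_span": 1, "box": [100, 40, 200, 80]},
]

grid = cell_coverage(cells)
# Both positions in the first row resolve to the same merged cell.
print(grid[(1, 1)] is grid[(1, 2)])
```

Comparing the number of covered positions with N Row times N Col is a simple consistency check on a table.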

Words

Type: array

List of recognized words

No Additional Items

Each item of this array must be:

WordPrediction

Type: object
No Additional Properties

Points

Type: array of array

Bounding box of the word in the format [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]

Must contain a minimum of 4 items

Must contain a maximum of 4 items

No Additional Items

Each item of this array must be:

Type: array of integer

Must contain a minimum of 2 items

Must contain a maximum of 2 items

No Additional Items

Each item of this array must be:

Type: integer

Content

Type: string

Text content of the word

Direction

Type: string

Text direction, e.g., 'horizontal' or 'vertical'

Rec Score

Type: number

Confidence score of the word recognition

Det Score

Type: number

Confidence score of the word detection
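
Words use four corner points rather than the [x1, y1, x2, y2] box used elsewhere in this schema. When a uniform format is more convenient, the points can be reduced to an axis-aligned box by taking the extremes of each coordinate. This is a minimal sketch: the keys (`points`, `content`, `rec_score`, `det_score`) are assumed JSON names, and the helper `points_to_box` and the sample word are invented for illustration.

```python
def points_to_box(points):
    """Convert [[x1, y1], ..., [x4, y4]] into the enclosing [x1, y1, x2, y2] box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs), max(ys)]


# Hypothetical word item with a slightly rotated quadrilateral.
word = {
    "points": [[12, 30], [88, 28], [90, 52], [14, 54]],
    "content": "sample",
    "direction": "horizontal",
    "rec_score": 0.98,
    "det_score": 0.91,
}

box = points_to_box(word["points"])
print(box)  # [12, 28, 90, 54]
```

The resulting box is the smallest upright rectangle that encloses the quadrilateral, so it is slightly larger than the word region when the text is rotated.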

Figures

Type: array

List of detected figures

No Additional Items

Each item of this array must be:

FigureSchema

Type: object
No Additional Properties

Box

Type: array of integer

Bounding box of the figure in the format [x1, y1, x2, y2]

Must contain a minimum of 4 items

Must contain a maximum of 4 items

No Additional Items

Each item of this array must be:

Type: integer

Order


Order of the figure in the document

Paragraphs

Type: array

List of paragraphs associated with the figure

No Additional Items

Each item of this array must be:

ParagraphSchema (same definition as the items of Paragraphs)

Direction


Text direction, e.g., 'horizontal' or 'vertical'
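
Paragraphs, tables and figures each carry an Order field, so a consumer may want to walk all of them in a single sequence. The sketch below shows one possible approach. It rests on two assumptions that this page does not state: that the top-level JSON keys are `paragraphs`, `tables` and `figures`, and that Order values are comparable across the three lists. The helper name `in_reading_order` and the sample data are hypothetical.

```python
def in_reading_order(result):
    """Return (kind, element) pairs sorted by their 'order' value.

    Elements with a missing or null order are placed last.
    """
    items = []
    for kind in ("paragraphs", "tables", "figures"):
        for element in result.get(kind, []):
            items.append((kind, element))
    return sorted(
        items,
        key=lambda pair: (pair[1].get("order") is None, pair[1].get("order") or 0),
    )


# Hypothetical, heavily trimmed analyzer output.
result = {
    "paragraphs": [
        {"order": 0, "contents": "Title"},
        {"order": 2, "contents": "Body text"},
    ],
    "tables": [{"order": 1, "n_row": 2, "n_col": 2}],
    "figures": [],
    "words": [],
}

for kind, element in in_reading_order(result):
    print(element["order"], kind)
```

If Order turns out to be scoped per list rather than shared, the three lists should be sorted separately instead.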