PDFlib TET


PDFlib TET (Text and Image Extraction Toolkit) is a developer toolkit for reliably extracting text, images and metadata from PDF documents. TET delivers the text content of a PDF as Unicode strings, together with detailed colour, glyph and font information and the position of the text on the page. Raster images are extracted in common image formats.

PDFlib TET 5 prices (licence type: 1 - 4 licences; email us for volume discounts of up to 60%)

Platform            Without support   With support
IBM AIX             £1,515.00         £1,815.00
IBM i5/iSeries      £1,515.00         £1,815.00
Linux               £755.00           £910.00
OS X Desktop        £285.00           £345.00
Windows Desktop     £285.00           £345.00
Windows Server      £755.00           £910.00
Oracle Solaris      £1,515.00         £1,815.00

Overview


TET can also convert PDF documents to TETML, an XML-based format that contains the extracted text together with metadata and resource information. TET includes sophisticated content analysis algorithms for determining word boundaries, grouping text into columns, detecting table structures and removing redundant content such as shadow text.
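Because TETML is plain XML, its output can be post-processed with any XML parser. The sketch below uses only Python's standard library to list each word with its page number and bounding box. It is not part of TET's API, and the element names (`Page`, `Word`, `Text`, `Box`), the `llx`/`lly`/`urx`/`ury` attributes and the namespace URI in the sample are assumptions for illustration; check them against the TETML schema that ships with your TET version. The helper names `local_name` and `extract_words` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical TETML-like sample. Element names, box attributes and the namespace
# URI are assumptions; verify them against the TETML schema for your TET version.
SAMPLE_TETML = """<TET xmlns="urn:example:tetml">
  <Document>
    <Pages>
      <Page number="1">
        <Content>
          <Para>
            <Word><Text>Hello</Text><Box llx="72" lly="700" urx="100" ury="712"/></Word>
            <Word><Text>world</Text><Box llx="104" lly="700" urx="134" ury="712"/></Word>
          </Para>
        </Content>
      </Page>
    </Pages>
  </Document>
</TET>"""

Box = Tuple[float, float, float, float]


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so matching does not depend on the namespace URI."""
    return tag.rsplit("}", 1)[-1]


def extract_words(tetml: str) -> List[Tuple[int, str, Optional[Box]]]:
    """Return (page number, word text, first bounding box) for every Word element."""
    root = ET.fromstring(tetml)
    words = []
    for page in root.iter():
        if local_name(page.tag) != "Page":
            continue
        page_no = int(page.get("number", "0"))
        for word in page.iter():
            if local_name(word.tag) != "Word":
                continue
            text, box = "", None
            for child in word:
                name = local_name(child.tag)
                if name == "Text":
                    text = child.text or ""
                elif name == "Box" and box is None:
                    # If a Word contains more than one Box, keep only the first.
                    box = tuple(float(child.get(k)) for k in ("llx", "lly", "urx", "ury"))
            words.append((page_no, text, box))
    return words


for page_no, text, box in extract_words(SAMPLE_TETML):
    print(page_no, text, box)
```

Matching on local names rather than on a fixed namespace keeps the sketch independent of the TETML version string, at the cost of not distinguishing elements from other namespaces.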

The Text and Image Extraction Toolkit includes the pCOS interface for querying PDF document details such as XMP metadata, font lists, page size and document information fields.

Features

Convert PDF document contents to other formats
Process PDF documents based on their contents, e.g. splitting based on headings (requires PDFlib+PDI in addition to TET)
Implement a PDF indexer for a search engine
Re-purpose text and images from PDFs
Check whether a location on the page is empty, e.g. for placing a barcode or stamp
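The empty-location check reduces to a rectangle-intersection test once the positions of text and images on the page are known. The sketch below assumes you already have those bounding boxes as `(llx, lly, urx, ury)` tuples in PDF coordinates (for example, taken from the word boxes in TETML output); the functions `overlaps` and `is_area_free` are hypothetical helpers, not TET API calls.

```python
from typing import Iterable, Tuple

# A rectangle in PDF coordinates: (llx, lly, urx, ury), origin at the lower-left.
Rect = Tuple[float, float, float, float]


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the rectangles share interior area; touching edges do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def is_area_free(candidate: Rect, occupied: Iterable[Rect], margin: float = 0.0) -> bool:
    """True if no occupied box intersects the candidate area grown by `margin` on every side."""
    llx, lly, urx, ury = candidate
    grown = (llx - margin, lly - margin, urx + margin, ury + margin)
    return not any(overlaps(grown, box) for box in occupied)


# Hypothetical word boxes for two words near the top of the page.
text_boxes = [(72, 700, 100, 712), (104, 700, 134, 712)]
print(is_area_free((400, 40, 520, 100), text_boxes))  # True: clear of both words
print(is_area_free((90, 690, 150, 720), text_boxes))  # False: overlaps both words
```

The optional margin keeps a stamp or barcode from touching nearby text; a real placement routine would also need the page size and the boxes of images and vector graphics, which this sketch does not cover.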