Open Discover® SDK for .NET
Open Discover SDK is a .NET application programming interface (API) that allows for:
- Identifying file formats quickly and reliably from their internal binary signatures rather than from unreliable file extensions
- Extracting text from supported file formats (DOC, XLS, PPT, DOCX, XLSX, PPTX, ONENOTE, MSG, EML, EMLX, DXL, and many more) and optionally identifying the languages present in the extracted text
- Extracting metadata from supported file formats (over 1,325 known metadata fields in total)
- Extracting embedded items/attachments from supported document formats
- Extracting archive container items (7ZIP, ZIP, RAR, TAR, etc.)
- Extracting mail store container email objects (PST, OST, OST2013, MBOX, etc.)
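To illustrate the general idea behind signature-based identification, here is a minimal C# sketch that matches a file's leading bytes against a few well-known magic numbers. It does not use or reproduce the Open Discover SDK API: the SignatureSniffer class and its methods are hypothetical names invented for this example, and the signature table is a tiny illustrative subset.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration only; NOT part of the Open Discover SDK API.
public static class SignatureSniffer
{
    // A few well-known leading byte signatures ("magic numbers").
    private static readonly (string Format, byte[] Magic)[] Signatures =
    {
        ("PDF",  new byte[] { 0x25, 0x50, 0x44, 0x46 }),                          // "%PDF"
        ("ZIP",  new byte[] { 0x50, 0x4B, 0x03, 0x04 }),                          // "PK\x03\x04"
        ("OLE2", new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 }),  // compound file
        ("7ZIP", new byte[] { 0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C }),              // "7z" + 4 bytes
        ("RAR",  new byte[] { 0x52, 0x61, 0x72, 0x21, 0x1A, 0x07 }),              // "Rar!\x1A\x07"
    };

    // Returns the name of the first signature that the header starts with,
    // or "Unknown" when none of the signatures match.
    public static string Identify(byte[] header)
    {
        foreach (var (format, magic) in Signatures)
        {
            if (header.Length >= magic.Length &&
                header.Take(magic.Length).SequenceEqual(magic))
            {
                return format;
            }
        }
        return "Unknown";
    }

    // Reads up to the first 8 bytes of a file and identifies them.
    // The file extension is never consulted.
    public static string IdentifyFile(string path)
    {
        var buffer = new byte[8];
        using var stream = File.OpenRead(path);
        int read = stream.Read(buffer, 0, buffer.Length);
        return Identify(buffer.Take(read).ToArray());
    }
}
```

For example, SignatureSniffer.Identify(new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D }) returns "PDF" whatever the file is named. A leading signature alone only identifies the container: DOCX, XLSX, and PPTX files all begin with the ZIP signature, and DOC, XLS, PPT, and MSG files all begin with the OLE2 signature, so telling them apart requires inspecting the container contents as well.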
The Open Discover SDK API is intended for developers building higher-level document processing applications for:
- Full text search using Lucene.NET
- Machine learning using extracted text and metadata
- Text analytics and document concept clustering
- Information governance
- Website crawling/full-text website search
- Enterprise search and content management
- IT department tasks: identifying, metadata scanning, and de-duplicating documents on file servers
- eDiscovery applications
- And more...
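As one concrete illustration of the de-duplication use case listed above, the following C# sketch groups files by the SHA-256 hash of their raw bytes. It is a generic technique built only on the .NET base class library, not a description of how the Open Discover SDK de-duplicates documents; the DuplicateFinder class and its methods are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper for illustration only; NOT part of the Open Discover SDK API.
public static class DuplicateFinder
{
    // Computes the SHA-256 hash of a file's bytes as a hex string.
    public static string HashFile(string path)
    {
        using var sha = SHA256.Create();
        using var stream = File.OpenRead(path);
        return BitConverter.ToString(sha.ComputeHash(stream));
    }

    // Groups the given paths by content hash and returns only those groups
    // that contain more than one file, i.e. byte-for-byte duplicates.
    public static List<List<string>> FindDuplicates(IEnumerable<string> paths)
    {
        return paths
            .GroupBy(p => HashFile(p))
            .Where(g => g.Count() > 1)
            .Select(g => g.ToList())
            .ToList();
    }
}
```

Hashing raw bytes only detects exact copies. Two documents with the same text but different metadata or file formats would hash differently, which is where extracted text and metadata become useful inputs for a more tolerant comparison.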