Need help with anything in this article or have other questions? Contact us at support@noticiasolutions.com
This guide explains how the different types of document content (native files, extracted text, formatted content, and OCR) and file‑type ranking work together in the system. It describes how text is created, how it becomes searchable, and how the system decides which version of a document to display. The goal is to help you understand why certain content appears in the viewer, how hit highlighting works, and how to manage file ranking when a document has more than one version (native, image/PDF, and/or text file).
Extracted Text, Formatted Content View and OCR – Simple Overview
Extracted Text:
- Creates a separate text file from the document's content
- Used when exporting text files and for processes that require a text file (for example, AI and CAL)
- Does not affect search results or hit highlighting
Formatted/Unformatted Content View:
- Formatted content, unformatted content, and native view modes always show the same document
- Does not rely on text files (including extracted text); it reads the content directly from the document displayed in the native online viewer
- Displays text characters only (letters, numbers, symbols)
- Does not display images within the document
OCRing:
- Only applies to images or PDFs.
- When you run OCR, you have two choices:
1. Embedded OCR (PDF only)
- Reads the text and embeds it in the PDF.
- Does not generate a separate text file.
- To generate a text file from embedded OCR’d content, run Extract Text.
2. OCR to a Separate Text File
- Creates a standalone text file.
- Does not embed the text in the document.
- To show this OCR text in formatted content view, the text file must rank higher than the image in file type ranking.
Important:
If you OCR to a separate text file (without embedding the text), the new text file replaces any existing text file, whether that file was loaded or created by extracting text.
Summary
These three components work together but serve different purposes:
- Extracted Text → creates a text file for processing.
- Formatted/Unformatted Content → shows the document’s text‑only content in the viewer.
- OCR → adds text to documents that don’t have any.
As long as a document has any source of text—embedded, extracted, OCR‑generated, or native—the document is searchable and will appear in search results if any text source matches the criteria.
What Content File Type Rank Is:
A single document in the system can have multiple file versions associated with it (for example, a Word file, a text file, and an image file). The Content File Type Rank tells the system which of those files to display in:
- the native online viewer, and
- the formatted/unformatted content view
Important:
Hit highlighting appears only if the term is in the file currently displayed in the native/formatted/unformatted viewer.
Content File Ranking vs Content Searching
These concepts are separate:
Content File Rank
- Decides which file is shown in the native/formatted/unformatted viewer
- Controls where hit highlighting can appear
- Determines which file is used for analytics jobs
Content Searching
- Uses the index, which contains all the text from all attached files
- Search results return the document if the term appears in any of the files.
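To make the difference between ranking and searching concrete, here is a minimal Python sketch. It is an illustrative model only, not the system's implementation: the `AttachedFile` type, the `CONTENT_FILE_RANK` list, and the `displayed_file`, `search_hits`, and `highlights` functions are hypothetical, and plain substring matching stands in for the real search index. It models a scanned PDF with no embedded text plus a separately OCR'd text file.

```python
from dataclasses import dataclass


@dataclass
class AttachedFile:
    extension: str  # file type, e.g. ".docx", ".pdf", ".txt"
    text: str       # text this file contributes to the (hypothetical) index


# Hypothetical Content File Rank list: earlier entries rank higher.
CONTENT_FILE_RANK = [".docx", ".pdf", ".txt"]


def displayed_file(files: list[AttachedFile]) -> AttachedFile:
    """The highest-ranked file is the one the native/formatted/unformatted viewer shows."""
    return min(files, key=lambda f: CONTENT_FILE_RANK.index(f.extension))


def search_hits(files: list[AttachedFile], term: str) -> bool:
    """Search consults the index, i.e. the text of every attached file."""
    return any(term.lower() in f.text.lower() for f in files)


def highlights(files: list[AttachedFile], term: str) -> bool:
    """Hit highlighting is only possible in the file being displayed."""
    return term.lower() in displayed_file(files).text.lower()


# A scanned PDF (no embedded text) plus a separate OCR text file.
document = [
    AttachedFile(".pdf", ""),
    AttachedFile(".txt", "OCR text: invoice number 4471"),
]

print(displayed_file(document).extension)  # .pdf
print(search_hits(document, "4471"))       # True: the term is in the .txt file
print(highlights(document, "4471"))        # False: the .pdf is displayed, not the .txt
```

In this model, moving ".txt" ahead of ".pdf" in the rank list makes the OCR text file the displayed file, and `highlights(document, "4471")` then returns True, which mirrors the point in the OCR section that the text file must rank higher than the image.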
File Types Not Listed on the Case Options Page
This matters only if you import more than one file type for the same record.
Example:
You import a CSV file together with a text file.
- The text file ranks higher by default.
- The text file appears in the native viewer.
- The image viewer still shows the CSV.
- Both files are searchable, but only the file shown in the native viewer supplies the text for the formatted/unformatted content view, so only that file's content shows hit highlights.
- If you would prefer to see the CSV in the native viewer, add its extension (.csv) to the Content File Rank list and rank it higher than the text file.
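The CSV example can be sketched the same way. This is a hypothetical model, not the system's code: the `rank_key` and `native_view_file` functions and the rank-list contents are illustrative, and the sketch assumes that any extension missing from the Content File Rank list ranks below every listed extension, which is consistent with the text file outranking the unlisted CSV by default.

```python
def rank_key(extension: str, rank_list: list[str]) -> int:
    """Lower is better. Assumption: unlisted extensions rank below all listed ones."""
    if extension in rank_list:
        return rank_list.index(extension)
    return len(rank_list)


def native_view_file(extensions: list[str], rank_list: list[str]) -> str:
    """Return the extension of the file the native viewer would show."""
    return min(extensions, key=lambda ext: rank_key(ext, rank_list))


record = [".csv", ".txt"]  # a CSV imported together with a text file

rank_list = [".pdf", ".txt"]  # hypothetical list that does not include .csv
print(native_view_file(record, rank_list))  # .txt (default behavior)

rank_list_with_csv = [".pdf", ".csv", ".txt"]  # .csv added above .txt
print(native_view_file(record, rank_list_with_csv))  # .csv
```

Only the display choice changes here; in the article's terms, both files stay searchable either way.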