Document Content Types & Ranking

Modified on Mon, 1 Jun at 2:04 PM

Need help with anything in this article or have other questions? Contact us at support@noticiasolutions.com

This guide explains how the different types of document content (native files, extracted text, formatted content, and OCR text) work together with file-type ranking in the system. It describes how text is created, how it becomes searchable, and how the system decides which version of a document to display. The goal is to help you understand why certain content appears in the viewer, how hit highlighting works, and how to manage file ranking when a document has more than one version (native, image/PDF, and/or text file).

Extracted Text, Formatted Content View and OCR – Simple Overview

Extracted Text:

Creates a separate text file taken from the document content
Used for exporting text files, or for processes that require a text file (AI, CAL, etc.)
It does not affect search results or hit highlighting

Formatted/Unformatted Content View:

Formatted content, unformatted content, and native view modes always show the same document
Does not rely on text files (including extracted text); it reads the content directly from the document displayed in the native online viewer
Displays text characters only (letters, numbers, symbols)
Does not display images within the document

OCRing:

Only applies to images or PDFs.
When you run OCR you have two choices:

1. Embedded OCR (PDF only)

Reads the text and embeds the text into the PDF.
Does not generate a separate text file.
To generate a text file from embedded OCR’d content, run Extract Text.

2. OCR to a Separate Text File

Creates a standalone text file.
Does not embed the text in the document.
To show this OCR text in formatted content view, the text file must rank higher than the image in file type ranking.

Important:

If you choose to OCR to a separate text file (without embedding the text), the new file replaces any existing text file for the document, whether that file was loaded or created by Extract Text.
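
The difference between the two OCR choices can be sketched in a few lines of Python. This is an illustrative model only, not the system's implementation: the `Document` class and the `ocr_embedded`, `ocr_to_text_file`, and `extract_text` functions are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    """Hypothetical model of one record: a PDF plus an optional text file."""
    pdf_embedded_text: Optional[str] = None  # text layer inside the PDF
    text_file: Optional[str] = None          # separate text file, if any


def ocr_embedded(doc: Document, recognized: str) -> None:
    # Embedded OCR writes the text into the PDF; no text file is created.
    doc.pdf_embedded_text = recognized


def ocr_to_text_file(doc: Document, recognized: str) -> None:
    # OCR to a separate text file replaces any existing text file
    # and leaves the PDF itself unchanged.
    doc.text_file = recognized


def extract_text(doc: Document) -> None:
    # Extract Text creates a text file from the document's own content.
    # (Simplified: only the embedded PDF text is modeled here.)
    if doc.pdf_embedded_text is not None:
        doc.text_file = doc.pdf_embedded_text


doc = Document(text_file="text loaded at import")

ocr_embedded(doc, "embedded OCR text")
print(doc.text_file)  # the loaded text file is still in place

ocr_to_text_file(doc, "standalone OCR text")
print(doc.text_file)  # the loaded text file has been replaced
```

In this sketch, only `ocr_to_text_file` overwrites the existing text file, which is the behavior the Important note above warns about.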

Summary

These three components work together but serve different purposes:

Extracted Text → creates a text file for processing.
Formatted/Unformatted Content → shows the document’s text‑only content in the viewer.
OCR → adds text to documents that don’t have any.

As long as a document has any source of text—embedded, extracted, OCR‑generated, or native—the document is searchable and will appear in search results if any text source matches the criteria.

General Summary of what Content File Type Rank Is:

A single document in the system can have multiple file versions associated with it (for example: a Word file, a text file, and an image file). The Content File Type Rank tells the system which of those files to display in:

the native online viewer, and
the formatted/unformatted content view

Important:

You only get hit highlighting if the term appears in the file that is being displayed in the native/formatted/unformatted viewer.

Content File Ranking vs Content Searching

These concepts are separate:

Content File Rank

Decides which file is shown in the native/formatted/unformatted viewer
Controls where hit highlighting can appear
Determines which file is used for analytics jobs

Content Searching

Uses the index, which contains all the text from all attached files
Search results return the document if the term appears in any of the files.
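
The split between searching and highlighting can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the system's actual search engine: the function names are hypothetical, the rank list is an invented example, and matching is reduced to a case-insensitive substring test.

```python
def displayed_file(files: dict[str, str], rank: list[str]) -> str:
    """Return the extension of the highest-ranked file attached to the record."""
    for ext in rank:
        if ext in files:
            return ext
    raise ValueError("no attached file matches the rank list")


def is_search_hit(files: dict[str, str], term: str) -> bool:
    # The index covers the text of every attached file.
    return any(term.lower() in text.lower() for text in files.values())


def can_highlight(files: dict[str, str], rank: list[str], term: str) -> bool:
    # Highlighting is only possible in the file the viewer is showing.
    return term.lower() in files[displayed_file(files, rank)].lower()


# One record with two attached files (extension -> text content).
files = {".docx": "quarterly forecast", ".txt": "quarterly forecast draft"}
rank = [".docx", ".txt"]  # hypothetical rank order, highest first

print(is_search_hit(files, "draft"))        # True
print(can_highlight(files, rank, "draft"))  # False
```

Here the record is returned for the term "draft" because the text file contains it, but no highlight appears because the displayed Word file does not.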

File Types That Are Not Listed on the Case Options Page

This is only a concern if you are importing more than one file type for the same record:

Example:

You import a CSV file together with a text file.

The text file ranks higher by default.
The text file appears in the native viewer.
The image viewer will still show the CSV.
Both files are searchable, but only the file shown in the native view has its content in the formatted/unformatted content view, so hit highlights appear only for that file's content.
If you would prefer to see the CSV in the native viewer, add the missing file extension (.csv) to the Content File Rank list, and rank it higher than the text file.
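
The example above can be modeled with a short Python sketch. It assumes that an extension missing from the rank list is treated as ranking below every listed extension, which is consistent with the text file winning by default but is not a confirmed description of the system's internals. The `pick_native_view` function and the sample rank lists are hypothetical.

```python
def pick_native_view(attached: list[str], rank_list: list[str]) -> str:
    """Pick the file extension shown in the native viewer."""

    def position(ext: str) -> int:
        # Assumption: unlisted extensions sort after every listed one.
        return rank_list.index(ext) if ext in rank_list else len(rank_list)

    return min(attached, key=position)


attached = [".csv", ".txt"]

# .csv is not on the list, so the text file is displayed.
print(pick_native_view(attached, [".pdf", ".txt"]))          # .txt

# After adding .csv above .txt, the CSV is displayed instead.
print(pick_native_view(attached, [".pdf", ".csv", ".txt"]))  # .csv
```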