Document Content Types & Ranking

Modified on Mon, 1 Jun at 2:04 PM

Need help with anything in this article or have other questions? Contact us at support@noticiasolutions.com


This guide explains how different types of document content (native files, extracted text, formatted content, and OCR) work together with file-type ranking in the system. It describes how text is created, how it becomes searchable, and how the system decides which version of a document to display. The goal is to help you understand why certain content appears in the viewer, how hit highlighting works, and how to manage file ranking when a document has more than one version (native, image/PDF, and/or text file).


Extracted Text, Formatted Content View and OCR – Simple Overview

    Extracted Text: 

  • Creates a separate text file taken from the document content
  • Used for exporting text files, or for processes that require a text file (AI, CAL, etc.)
  • It does not affect search results or hit highlighting

 

    Formatted/Unformatted Content View: 

  • Formatted content, unformatted content, and native view modes always show the same document
  • Does not rely on text files (including extracted text); it reads the content directly from the document displayed in the native online viewer
  • Displays text characters only (letters, numbers, symbols)
  • Does not display images within the document

 

    OCRing: 

  • Only applies to images or PDFs.  
  • When you run OCR you have two choices:

1. Embedded OCR (PDF only)

  • Reads the text and embeds the text into the PDF.
  • Does not generate a separate text file.
  • To generate a text file from embedded OCR’d content, run Extract Text.

2. OCR to a Separate Text File

  • Creates a standalone text file.
  • Does not embed the text in the document.
  • To show this OCR text in formatted content view, the text file must rank higher than the image in file type ranking.

    Important:

    If you choose to OCR without embedding the text, the OCR output will replace any existing text file, whether that file was loaded or created by extracting text.
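
    The two OCR choices and the replacement behavior described above can be sketched as a small model. This is an illustrative sketch only, not the system's implementation: the Document class, its fields, and the run_ocr and extract_text functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    """Hypothetical model of one record: an image/PDF plus an optional text file."""
    is_pdf: bool
    embedded_text: Optional[str] = None  # text layer stored inside the PDF
    text_file: Optional[str] = None      # separate, standalone text file


def run_ocr(doc: Document, recognized: str, embed: bool) -> None:
    """Apply OCR output using one of the two choices described above."""
    if embed:
        if not doc.is_pdf:
            raise ValueError("Embedded OCR applies to PDFs only")
        # Embedded OCR: text goes into the PDF; no separate text file is created.
        doc.embedded_text = recognized
    else:
        # OCR to a separate text file: replaces any existing text file.
        doc.text_file = recognized


def extract_text(doc: Document) -> None:
    """Create a text file from text already embedded in the document."""
    if doc.embedded_text is not None:
        doc.text_file = doc.embedded_text
```

    In this model, running run_ocr with embed=True leaves a previously loaded text file untouched, while embed=False overwrites it, which is the case the Important note warns about.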

 

     Summary

    These three components work together but serve different purposes:

  • Extracted Text → creates a text file for processing.
  • Formatted/Unformatted Content → shows the document’s text‑only content in the viewer.
  • OCR → adds text to documents that don’t have any.

    As long as a document has any source of text (embedded, extracted, OCR-generated, or native), the document is searchable and will appear in search results if any text source matches the criteria.

 

 

General Summary of what Content File Type Rank Is:


     A single document in the system can have multiple file versions associated with it (for example: a Word file, a text file, and an image file). The Content File Type Rank tells the system which of those files to display in:

  • the native online viewer, and
  • the formatted/unformatted content view


    Important: 

    You only get hit highlighting if the term appears in the file that is being displayed in the native/formatted/unformatted viewer.


Content File Ranking vs Content Searching

    These concepts are separate:

 

Content File Rank

  • Decides which file is shown in the native/formatted/unformatted viewer
  • Controls where hit highlighting can appear
  • Determines which file is used for analytics jobs

Content Searching

  • Uses the index, which contains all the text from all attached files
  • Search results return the document if the term appears in any of the files.   
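
    The difference between the two concepts can be sketched in a few lines. This is a simplified illustration, assuming plain case-insensitive substring matching; the function names and the file dictionary are hypothetical, and the real index is certainly more sophisticated.

```python
from typing import Mapping


def search_hits(files: Mapping[str, str], term: str) -> bool:
    """Content searching: the document is returned if ANY attached file contains the term."""
    needle = term.lower()
    return any(needle in text.lower() for text in files.values())


def can_highlight(files: Mapping[str, str], displayed: str, term: str) -> bool:
    """Content file rank: highlighting is possible only in the file the viewer displays."""
    return term.lower() in files[displayed].lower()


# Hypothetical record: an image-only PDF with no text, plus an OCR text file.
files = {"report.pdf": "", "report.txt": "Quarterly revenue summary"}
found = search_hits(files, "revenue")                     # True: the text file matches
highlighted = can_highlight(files, "report.pdf", "revenue")  # False: the PDF is displayed
```

    Here the document appears in search results because the text file matches, but no highlight can be shown while the PDF is the displayed file. Ranking the text file higher would change which file is displayed and make highlighting possible.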

 

File Types That Are Not Listed on the Case Options Page


     This is only a concern if you are importing more than one file type for the same record:

    Example: 

    You import a CSV file together with a text file.

  • The text file ranks higher by default
  • The text file appears in the native viewer.
  • The image viewer will still show the csv
  • Both files are searchable, but only the file shown in the native viewer has its content in the formatted/unformatted content view, so hit highlights appear only for that file's content.
  • If you would prefer to see the CSV in the native viewer, add the missing file extension (.csv) to the Content File Rank list, and rank it higher than the text file.
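
    The CSV example can be sketched as a simple selection rule. This is an assumption-laden illustration, not the system's actual logic: it assumes the rank list is an ordered list of extensions (highest rank first) and that any extension missing from the list ranks below every listed one, which is consistent with the text file winning by default above. The function name and file names are hypothetical.

```python
import os
from typing import List


def pick_displayed_file(file_names: List[str], rank_list: List[str]) -> str:
    """Return the file the native viewer would show under the assumed ranking rule."""
    def rank(name: str) -> int:
        ext = os.path.splitext(name)[1].lower()
        # Assumption: unlisted extensions rank below every listed extension.
        return rank_list.index(ext) if ext in rank_list else len(rank_list)

    return min(file_names, key=rank)


record = ["data.csv", "data.txt"]

# .csv is not in the rank list, so the text file is displayed.
default_choice = pick_displayed_file(record, [".txt"])

# After adding .csv above .txt, the CSV is displayed instead.
updated_choice = pick_displayed_file(record, [".csv", ".txt"])
```

    Under these assumptions, default_choice is "data.txt" and updated_choice is "data.csv", mirroring the change described in the last bullet.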
