Index Modular Content in Sitecore with SIRC NuGet Package
January 30, 2018
Sitecore excels at giving Content Authors the ability to fully customize their pages by adding or removing renderings/modules in various placeholders, and it is best practice to structure a Sitecore site this way. For example, a Rich Text module points to a datasource item, and that datasource typically contains a field holding the rich text content. An issue with this approach quickly arises when implementing search on the site.
Sitecore does not index the content of this rich text rendering as part of the page item: by default, Sitecore only indexes the individual fields on an item. This is where Sitecore.Index.RenderedContent (referred to as SIRC from here on) comes in. It provides a simple, customizable framework for adding referenced datasource content directly to the page item in the index.
Get the NuGet package
– Installation instructions included in GitHub link below
Note: The base install adds the computed field to the <defaultLuceneIndexConfiguration> element; update the patch to point to the required index configuration.
Credit for the idea and general structure goes to Nick Wesselman and Kam Figy for their previous work on the subject. SIRC adds some enhancements that make the indexing process more customizable and removes any external dependencies on third-party libraries.
One of the main issues it addresses is providing a more surgical approach to determining which types of renderings to index. It wouldn’t be useful to index renderings that appear on all pages, such as a Navigation rendering, nor would it be relevant to index Call to Action or Related Articles-type modules that are not pertinent to the page in question. SIRC provides this mechanism by requiring the developer to define the indexable renderings in the <indexableRenderings> element of the config patch App_Config\Include\Z.IndexRenderedContent.config. An example of indexing a Rich Text rendering is below:
All primary tasks of the package rely on Sitecore pipelines to return the requested data, which means every step is overridable. If you wish to create a different mechanism for determining whether a rendering should be indexed, you can remove the existing processor and patch in your own custom one. Each pipeline has arguments specific to that pipeline, and all pipeline arguments contain a reference to the original item being indexed for full control.
Possible modifications include:
- Only index a rendering if it appears on an item of a given template
- Define indexing settings on the rendering template in Sitecore (accomplishing the same thing as the configs, but moving control into Sitecore)
- Only index content found in the FinalLayout field
- Any other rule your site requirements call for
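To make the override idea concrete, here is a minimal, language-neutral sketch in Python (SIRC itself is a .NET package, so none of these names are its real API). It assumes a pipeline is an ordered list of processors sharing one args object that carries the original item, and shows a hypothetical custom processor implementing the first modification above: only index a rendering when the page uses a given template. `Item`, `ShouldIndexArgs`, `allow_list`, and `only_on_template` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Item:
    """Minimal stand-in for the Sitecore item being indexed (hypothetical)."""
    name: str
    template: str


@dataclass
class ShouldIndexArgs:
    """Hypothetical pipeline args: the original item plus the rendering under test."""
    indexed_item: Item
    rendering_name: str
    is_indexable: bool = False
    aborted: bool = False


Processor = Callable[[ShouldIndexArgs], None]


def run_pipeline(processors: List[Processor], args: ShouldIndexArgs) -> ShouldIndexArgs:
    # Processors run in order and share one args object; a processor can
    # abort the remaining chain, similar in spirit to Sitecore's AbortPipeline().
    for processor in processors:
        if args.aborted:
            break
        processor(args)
    return args


def allow_list(allowed: Set[str]) -> Processor:
    """Default-style rule: only allow renderings named in a configured set."""
    def process(args: ShouldIndexArgs) -> None:
        args.is_indexable = args.rendering_name in allowed
    return process


def only_on_template(template: str) -> Processor:
    """Hypothetical custom rule patched in after the default: restrict to one template."""
    def process(args: ShouldIndexArgs) -> None:
        if args.indexed_item.template != template:
            args.is_indexable = False
            args.aborted = True
    return process


pipeline = [allow_list({"Rich Text"}), only_on_template("Article Page")]
article = Item("shipping-faq", "Article Page")
landing = Item("home", "Landing Page")
print(run_pipeline(pipeline, ShouldIndexArgs(article, "Rich Text")).is_indexable)  # True
print(run_pipeline(pipeline, ShouldIndexArgs(landing, "Rich Text")).is_indexable)  # False
print(run_pipeline(pipeline, ShouldIndexArgs(article, "Navigation")).is_indexable)  # False
```

Because the custom rule is just another processor appended to the chain, the default allow-list logic stays untouched, which is the same reason patching a processor into a Sitecore pipeline is preferable to editing the package.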
Sitecore Default Pipelines
This pipeline is used to determine which items to update when a dependent item is updated.
In this package, the included processor runs on item save and uses a Link Database lookup to find any items whose FinalLayout fields refer to the saved item. A match implies that a Page Item (defined as an item that has Presentation Details configured) has a rendering with the saved item set as its datasource, which triggers the index to update the associated Page Item.
To be clear, this processor only finds items that are directly linked as a datasource. It will not find, for example, items that are children of a datasource item. Consider a “Tabbed” rendering that points to a primary datasource and relies on child items for each “Tab”: updating an individual tab would not trigger the primary page item to be reindexed. However, the logic could be updated to meet this requirement.
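The difference between the default lookup and the “Tabbed” extension can be modeled in a few lines. This is a Python sketch, not SIRC code: it assumes a hypothetical in-memory dictionary standing in for the Link Database (page ID to the datasource IDs referenced in its FinalLayout) and a hypothetical parent map. `referrers` mirrors the direct-link behavior described above; `referrers_including_ancestors` is one possible way to extend it by also checking each ancestor of the saved item.

```python
from typing import Dict, List, Optional, Set

# Hypothetical stand-in for the Link Database: page item ID -> datasource IDs
# referenced from that page's FinalLayout field.
LINKS: Dict[str, Set[str]] = {
    "page-home": {"ds-hero-text"},
    "page-faq": {"ds-tabs"},
}

# Hypothetical content tree: child item ID -> parent item ID.
PARENTS: Dict[str, str] = {"ds-tab-1": "ds-tabs", "ds-tab-2": "ds-tabs"}


def referrers(item_id: str) -> List[str]:
    """Pages whose layout references item_id directly (the default behavior)."""
    return sorted(page for page, targets in LINKS.items() if item_id in targets)


def referrers_including_ancestors(item_id: str) -> List[str]:
    """Possible extension: walk up the tree so editing a tab reindexes its page."""
    pages: Set[str] = set()
    current: Optional[str] = item_id
    while current is not None:
        pages.update(referrers(current))
        current = PARENTS.get(current)
    return sorted(pages)


print(referrers("ds-hero-text"))                  # ['page-home']
print(referrers("ds-tab-1"))                      # []
print(referrers_including_ancestors("ds-tab-1"))  # ['page-faq']
```

Walking every ancestor is the simplest extension but also the broadest; a real implementation would likely stop at a known folder or template to avoid reindexing unrelated pages.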
SIRC Custom Pipelines
This pipeline is used to determine which renderings to index, so as to avoid indexing renderings such as Navigation, Related Articles, or Calls to Action. These renderings are used on many pages and are not relevant or specific to the page in question.
By default, it reads the <indexableRenderings> section of Z.IndexRenderedContent.config and only allows renderings defined there to have their datasources indexed. If you require unique conditions, add a new processor as needed.
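The default allow-list check can be sketched as "load a set of rendering IDs from config, then test membership." The Python below is only a model: the child elements and attributes inside `<indexableRenderings>` are hypothetical (the real schema in Z.IndexRenderedContent.config may differ), and the GUID is a placeholder. It normalizes Sitecore-style IDs so braces and letter case do not cause false mismatches.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Hypothetical fragment: the actual child elements and attributes inside
# <indexableRenderings> in Z.IndexRenderedContent.config may differ.
CONFIG = """
<configuration>
  <indexableRenderings>
    <rendering name="Rich Text" id="{A1B2C3D4-0000-0000-0000-00000000ABCD}" />
  </indexableRenderings>
</configuration>
"""


def normalize_id(raw: str) -> str:
    """Compare Sitecore-style GUIDs without braces and case-insensitively."""
    return raw.strip().strip("{}").lower()


def load_indexable_ids(xml_text: str) -> Set[str]:
    root = ET.fromstring(xml_text.strip())
    return {
        normalize_id(node.get("id", ""))
        for node in root.findall("indexableRenderings/rendering")
        if node.get("id")
    }


def is_indexable(rendering_id: str, allowed: Set[str]) -> bool:
    # Renderings missing from the allow-list (e.g. Navigation) are never indexed.
    return normalize_id(rendering_id) in allowed


allowed = load_indexable_ids(CONFIG)
print(is_indexable("a1b2c3d4-0000-0000-0000-00000000abcd", allowed))    # True
print(is_indexable("{FFFFFFFF-0000-0000-0000-000000000000}", allowed))  # False
```

An allow-list (rather than a block-list) means a newly added rendering is excluded until someone opts it in, which matches the goal of keeping site-wide modules out of each page's content.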
This pipeline is responsible for extracting valid renderings and their datasources from the indexed item. The two included processors review the FinalLayout fields for all valid renderings that contain a datasource.
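As an illustration of "find renderings that contain a datasource," the Python sketch below walks a simplified layout XML document, assuming the shape `<r>` root, `<d>` per device, and `<r>` per rendering with `id` and `ds` attributes; all IDs are placeholders. It is not SIRC's implementation: the Final Renderings field may store a delta that Sitecore merges with the shared layout, and `ds` may hold a path rather than an ID, so a production processor would more likely rely on Sitecore's layout APIs than parse raw XML.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Simplified layout XML (assumed shape): <r> root, <d> per device, <r> per rendering
# with id (rendering item) and ds (datasource) attributes. IDs are placeholders.
LAYOUT = """<r>
  <d id="{DEVICE}">
    <r id="{RICH-TEXT}" ph="main" ds="{DS-INTRO}" />
    <r id="{NAVIGATION}" ph="header" />
    <r id="{RICH-TEXT}" ph="main" ds="" />
  </d>
</r>"""


def renderings_with_datasources(layout_xml: str) -> List[Tuple[str, str]]:
    """Return (rendering id, datasource) pairs, skipping renderings with no datasource."""
    root = ET.fromstring(layout_xml)
    pairs: List[Tuple[str, str]] = []
    for device in root.findall("d"):
        for rendering in device.findall("r"):
            datasource = (rendering.get("ds") or "").strip()
            if datasource:
                pairs.append((rendering.get("id", ""), datasource))
    return pairs


print(renderings_with_datasources(LAYOUT))  # [('{RICH-TEXT}', '{DS-INTRO}')]
```

Each returned pair would then go through the rendering allow-list before its datasource content is extracted.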
This pipeline is used to extract the text data from a datasource. By default, it iterates over all fields on a datasource item and only indexes text fields, using Sitecore.ContentSearch.IndexOperationsHelper.IsTextField to make this classification. It also automatically excludes all fields whose names begin with __, as these are system fields.
In addition, it strips HTML tags from field content, such as a rich text field, so that the index contains clean, ready-to-search content.
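The extraction rules above (skip `__` system fields, keep only text fields, strip HTML) can be modeled in Python as follows. This is a sketch under stated assumptions: `TEXT_FIELD_TYPES` is a hypothetical fixed list standing in for `IndexOperationsHelper.IsTextField`, fields are passed as plain `(name, type, value)` tuples rather than Sitecore field objects, and HTML is stripped with Python's standard `html.parser` module, which may not match how SIRC strips tags.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

# Hypothetical stand-in for Sitecore.ContentSearch.IndexOperationsHelper.IsTextField:
# a fixed set of field type names treated as text.
TEXT_FIELD_TYPES = {"single-line text", "multi-line text", "rich text"}


class _TextCollector(HTMLParser):
    """Collects text nodes and discards tags; entities such as &amp; are decoded."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(markup: str) -> str:
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    # Join with spaces so adjacent blocks like </p><p> do not run together;
    # duplicate spaces are left for a later whitespace-cleanup step.
    return " ".join(collector.parts)


def extract_text(fields: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """fields holds (name, type, raw value) tuples for one datasource item."""
    extracted: Dict[str, str] = {}
    for name, field_type, value in fields:
        if name.startswith("__"):
            continue  # system fields such as __Display name are skipped
        if field_type.lower() not in TEXT_FIELD_TYPES:
            continue  # non-text fields such as images are skipped
        extracted[name] = strip_html(value)
    return extracted


datasource = [
    ("Title", "Single-Line Text", "Shipping FAQ"),
    ("Body", "Rich Text", "<p>Orders ship</p><p>in 2 &amp; 3 days</p>"),
    ("__Display name", "Single-Line Text", "FAQ"),
    ("Image", "Image", '<image mediaid="" />'),
]
print(extract_text(datasource))
# {'Title': 'Shipping FAQ', 'Body': 'Orders ship in 2 & 3 days'}
```

Checking the `__` prefix before the type check means system fields are dropped even when their type would otherwise count as text.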
This pipeline is executed directly before the extracted content is stored in the index. By default, its only processor lightly formats the content by removing extra whitespace contained in the field.
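The whitespace cleanup, and the final step of combining each datasource's text into one value for the page item, might look like the following Python sketch. The function names are hypothetical and the choice to join values with single spaces is an assumption, not documented SIRC behavior.

```python
import re
from typing import Iterable


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def combine_for_index(values: Iterable[str]) -> str:
    """Join cleaned datasource text into one value for the page item's index document."""
    cleaned = (normalize_whitespace(v) for v in values)
    return " ".join(v for v in cleaned if v)


print(combine_for_index(["  Intro\n\n text ", "", "Tab\tone"]))  # Intro text Tab one
```

Dropping empty values before joining keeps renderings with blank datasource fields from adding stray separators to the indexed content.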