Adding meta data (CustomData) to indexed BLOB documents

If your project stores binary documents (PDF, DOCX, etc.) in database BLOB fields, you can learn how to index them by following this link. When the database also holds meta information about those documents, that data can be added to the index so that it can be retrieved (visually or programmatically) at search time.

With HTML documents, meta data can easily be added through meta tags. Binary documents have no equivalent, so their meta data (which we call CustomData) must be added through a plug-in DLL. The plug-in also lets you write code that queries the database and processes the data as needed before it is added to the index.

Create a plug-in project

The Index Management Tool can quickly generate a boiler-plate project to get you started.

  1. Select the Plug-in tab from the Index Management Tool.
  2. Under "Create Visual Studio Plug-in Project", enter the folder path where the project should be created and choose the desired language.
  3. Click "Create Project" and choose "Yes" to open it now.
Next, we will modify the dispatcher_Action method to handle the ReadText event (which is fired when a document is being indexed), retrieve the document's meta data, and add it to the index.

Modify the plug-in class to read/write meta data

  1. Open Class1 in the Visual Studio project.
  2. The dispatcher_Action method contains a block of example cases; delete that entire method.
  3. Add the following code in its place:
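    private void dispatcher_Action(object sender, Keyoti.SearchEngine.Events.ActionEventArgs e)
    {
        // ReadText is fired when a document is being indexed
        if (e.ActionData.Name == ActionName.ReadText)
        {
            // Use the document's URI as the key for the database lookup
            string documentUri = (sender as Document).Uri.AbsoluteUri;
            string metaData = HitDBAndGetMetaDataFor(documentUri);

            // Attach the meta data (CustomData) to the text being indexed
            (e.ActionData.Data as DocumentText).MetaCustomData = metaData;
        }
    }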
    
  4. This code calls HitDBAndGetMetaDataFor, which doesn't exist yet, so create it:
    private string HitDBAndGetMetaDataFor(string documentUri)
    {
    ...
    }
    
  5. Complete the method with whatever code is needed to query your database and return the meta data for this document.
  6. It is helpful to format the meta data as URL GET parameters, <parameter name>=<parameter value>, with each pair separated by &, for example: author=John%20Smith&date=2009-04-29. This format allows the engine to parse the data into a map at search time.
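Steps 5 and 6 can be combined roughly as follows. This is a minimal sketch, assuming your query returns one row per document and that the column names should become the parameter names. `CustomDataBuilder` and `FromRecord` are hypothetical helper names, not part of the SearchUnit API; `IDataRecord` and `Uri.EscapeDataString` are standard .NET. Dates are written as yyyy-MM-dd to match the example above, and NULL columns are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

// Hypothetical helper (not part of the SearchUnit API): turns one database row
// into CustomData in URL GET parameter format, e.g. author=John%20Smith&date=2009-04-29
public static class CustomDataBuilder
{
    public static string FromRecord(IDataRecord record)
    {
        var pairs = new List<string>();
        for (int i = 0; i < record.FieldCount; i++)
        {
            // Columns with no value are left out of the CustomData entirely
            if (record.IsDBNull(i)) continue;

            object raw = record.GetValue(i);
            // Dates use a sortable, culture-independent format; other values use invariant culture
            string value = raw is DateTime d
                ? d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
                : Convert.ToString(raw, CultureInfo.InvariantCulture);

            // Percent-encode names and values so spaces, '&' and '=' cannot break the format
            pairs.Add(Uri.EscapeDataString(record.GetName(i)) + "=" + Uri.EscapeDataString(value));
        }
        return string.Join("&", pairs);
    }
}
```

Inside HitDBAndGetMetaDataFor you might run a parameterized query such as `SELECT author, date FROM Documents WHERE uri = @uri` (table and column names are hypothetical) through `ExecuteReader`, then return `CustomDataBuilder.FromRecord(reader)` after a successful `reader.Read()`, or an empty string when no row is found. Encoding every value means a title containing `&` or `=` cannot be misread as an extra parameter.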

Use the meta data at search time

There is a lot you can do with the CustomData meta data at search time, such as filtering results on it or displaying it in result summaries.
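The following sketch shows one way to filter on CustomData. It does not use SearchUnit's own search-time API, which this article doesn't cover. Instead it assumes you already have each result's URI and CustomData string; `ResultInfo`, `CustomDataFilter` and `FilterByField` are hypothetical names. It relies on the standard .NET `HttpUtility.ParseQueryString` (namespace System.Web) to read the GET-formatted string as name/value pairs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web; // HttpUtility: built into .NET Core 2.0+, needs a System.Web reference on .NET Framework

// Hypothetical stand-in for a search result: its URI plus the CustomData
// string that was stored with it at index time.
public sealed class ResultInfo
{
    public string Uri { get; }
    public string CustomData { get; }
    public ResultInfo(string uri, string customData) { Uri = uri; CustomData = customData; }
}

public static class CustomDataFilter
{
    // Keeps only results whose CustomData has field=value (value compared case-insensitively).
    // Results with no CustomData, or without that field, are dropped.
    public static List<ResultInfo> FilterByField(IEnumerable<ResultInfo> results, string field, string value)
    {
        return results
            .Where(r => string.Equals(
                HttpUtility.ParseQueryString(r.CustomData ?? "")[field], // decodes %20 etc.; null if missing
                value,
                StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

The same `ParseQueryString(...)["author"]` lookup can supply text for a result summary, for example "Author: John Smith".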