SearchUnit can import documents from a variety of sources, such as websites and databases. In this tutorial we're going to demonstrate how to index documents on the local file system as well as documents located across the network.

Watch the video, or read below for full details.



The ‘File System Document Store’ import indexes documents from a local file path by recursively reading the files under a folder. This is useful when a set of documents exists under a website but isn't linked to from other documents (i.e. the crawler is unable to find them). Because this process reads the local file system directly, it is generally much faster than crawling a website.

First we’ll show you how to carry out a local File System import, and then we’ll run through how to import documents located across your network using UNC file paths.

Indexing local documents for searching

So let’s begin with a basic File System Import.

Open the SearchUnit's Index Management Tool, and then open or create an index. Select ‘Import New Source’, and from the drop down menu, select ‘File System Document Store’.

The local folder path should be set to a path which corresponds to a virtual folder. For our example that will be c:\inetpub\wwwroot\mydocs

The virtual folder path should be set to the URL that the local path corresponds to. For this example it will be http://localhost/mydocs

Any documents found under the local folder are automatically mapped to the virtual folder. (The local folder path can be relative to the index directory.)
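To make the mapping concrete, here is a minimal Python sketch of how a file under the local folder could correspond to a URL under the virtual folder. It is only an illustration and not SearchUnit's own code; the function name map_to_virtual and the sample file names are hypothetical.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def map_to_virtual(local_root: str, virtual_root: str, file_path: str) -> str:
    """Map a file under local_root to the matching URL under virtual_root.

    Hypothetical helper for illustration only; it is not part of SearchUnit.
    """
    # Path of the file relative to the local folder, e.g. guides\intro.html
    relative = PureWindowsPath(file_path).relative_to(PureWindowsPath(local_root))
    # Join the parts with forward slashes and percent-encode each segment
    url_tail = "/".join(quote(part) for part in relative.parts)
    return virtual_root.rstrip("/") + "/" + url_tail


print(map_to_virtual(
    r"c:\inetpub\wwwroot\mydocs",
    "http://localhost/mydocs",
    r"c:\inetpub\wwwroot\mydocs\guides\intro.html",
))
```

With the example paths above, a file in a guides subfolder ends up under http://localhost/mydocs/guides/.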

Next, we’re going to use the Target Match List option to specify the documents that will be imported. You can add a list of strings, and documents that match one of those strings will be imported to the index. This could be set to .html or .pdf for example. For more detailed usage please see the SearchUnit documentation.
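As a rough illustration of the idea behind a match list, the Python sketch below filters document URLs with a simple case-insensitive substring test. This is an assumption made for the example; the exact matching rules SearchUnit applies are described in its documentation, and the name is_target is hypothetical.

```python
def is_target(url: str, target_match_list: list[str]) -> bool:
    """Return True if the URL contains any string from the match list.

    Assumes case-insensitive substring matching, which may differ from
    SearchUnit's real rules. An empty list matches nothing in this sketch.
    """
    lowered = url.lower()
    return any(target.lower() in lowered for target in target_match_list)


targets = [".html", ".pdf"]
candidates = [
    "http://localhost/mydocs/guide.html",
    "http://localhost/mydocs/manual.PDF",
    "http://localhost/mydocs/logo.png",
]
# Keep only the documents that match at least one target string
selected = [url for url in candidates if is_target(url, targets)]
print(selected)
```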

Now we’re ready to go ahead and import. Once the import is complete, you will be advised of any errors encountered. If there are errors, you should first check that the local and virtual folder paths are set up correctly and, if necessary, enable Logging and try importing again. The log files can then be viewed to determine whether any changes to the import parameters need to be made.

Back in the main Index Manager tool, we can now browse the imported documents and see that they are ready to be used as part of the search index for your ASP.NET web application.

Indexing documents over UNC for searching

An alternative way of using the FileSystemDocumentStore import is to add documents from machines across your local network, using their UNC file paths.

So let’s begin by clicking on ‘Import New Source’ and selecting ‘File System Document Store’ from the drop down menu.

The local folder path is going to be the UNC path to the folder on the network that contains the documents to be indexed. Enter the machine name first, followed by the drive or folder to be indexed. In our example this will be ‘\\MYTestMachine\Docs\’

The virtual folder path in this example will need to be set as ‘file://MYTestMachine/Docs/’
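The relationship between the UNC path and the file:// virtual path can be sketched in a few lines of Python: the machine name becomes the host part of the URL and the backslashes become forward slashes. This is an illustrative helper only (unc_to_file_url is a hypothetical name), not something SearchUnit requires you to run.

```python
from urllib.parse import quote


def unc_to_file_url(unc_path: str) -> str:
    """Convert a UNC path such as \\\\Machine\\Share\\ to a file:// URL.

    Hypothetical helper for illustration only; it is not part of SearchUnit.
    """
    if not unc_path.startswith("\\\\"):
        raise ValueError("not a UNC path: " + unc_path)
    # Drop the leading double backslash, then split into machine name and folders
    parts = unc_path[2:].split("\\")
    host, segments = parts[0], parts[1:]
    # Percent-encode each folder or file name and join with forward slashes
    return "file://" + host + "/" + "/".join(quote(segment) for segment in segments)


print(unc_to_file_url("\\\\MYTestMachine\\Docs\\"))
```

The same conversion applies to individual files, so a document at \\MYTestMachine\Docs\report.pdf would correspond to file://MYTestMachine/Docs/report.pdf.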

Next, set up the Options for your File System import as explained in the earlier part of this tutorial and then start your import.

Once the import operation is complete, close the form to return to the main Index Manager tool. The imported documents can be browsed and are now ready to be searched using SearchUnit.