Introduction to Lucene.Net



What is Lucene.Net?

Lucene.Net is an exact C# port of the original Lucene search engine library, which is written in Java. It provides a framework (a set of APIs) for building applications with full-text search.

Lucene.Net can be downloaded from http://incubator.apache.org/lucene.net/download.html. It is currently undergoing incubation at the Apache Software Foundation (ASF).

Why Use Lucene.Net?

You can use Lucene.Net to make an existing search in your ASP.Net web application or website more powerful. It can also be used to index and search documents (Word, PDF, etc.) within your application.

This article describes how to use Lucene.Net to add full-text search to an ASP.Net application. Any search function consists of two basic steps: first, indexing the text, and second, searching it. We will use Lucene.Net for both steps.
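To make the two steps concrete, here is a minimal sketch of the idea behind an index. It is written in plain C# (C# 9 or later, top-level statements), not with Lucene.Net. The helper names BuildIndex and Search are hypothetical, and the crude space/punctuation tokenizer is an assumption for illustration. Lucene.Net's real index is far more sophisticated. Indexing maps each word to the documents that contain it, so that searching becomes a lookup rather than a scan of every document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual sketch only: a toy in-memory inverted index, not the Lucene.Net API.
// BuildIndex and Search are hypothetical helper names.

// Step 1 (indexing): map each lowercased word to the IDs of the documents containing it.
Dictionary<string, SortedSet<int>> BuildIndex(IReadOnlyList<string> docs)
{
    var index = new Dictionary<string, SortedSet<int>>();
    for (int id = 0; id < docs.Count; id++)
    {
        // Assumed tokenizer: split on spaces, dots and commas.
        foreach (var word in docs[id].ToLowerInvariant()
                     .Split(new[] { ' ', '.', ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!index.TryGetValue(word, out var ids))
                index[word] = ids = new SortedSet<int>();
            ids.Add(id);
        }
    }
    return index;
}

// Step 2 (searching): look the word up in the index instead of scanning every document.
IEnumerable<int> Search(Dictionary<string, SortedSet<int>> index, string word) =>
    index.TryGetValue(word.ToLowerInvariant(), out var ids) ? ids : Enumerable.Empty<int>();

var documents = new[] { "Lucene.Net is a search library", "ASP.Net builds web sites", "Search the web" };
var idx = BuildIndex(documents);
Console.WriteLine(string.Join(",", Search(idx, "search"))); // prints 0,2
```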

In this example, we will read the contents of a text file and index them using Lucene.Net. First, download the DLL and add a reference to it in your project.

How to Use Lucene.Net

Indexing the text

There are a few things to understand before we start indexing.

1. Analyzer - Reads the text and breaks it into words (tokens). It can also remove 'noise words' (common words that you would not want to index).

2. Fields - Content holders, each with a name and a value.

3. Documents - The unit of indexing and search. A document is a collection of fields. Documents are added to the index and are returned as a list of results.

4. Index - A collection of documents.

5. IndexWriter - Writes documents to the index.
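The Analyzer's job can be sketched in a few lines of plain C# (C# 9 or later, top-level statements). This is not Lucene.Net's StandardAnalyzer. The Analyze helper and the small stop-word list are hypothetical and exist only to show tokenizing, lowercasing and dropping noise words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Conceptual sketch of an analyzer, NOT Lucene.Net's StandardAnalyzer.
// The stop-word list is a small hypothetical sample.
var stopWords = new HashSet<string> { "a", "an", "and", "is", "of", "the", "to" };

// Lowercase the text, split it into tokens of ASCII letters/digits,
// and drop the noise (stop) words.
List<string> Analyze(string text) =>
    Regex.Matches(text.ToLowerInvariant(), "[a-z0-9]+")
         .Cast<Match>()
         .Select(m => m.Value)
         .Where(token => !stopWords.Contains(token))
         .ToList();

Console.WriteLine(string.Join(" | ", Analyze("The Index is a collection of Documents.")));
// prints: index | collection | documents
```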

Code for creating the index file

//Namespaces used by the code in this article (place at the top of the code-behind file)
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Search;

string strIndexDir = @"D:\Index";
Lucene.Net.Store.Directory indexDir = Lucene.Net.Store.FSDirectory.Open(new System.IO.DirectoryInfo(strIndexDir));
Analyzer std = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29); //Version parameter is used for backward compatibility. Stop words can also be passed to avoid indexing certain words
IndexWriter idxw = new IndexWriter(indexDir, std, true, IndexWriter.MaxFieldLength.UNLIMITED); //Create an IndexWriter; true creates a new index, overwriting any existing one
Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
Lucene.Net.Documents.Field fldText = new Lucene.Net.Documents.Field("text", System.IO.File.ReadAllText(@"d:\test.txt"), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES);
doc.Add(fldText); //add the field to the document
idxw.AddDocument(doc); //write the document to the index
idxw.Optimize(); //optimize the index for faster searching
idxw.Close(); //close the writer so changes are committed and the index is unlocked
Response.Write("Indexing Done");

The parameters passed when creating the Field are:

1. Lucene.Net.Documents.Field.Store.YES - The field's value is stored in the index and can be returned in the search results. Passing NO does not store the value, so it cannot be shown in the results (the field can still be searched if it is indexed).

2. Lucene.Net.Documents.Field.Index.ANALYZED - The field is indexed after passing through the analyzer, so it can be searched. NO means it will not be searchable. NOT_ANALYZED means the field is searchable, but its whole value is indexed as a single term without using the analyzer.

3. Lucene.Net.Documents.Field.TermVector.YES - Stores, for each document, the list of terms in the field and the number of occurrences of each term.
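A term vector can be pictured as a term-to-count map for one field of one document. The sketch below uses plain C# (C# 9 or later, top-level statements) with a hypothetical TermVector helper and an assumed simple tokenizer. It only models that information and does not use Lucene.Net's term vector API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual sketch of what a term vector records for one field of one document:
// every distinct term and its number of occurrences. Not the Lucene.Net API;
// TermVector is a hypothetical helper with an assumed space/punctuation tokenizer.
SortedDictionary<string, int> TermVector(string fieldValue)
{
    var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
    foreach (var term in fieldValue.ToLowerInvariant()
                 .Split(new[] { ' ', '.', ',' }, StringSplitOptions.RemoveEmptyEntries))
    {
        counts.TryGetValue(term, out int n); // n is 0 for a term not seen yet
        counts[term] = n + 1;
    }
    return counts;
}

var tv = TermVector("Search the index. Search again.");
Console.WriteLine(string.Join(", ", tv.Select(kv => $"{kv.Key}={kv.Value}")));
// prints: again=1, index=1, search=2, the=1
```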

It is recommended to call IndexWriter.Optimize() once indexing is complete. It merges the index segments into a single segment, which makes searching faster.

The first part, indexing the text, is now complete. Next, we will search the index for the text entered in the textbox.

Search the text

We will build the search query using the QueryParser class. Lucene.Net provides other Query classes, such as TermQuery and RangeQuery, for different requirements. To create a search query with QueryParser, we need the Analyzer object and the name of the index field to search in.

string strIndexDir = @"D:\Index";
Analyzer std = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", std);
Lucene.Net.Search.Query qry = parser.Parse(Search.Text); //Search is the TextBox containing the text to search for

After creating the query object, we open the index in read-only mode with an IndexReader and pass it to an IndexSearcher.

Lucene.Net.Store.Directory directory = Lucene.Net.Store.FSDirectory.Open(new System.IO.DirectoryInfo(strIndexDir)); //Provide the directory where index is stored
Lucene.Net.Search.Searcher srchr = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Index.IndexReader.Open(directory, true));//true opens the index in read only mode

Lucene.Net gathers the search results (documents) through Collectors, and several Collectors are available. In this example we will use TopScoreDocCollector, which keeps the highest-scoring documents sorted by relevance score (how well each document matches the query; term frequency is one of the factors). The create method of TopScoreDocCollector accepts two parameters: the maximum number of documents to collect (int) and whether the documents will be scored in document-ID order (bool).

TopScoreDocCollector cllctr = TopScoreDocCollector.create(100, true); //names follow Lucene.Net 2.9; later releases may use PascalCase (e.g. Create)
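The idea of keeping only the best N hits can be sketched with a bounded min-heap. Each new hit is added, and whenever the heap holds more than N entries, the lowest-scoring one is dropped. This plain C# sketch assumes .NET 6 or later (for PriorityQueue) and uses a hypothetical TopN helper with made-up scores. It is not Lucene.Net's TopScoreDocCollector implementation.

```csharp
using System;
using System.Collections.Generic;

// Conceptual sketch of a "top N by score" collector using a bounded min-heap.
// Not Lucene.Net's TopScoreDocCollector; TopN and the scores are hypothetical.
// Requires .NET 6 or later for PriorityQueue<TElement, TPriority>.
List<(int Doc, float Score)> TopN(IEnumerable<(int Doc, float Score)> hits, int n)
{
    // Min-heap keyed on score: the lowest-scoring hit kept so far is at the top.
    var heap = new PriorityQueue<(int Doc, float Score), float>();
    foreach (var hit in hits)
    {
        heap.Enqueue(hit, hit.Score);
        if (heap.Count > n)
            heap.Dequeue(); // discard the lowest-scoring hit
    }

    // Dequeuing yields ascending scores; reverse so the best hit comes first.
    var result = new List<(int Doc, float Score)>();
    while (heap.Count > 0)
        result.Add(heap.Dequeue());
    result.Reverse();
    return result;
}

var scored = new[] { (0, 0.2f), (1, 0.9f), (2, 0.5f), (3, 0.7f) };
foreach (var (doc, score) in TopN(scored, 2))
    Console.WriteLine($"doc {doc} (score {score})"); // doc 1 first, then doc 3
```

Dropping the weakest hit as soon as the heap exceeds N keeps memory bounded by N, however many documents match.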

Once the collector object is ready we will perform the search and get the results from the collector in a ScoreDoc array.

srchr.Search(qry, cllctr); //run the query; matching documents are gathered by the collector
ScoreDoc[] hits = cllctr.TopDocs().scoreDocs;
for (int i = 0; i < hits.Length; i++)
{
    int docId = hits[i].doc;
    float score = hits[i].score;
    Lucene.Net.Documents.Document doc = srchr.Doc(docId);
    Response.Write("Searched from Text: " + doc.Get("text"));
}

This is just an introduction to Lucene.Net. There are a lot of other areas to be explored, such as different Analyzers, QueryParsers, Collectors, etc.

Happy learning.

About the Author

Rohit Kukreti

He has been in the software field for more than 11 years, always programming on Microsoft technologies. He started with VB and ASP and then gradually moved on to ASP.Net. C# is his preferred development language. When he is not working on code, he likes to play table tennis and football. He is a huge fan of the English football club Manchester United.


  • Thank You

    Posted by Pouyan on 03/10/2017 11:38am

    Very Good. Thanks a lot.

  • reading content while searching

    Posted by Amit on 12/04/2015 12:21pm

    The code opens and closes the index files from C# for every query. If 200 requests arrive simultaneously, there will surely be a delay in executing the search. Alternatively, if the searcher is stored in the IIS worker process as an application variable, what happens if the data is very large?

  • appreciation

    Posted by zahra on 02/27/2015 08:40am

    Hello. Your article was really excellent for me. Thank you. With best wishes :)

  • How to read pdf, word,ppt files

    Posted by Pranay on 07/28/2014 07:51am

    Hi, I liked your article; it's short and easy to understand. Could you please add how to search the content inside PDF, Word, and PPT files and display the results?

    • Search pdf, word and ppt files

      Posted by Rohit on 09/22/2014 08:26pm

      Hi Pranay, To search any files, it makes sense to read the text inside the files, index it in Lucene or another indexing store, and then search on that field later. I have done this in my projects. For reading Word, PDF, and PPT files, I found Aspose to be the best component. It's a paid component; you can find it at www.aspose.com. Alternatively, try using IFilter, a Microsoft component. Regards, Rohit

      • Request

        Posted by Sumit R.B. on 07/26/2017 05:08am

        Can you send me your project that searches PDF, DOC, or other files?

  • Nice post

    Posted by leromeo on 02/06/2013 04:10am

    Hello, you missed executing the query: srchr.Search(qry, cllctr);

  • lucene.net

    Posted by Ajay on 12/08/2012 02:36pm

    How can I download Lucene.Net, and can I use Visual Studio 2008 for development with Lucene?

  • Missing some code

    Posted by digit.rv on 08/04/2012 07:41pm

    Before creating the "hits" object, we must run the search. Add srchr.Search(qry, cllctr); before ScoreDoc[] hits = cllctr.TopDocs().scoreDocs;

  • Sample Code

    Posted by Joe on 07/31/2012 06:12am

    Hi, I found this tutorial very useful. Like Ashu, I am getting an error in the code. Would you mind sharing the sample code for us to download? Thanks

  • Sample Code

    Posted by Ashu on 06/07/2012 08:09am

    Hi, Can you upload some sample code? I tried the steps you provided but got an error in TopScoreDocCollector cllctr = TopScoreDocCollector.create(100, true); Error : 'Lucene.Net.Search.TopScoreDocCollector' does not contain a definition for 'create' Thanks, Ashu
