Lucene 1.5-rc1-dev API
Jakarta Lucene is a high-performance, full-featured text search engine library.
The API is divided into several packages:
-
org.apache.lucene.util
contains a few handy data structures, e.g., BitVector
and PriorityQueue.
-
org.apache.lucene.store
defines an abstract class for storing persistent data, the Directory,
a collection of named files written by an OutputStream
and read by an InputStream.
Two implementations are provided: FSDirectory,
which uses a file system directory to store files, and RAMDirectory,
which implements files as memory-resident data structures.
-
org.apache.lucene.document
provides a simple Document
class. A document is simply a set of named Fields,
whose values may be strings or instances of java.io.Reader.
-
org.apache.lucene.analysis
defines an abstract Analyzer
API for converting text from a java.io.Reader
into a TokenStream,
an enumeration of Tokens.
A TokenStream is built by applying TokenFilters
to the output of a Tokenizer.
A few simple implementations are provided, including StopAnalyzer
and the grammar-based StandardAnalyzer.
-
org.apache.lucene.index
provides two primary classes: IndexWriter,
which creates indexes and adds documents to them, and IndexReader,
which accesses the data in an index.
-
org.apache.lucene.search
provides data structures to represent queries (TermQuery
for individual words, PhraseQuery
for phrases, and BooleanQuery
for boolean combinations of queries) and the abstract Searcher
which turns queries into Hits.
IndexSearcher
implements search over a single IndexReader.
-
org.apache.lucene.queryParser
uses JavaCC to implement a
QueryParser.
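The analysis package composes a TokenStream by wrapping a Tokenizer in TokenFilters. The following is a minimal, JDK-only sketch of that decorator-style chaining, offered as an illustration only: SimpleTokenStream, LetterSplitter, LowerCaser, StopWordDropper and AnalysisDemo are hypothetical names, not Lucene classes, and their signatures are assumptions rather than Lucene's actual Tokenizer/TokenFilter API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a token stream: NOT Lucene's TokenStream class.
interface SimpleTokenStream {
    /** Returns the next token, or null when the stream is exhausted. */
    String next();
}

/** Tokenizer-like source: splits text on runs of non-letter characters. */
class LetterSplitter implements SimpleTokenStream {
    private final String text;
    private int pos = 0;

    LetterSplitter(String text) { this.text = text; }

    public String next() {
        // Skip any non-letter characters before the next token.
        while (pos < text.length() && !Character.isLetter(text.charAt(pos))) pos++;
        if (pos >= text.length()) return null;
        int start = pos;
        while (pos < text.length() && Character.isLetter(text.charAt(pos))) pos++;
        return text.substring(start, pos);
    }
}

/** Filter-like wrapper: lower-cases every token from its input stream. */
class LowerCaser implements SimpleTokenStream {
    private final SimpleTokenStream input;

    LowerCaser(SimpleTokenStream input) { this.input = input; }

    public String next() {
        String t = input.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

/** Filter-like wrapper: drops tokens that appear in a stop-word set. */
class StopWordDropper implements SimpleTokenStream {
    private final SimpleTokenStream input;
    private final Set<String> stopWords;

    StopWordDropper(SimpleTokenStream input, Set<String> stopWords) {
        this.input = input;
        this.stopWords = stopWords;
    }

    public String next() {
        String t;
        while ((t = input.next()) != null) {
            if (!stopWords.contains(t)) return t;
        }
        return null;
    }
}

class AnalysisDemo {
    /** Chains splitter -> lower-caser -> stop-word dropper and drains the result. */
    static List<String> analyze(String text) {
        Set<String> stops = new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));
        SimpleTokenStream stream =
            new StopWordDropper(new LowerCaser(new LetterSplitter(text)), stops);
        List<String> tokens = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) tokens.add(t);
        return tokens;
    }
}
```

Lower-casing is applied before stop-word removal so that "The" is caught by a lower-case stop list; swapping the two wrappers would change the output, which is why filter order matters in any such chain.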
To use Lucene, an application should:
-
Create Documents by
adding
Fields;
-
Create an IndexWriter
and add documents to it with addDocument();
-
Call QueryParser.parse()
to build a query from a string; and
-
Create an IndexSearcher
and pass the query to its search()
method.
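The four steps above can be mimicked end to end without Lucene, which may help clarify what each step contributes. This is a minimal, JDK-only sketch under deliberately simplified assumptions: ToyIndex, Hit, addDocument and search are hypothetical names, a query is a single term optionally prefixed with a field name (as in "path:chowder"), and the score is plain term density (occurrences divided by field length), not Lucene's scoring formula. A size-bounded java.util.PriorityQueue keeps only the best hits, the kind of job a priority queue such as the one in org.apache.lucene.util is suited for.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical toy inverted index; not Lucene's API or scoring.
class ToyIndex {
    /** A search hit: document id plus score. */
    static final class Hit {
        final int doc;
        final double score;

        Hit(int doc, double score) { this.doc = doc; this.score = score; }
    }

    // "field:term" -> (doc id -> number of occurrences of term in that field)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    // doc id -> (field name -> number of tokens in that field)
    private final List<Map<String, Integer>> fieldLengths = new ArrayList<>();

    /** Step 1 and 2: a document is a map of field name -> text; returns its id. */
    int addDocument(Map<String, String> fields) {
        int doc = fieldLengths.size();
        Map<String, Integer> lengths = new HashMap<>();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            int n = 0;
            for (String t : f.getValue().toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (t.isEmpty()) continue; // split may yield a leading empty string
                n++;
                postings.computeIfAbsent(f.getKey() + ":" + t, k -> new HashMap<>())
                        .merge(doc, 1, Integer::sum);
            }
            lengths.put(f.getKey(), n);
        }
        fieldLengths.add(lengths);
        return doc;
    }

    /** Steps 3 and 4: parse "field:term" or a bare term, return up to topN hits, best first. */
    List<Hit> search(String query, String defaultField, int topN) {
        int colon = query.indexOf(':');
        String field = colon >= 0 ? query.substring(0, colon) : defaultField;
        String term = (colon >= 0 ? query.substring(colon + 1) : query).toLowerCase(Locale.ROOT);
        Map<Integer, Integer> docs =
            postings.getOrDefault(field + ":" + term, Collections.emptyMap());

        // Min-heap ordered by score: the root is always the weakest retained hit.
        PriorityQueue<Hit> heap =
            new PriorityQueue<Hit>((a, b) -> Double.compare(a.score, b.score));
        for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
            int len = fieldLengths.get(e.getKey()).get(field);
            heap.add(new Hit(e.getKey(), (double) e.getValue() / len));
            if (heap.size() > topN) heap.poll(); // evict the lowest-scoring hit
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Double.compare(b.score, a.score));
        return result;
    }
}
```

Bounding the heap at topN keeps memory proportional to the number of hits shown rather than the number of matching documents; only the final few hits are sorted.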
Simple examples of code that does this include the demo classes IndexFiles
and SearchFiles, packaged in lucene-demo.jar. To demonstrate these, try something like:
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups
adding rec.food.recipes/soups/abalone-chowder
[ ... ]
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
0. rec.food.recipes/soups/spam-chowder
[ ... thirty-four documents contain the word "chowder", with
"spam-chowder" having the greatest density. ]
Query: path:chowder
Searching for: path:chowder
31 total matching documents
0. rec.food.recipes/soups/abalone-chowder
[ ... only thirty-one have "chowder" in the "path"
field. ]
Query: path:"clam chowder"
Searching for: path:"clam chowder"
10 total matching documents
0. rec.food.recipes/soups/clam-chowder
[ ... only ten have "clam chowder" in the "path" field. ]
Query: path:"clam chowder" AND manhattan
Searching for: +path:"clam chowder" +manhattan
2 total matching documents
0. rec.food.recipes/soups/clam-chowder
[ ... only two also have "manhattan" in the contents. ]
[ Note: "+" and "-" are canonical, but "AND", "OR"
and "NOT" may be used. ]
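The canonical form `+path:"clam chowder" +manhattan` marks both clauses as required. The matching rule behind "+" (required), "-" (prohibited) and unmarked (optional) clauses can be sketched over plain sets of document ids. This is a simplified, JDK-only illustration: ClauseSketch, Kind, Clause and evaluate are hypothetical names, and the sketch only decides which documents match; it ignores scoring, where optional clauses still matter even when required clauses are present.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of boolean clause matching; not Lucene's BooleanQuery.
class ClauseSketch {
    enum Kind { REQUIRED, PROHIBITED, OPTIONAL }

    static final class Clause {
        final Set<Integer> matches; // ids of documents matched by this sub-query
        final Kind kind;

        Clause(Set<Integer> matches, Kind kind) {
            this.matches = matches;
            this.kind = kind;
        }
    }

    static Set<Integer> evaluate(List<Clause> clauses) {
        Set<Integer> result = new TreeSet<>();
        boolean sawRequired = false;
        // A document must match every REQUIRED clause: intersect them.
        for (Clause c : clauses) {
            if (c.kind != Kind.REQUIRED) continue;
            if (sawRequired) {
                result.retainAll(c.matches);
            } else {
                result.addAll(c.matches);
                sawRequired = true;
            }
        }
        // With no REQUIRED clause, matching any OPTIONAL clause is enough.
        if (!sawRequired) {
            for (Clause c : clauses) {
                if (c.kind == Kind.OPTIONAL) result.addAll(c.matches);
            }
        }
        // A document matching any PROHIBITED clause is excluded.
        for (Clause c : clauses) {
            if (c.kind == Kind.PROHIBITED) result.removeAll(c.matches);
        }
        return result;
    }
}
```

Under this rule, a query made only of prohibited clauses matches nothing, since there is no positive clause to select documents in the first place.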
The IndexHTML demo is more sophisticated.
It incrementally maintains an index of HTML files, adding new files as
they appear, deleting old files as they disappear and re-indexing files
as they change.
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML -create java/jdk1.1.6/docs/relnotes
adding java/jdk1.1.6/docs/relnotes/SMICopyright.html
[ ... create an index containing all the relnotes ]
> rm java/jdk1.1.6/docs/relnotes/SMICopyright.html
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML java/jdk1.1.6/docs/relnotes
deleting java/jdk1.1.6/docs/relnotes/SMICopyright.html
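The bookkeeping behind such incremental maintenance (add new files, delete vanished ones, re-index changed ones) can be expressed as a comparison between what is on disk and what was recorded at the previous indexing run. The following JDK-only sketch assumes each file is identified by its path and that a last-modified time was stored when it was indexed; IncrementalSync, Plan and plan are hypothetical names, and this is not the demo's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical planning step for incremental indexing; not IndexHTML's code.
class IncrementalSync {
    static final class Plan {
        final List<String> toAdd = new ArrayList<>();
        final List<String> toDelete = new ArrayList<>();
        final List<String> toReindex = new ArrayList<>();
    }

    /**
     * @param onDisk  path -> last-modified time of each file currently on disk
     * @param indexed path -> last-modified time recorded when the file was indexed
     */
    static Plan plan(Map<String, Long> onDisk, Map<String, Long> indexed) {
        Plan p = new Plan();
        // Iterate in sorted path order so the plan is deterministic.
        for (Map.Entry<String, Long> e : new TreeMap<>(onDisk).entrySet()) {
            Long seen = indexed.get(e.getKey());
            if (seen == null) {
                p.toAdd.add(e.getKey());              // new file
            } else if (!seen.equals(e.getValue())) {
                p.toReindex.add(e.getKey());          // changed since last run
            }
        }
        for (String path : new TreeMap<>(indexed).keySet()) {
            if (!onDisk.containsKey(path)) p.toDelete.add(path); // file disappeared
        }
        return p;
    }
}
```

A changed file can then be handled as a delete of its old entry followed by an add of the new content, so only the three lists need to be acted on.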
HTML indexes are searched using Sun's Java Web Server (JWS) and
Search.jhtml. To set this up:
-
copy Search.html and Search.jhtml to JWS's public_html
directory;
-
copy lucene.jar to JWS's lib directory;
-
create and maintain your indexes with demo.IndexHTML in JWS's top-level
directory;
-
launch JWS, with the demo directory on CLASSPATH (only one class
is actually needed);
-
visit Search.html.
Note that indexes can be updated while searches are going on.
Search.jhtml
will re-open the index when it is updated so that the latest version is
immediately available.
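The "re-open when updated" behaviour can be sketched as a check performed before each search: compare the index's current version with the version of the snapshot in use, and open a new snapshot only when they differ. This JDK-only sketch assumes the index exposes a version number that changes on every update; ReopeningSearcher, acquire and openCount are hypothetical names, and the snapshot type is left generic rather than tied to any Lucene class.

```java
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical reopen-on-change helper; not Search.jhtml's actual code.
class ReopeningSearcher<S> {
    private final LongSupplier currentVersion; // assumed to change on every index update
    private final Function<Long, S> opener;     // opens a snapshot for the given version
    private S snapshot;
    private long snapshotVersion;
    private int opens = 0;

    ReopeningSearcher(LongSupplier currentVersion, Function<Long, S> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    /** Returns an up-to-date snapshot, re-opening only if the version changed. */
    synchronized S acquire() {
        long v = currentVersion.getAsLong();
        if (snapshot == null || v != snapshotVersion) {
            snapshot = opener.apply(v);
            snapshotVersion = v;
            opens++;
        }
        return snapshot;
    }

    /** Number of times a snapshot has been opened so far. */
    synchronized int openCount() { return opens; }
}
```

Checking the version is cheap compared with opening an index, so searches between updates reuse the same snapshot. A real deployment would also need to close a superseded snapshot once searches still using it have finished; the sketch leaves that out.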
Copyright © 2000-2005 Apache Software Foundation. All Rights Reserved.