org.apache.lucene.index

Class IndexReader

public abstract class IndexReader extends Object

IndexReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable.

Concrete subclasses of IndexReader are usually constructed with a call to one of the static open() methods, e.g. {@link #open(String)}.

For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral--they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.

An IndexReader can be opened on a directory for which an IndexWriter is already open, but in that case it cannot be used to delete documents from the index.

Version: $Id: IndexReader.java 358685 2005-12-23 02:38:23Z yonik $

Author: Doug Cutting

Nested Class Summary
static class IndexReader.FieldOption
Constructor Summary
protected IndexReader(Directory directory)
Constructor used if IndexReader is not owner of its directory.
Method Summary
void close()
Closes files associated with this index.
protected void commit()
Commit changes resulting from delete, undeleteAll, or setNorm operations
void delete(int docNum)
Deletes the document numbered docNum.
int delete(Term term)
Deletes all documents containing term.
void deleteDocument(int docNum)
Deletes the document numbered docNum.
int deleteDocuments(Term term)
Deletes all documents containing term.
Directory directory()
Returns the directory this index resides in.
abstract int docFreq(Term t)
Returns the number of documents containing the term t.
abstract Document document(int n)
Returns the stored fields of the nth Document in this index.
protected abstract void doClose()
Implements close.
protected abstract void doCommit()
Implements commit.
protected abstract void doDelete(int docNum)
Implements deletion of the document numbered docNum.
protected abstract void doSetNorm(int doc, String field, byte value)
Implements setNorm in subclass.
protected abstract void doUndeleteAll()
Implements actual undeleteAll() in subclass.
protected void finalize()
Release the write lock, if needed.
static long getCurrentVersion(String directory)
Reads version number from segments files.
static long getCurrentVersion(File directory)
Reads version number from segments files.
static long getCurrentVersion(Directory directory)
Reads version number from segments files.
abstract Collection getFieldNames()
Returns a list of all unique field names that exist in the index pointed to by this IndexReader.
abstract Collection getFieldNames(boolean indexed)
Returns a list of all unique field names that exist in the index pointed to by this IndexReader.
abstract Collection getFieldNames(IndexReader.FieldOption fldOption)
Get a list of unique field names that exist in this index and have the specified field option information.
Collection getIndexedFieldNames(boolean storedTermVector)
abstract Collection getIndexedFieldNames(Field.TermVector tvSpec)
Get a list of unique field names that exist in this index, are indexed, and have the specified term vector information.
abstract TermFreqVector getTermFreqVector(int docNumber, String field)
Return a term frequency vector for the specified document and field.
abstract TermFreqVector[] getTermFreqVectors(int docNumber)
Return an array of term frequency vectors for the specified document.
long getVersion()
Version number when this IndexReader was opened.
abstract boolean hasDeletions()
Returns true if any documents have been deleted
boolean hasNorms(String field)
Returns true if there are norms stored for this field.
static boolean indexExists(String directory)
Returns true if an index exists at the specified directory.
static boolean indexExists(File directory)
Returns true if an index exists at the specified directory.
static boolean indexExists(Directory directory)
Returns true if an index exists at the specified directory.
boolean isCurrent()
Check whether this IndexReader still works on a current version of the index.
abstract boolean isDeleted(int n)
Returns true if document n has been deleted
static boolean isLocked(Directory directory)
Returns true iff the index in the named directory is currently locked.
static boolean isLocked(String directory)
Returns true iff the index in the named directory is currently locked.
static long lastModified(String directory)
Returns the time the index in the named directory was last modified.
static long lastModified(File directory)
Returns the time the index in the named directory was last modified.
static long lastModified(Directory directory)
Returns the time the index in the named directory was last modified.
static void main(String[] args)
Prints the filename and size of each file within a given compound file.
abstract int maxDoc()
Returns one greater than the largest possible document number.
abstract byte[] norms(String field)
Returns the byte-encoded normalization factor for the named field of every document.
abstract void norms(String field, byte[] bytes, int offset)
Reads the byte-encoded normalization factor for the named field of every document.
abstract int numDocs()
Returns the number of documents in this index.
static IndexReader open(String path)
Returns an IndexReader reading the index in an FSDirectory in the named path.
static IndexReader open(File path)
Returns an IndexReader reading the index in an FSDirectory in the named path.
static IndexReader open(Directory directory)
Returns an IndexReader reading the index in the given Directory.
void setNorm(int doc, String field, byte value)
Expert: Resets the normalization factor for the named field of the named document.
void setNorm(int doc, String field, float value)
Expert: Resets the normalization factor for the named field of the named document.
TermDocs termDocs(Term term)
Returns an enumeration of all the documents which contain term.
abstract TermDocs termDocs()
Returns an unpositioned {@link TermDocs} enumerator.
TermPositions termPositions(Term term)
Returns an enumeration of all the documents which contain term.
abstract TermPositions termPositions()
Returns an unpositioned {@link TermPositions} enumerator.
abstract TermEnum terms()
Returns an enumeration of all the terms in the index.
abstract TermEnum terms(Term t)
Returns an enumeration of all terms after a given term.
void undeleteAll()
Undeletes all documents currently marked as deleted in this index.
static void unlock(Directory directory)
Forcibly unlocks the index in the named directory.

Constructor Detail

IndexReader

protected IndexReader(Directory directory)
Constructor used if IndexReader is not owner of its directory. This is used for IndexReaders that are used within other IndexReaders that take care of locking directories.

Parameters: directory Directory where IndexReader files reside.

Method Detail

close

public final void close()
Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called.

commit

protected final void commit()
Commit changes resulting from delete, undeleteAll, or setNorm operations

Throws: IOException

delete

public final void delete(int docNum)

Deprecated: Use {@link #deleteDocument(int docNum)} instead.

Deletes the document numbered docNum. Once a document is deleted it will not appear in TermDocs or TermPositions enumerations. Attempts to read its fields with the {@link #document} method will result in an error. The presence of this document may still be reflected in the {@link #docFreq} statistic, though this will be corrected eventually as the index is further modified.

delete

public final int delete(Term term)

Deprecated: Use {@link #deleteDocuments(Term term)} instead.

Deletes all documents containing term. This is useful if one uses a document field to hold a unique ID string for the document. Then to delete such a document, one merely constructs a term with the appropriate field and the unique ID string as its text and passes it to this method. See {@link #delete(int)} for information about when this deletion will become effective.

Returns: the number of documents deleted

deleteDocument

public final void deleteDocument(int docNum)
Deletes the document numbered docNum. Once a document is deleted it will not appear in TermDocs or TermPositions enumerations. Attempts to read its fields with the {@link #document} method will result in an error. The presence of this document may still be reflected in the {@link #docFreq} statistic, though this will be corrected eventually as the index is further modified.

deleteDocuments

public final int deleteDocuments(Term term)
Deletes all documents containing term. This is useful if one uses a document field to hold a unique ID string for the document. Then to delete such a document, one merely constructs a term with the appropriate field and the unique ID string as its text and passes it to this method. See {@link #deleteDocument(int)} for information about when this deletion will become effective.

Returns: the number of documents deleted
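The delete-by-unique-ID pattern described above can be pictured with a small standalone model. This is only a sketch under stated assumptions: ToyDeleter is a hypothetical class, not part of Lucene, and it assumes that deleting merely marks documents, so the range of document numbers stays the same. With a real index the equivalent call would be deleteDocuments(Term) on an IndexReader.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical in-memory model; ids.get(docNum) plays the role of the unique-ID field. */
final class ToyDeleter {
    private final List<String> ids;
    private final BitSet deleted = new BitSet();

    ToyDeleter(List<String> ids) {
        this.ids = ids;
    }

    /** Marks every live document whose ID equals idText as deleted; returns how many were marked. */
    int deleteDocuments(String idText) {
        int count = 0;
        for (int doc = 0; doc < ids.size(); doc++) {
            if (!deleted.get(doc) && ids.get(doc).equals(idText)) {
                deleted.set(doc);
                count++;
            }
        }
        return count;
    }

    boolean isDeleted(int n) {
        return deleted.get(n);
    }

    /** Assumption of this model: marking a document deleted does not shrink the number range. */
    int maxDoc() {
        return ids.size();
    }
}
```

Calling deleteDocuments a second time with the same ID returns 0 in this model, because the matching documents are already marked.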

directory

public Directory directory()
Returns the directory this index resides in.

docFreq

public abstract int docFreq(Term t)
Returns the number of documents containing the term t.

document

public abstract Document document(int n)
Returns the stored fields of the nth Document in this index.

doClose

protected abstract void doClose()
Implements close.

doCommit

protected abstract void doCommit()
Implements commit.

doDelete

protected abstract void doDelete(int docNum)
Implements deletion of the document numbered docNum. Applications should call {@link #deleteDocument(int)} or {@link #deleteDocuments(Term)}.

doSetNorm

protected abstract void doSetNorm(int doc, String field, byte value)
Implements setNorm in subclass.

doUndeleteAll

protected abstract void doUndeleteAll()
Implements actual undeleteAll() in subclass.

finalize

protected void finalize()
Release the write lock, if needed.

getCurrentVersion

public static long getCurrentVersion(String directory)
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.

Parameters: directory where the index resides.

Returns: version number.

Throws: IOException if segments file cannot be read

getCurrentVersion

public static long getCurrentVersion(File directory)
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.

Parameters: directory where the index resides.

Returns: version number.

Throws: IOException if segments file cannot be read

getCurrentVersion

public static long getCurrentVersion(Directory directory)
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.

Parameters: directory where the index resides.

Returns: version number.

Throws: IOException if segments file cannot be read.

getFieldNames

public abstract Collection getFieldNames()

Deprecated: Replaced by {@link #getFieldNames(IndexReader.FieldOption)}

Returns a list of all unique field names that exist in the index pointed to by this IndexReader.

Returns: Collection of Strings indicating the names of the fields

Throws: IOException if there is a problem with accessing the index

getFieldNames

public abstract Collection getFieldNames(boolean indexed)

Deprecated: Replaced by {@link #getFieldNames(IndexReader.FieldOption)}

Returns a list of all unique field names that exist in the index pointed to by this IndexReader. The boolean argument specifies whether the fields returned are indexed or not.

Parameters: indexed true if only indexed fields should be returned; false if only unindexed fields should be returned.

Returns: Collection of Strings indicating the names of the fields

Throws: IOException if there is a problem with accessing the index

getFieldNames

public abstract Collection getFieldNames(IndexReader.FieldOption fldOption)
Get a list of unique field names that exist in this index and have the specified field option information.

Parameters: fldOption specifies which field option should be available for the returned fields

Returns: Collection of Strings indicating the names of the fields.

See Also: FieldOption

getIndexedFieldNames

public Collection getIndexedFieldNames(boolean storedTermVector)

Deprecated: Replaced by {@link #getFieldNames(IndexReader.FieldOption)}

Parameters: storedTermVector if true, returns only indexed fields that have term vector info, else only indexed fields without term vector info

Returns: Collection of Strings indicating the names of the fields

getIndexedFieldNames

public abstract Collection getIndexedFieldNames(Field.TermVector tvSpec)

Deprecated: Replaced by {@link #getFieldNames(IndexReader.FieldOption)}

Get a list of unique field names that exist in this index, are indexed, and have the specified term vector information.

Parameters: tvSpec specifies which term vector information should be available for the fields

Returns: Collection of Strings indicating the names of the fields

getTermFreqVector

public abstract TermFreqVector getTermFreqVector(int docNumber, String field)
Return a term frequency vector for the specified document and field. The returned vector contains terms and frequencies for the terms in the specified field of this document, if the field had the storeTermVector flag set. If term vectors were stored with positions or offsets, a TermPositionsVector is returned.

Parameters: docNumber document for which the term frequency vector is returned
field field for which the term frequency vector is returned.

Returns: term frequency vector. May be null if the field does not exist in the specified document or the term vector was not stored.

Throws: IOException if index cannot be accessed

See Also: TermVector

getTermFreqVectors

public abstract TermFreqVector[] getTermFreqVectors(int docNumber)
Return an array of term frequency vectors for the specified document. The array contains a vector for each vectorized field in the document. Each vector contains terms and frequencies for all terms in a given vectorized field. If no such fields exist, the method returns null. The term vectors that are returned may either be of type TermFreqVector or of type TermPositionsVector if positions or offsets have been stored.

Parameters: docNumber document for which term frequency vectors are returned

Returns: array of term frequency vectors. May be null if no term vectors have been stored for the specified document.

Throws: IOException if index cannot be accessed

See Also: TermVector

getVersion

public long getVersion()
Version number when this IndexReader was opened.

hasDeletions

public abstract boolean hasDeletions()
Returns true if any documents have been deleted

hasNorms

public boolean hasNorms(String field)
Returns true if there are norms stored for this field.

indexExists

public static boolean indexExists(String directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.

Parameters: directory the directory to check for an index

Returns: true if an index exists; false otherwise

indexExists

public static boolean indexExists(File directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.

Parameters: directory the directory to check for an index

Returns: true if an index exists; false otherwise

indexExists

public static boolean indexExists(Directory directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.

Parameters: directory the directory to check for an index

Returns: true if an index exists; false otherwise

Throws: IOException if there is a problem with accessing the index

isCurrent

public boolean isCurrent()
Check whether this IndexReader still works on a current version of the index. If this is not the case you will need to re-open the IndexReader to make sure you see the latest changes made to the index.
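As a rough illustration of the check-and-reopen pattern, the standalone model below follows the version rules given in this page (a version that starts at a timestamp and grows by one per change, and a reader that remembers the version it was opened on). ToyIndex, ToyReader and ReopenExample are hypothetical names, not Lucene classes, and the assumption that isCurrent() amounts to a version comparison is made for this sketch only.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical index model: version starts at a timestamp and grows by one per change. */
final class ToyIndex {
    private final AtomicLong version = new AtomicLong(System.currentTimeMillis());

    long currentVersion() {
        return version.get();
    }

    void change() {
        version.incrementAndGet();
    }
}

/** Hypothetical reader model that remembers the version it was opened on. */
final class ToyReader {
    private final ToyIndex index;
    private final long openedVersion;

    ToyReader(ToyIndex index) {
        this.index = index;
        this.openedVersion = index.currentVersion();
    }

    long getVersion() {
        return openedVersion;
    }

    /** Assumed for this sketch: current means the index version has not moved since open. */
    boolean isCurrent() {
        return index.currentVersion() == openedVersion;
    }
}

final class ReopenExample {
    /** Returns the same reader while it is current, otherwise a freshly opened one. */
    static ToyReader refresh(ToyIndex index, ToyReader reader) {
        return reader.isCurrent() ? reader : new ToyReader(index);
    }
}
```

With a real index, the stale reader would also be closed before being replaced by a newly opened one.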

Throws: IOException

isDeleted

public abstract boolean isDeleted(int n)
Returns true if document n has been deleted

isLocked

public static boolean isLocked(Directory directory)
Returns true iff the index in the named directory is currently locked.

Parameters: directory the directory to check for a lock

Throws: IOException if there is a problem with accessing the index

isLocked

public static boolean isLocked(String directory)
Returns true iff the index in the named directory is currently locked.

Parameters: directory the directory to check for a lock

Throws: IOException if there is a problem with accessing the index

lastModified

public static long lastModified(String directory)
Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date; use {@link #isCurrent()} instead.

lastModified

public static long lastModified(File directory)
Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date; use {@link #isCurrent()} instead.

lastModified

public static long lastModified(Directory directory)
Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date; use {@link #isCurrent()} instead.

main

public static void main(String[] args)
Prints the filename and size of each file within a given compound file. Add the -extract flag to extract files to the current working directory. In order to make the extracted version of the index work, you have to copy the segments file from the compound index into the directory where the extracted files are stored.

Parameters: args Usage: org.apache.lucene.index.IndexReader [-extract] <cfsfile>

maxDoc

public abstract int maxDoc()
Returns one greater than the largest possible document number. This may be used to, e.g., determine how big to allocate an array which will have an element for every document number in an index.
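The array-sizing use mentioned above might look like the following sketch. DocNumbers is a hypothetical stand-in interface that declares only maxDoc(), numDocs() and isDeleted(int) with the signatures documented on this page; it is not a Lucene type, and ToyDocNumbers is a toy implementation that assumes numDocs() counts only documents not marked as deleted.

```java
import java.util.BitSet;

/** Hypothetical stand-in with three methods mirroring the signatures documented for IndexReader. */
interface DocNumbers {
    int maxDoc();
    int numDocs();
    boolean isDeleted(int n);
}

/** Toy in-memory implementation, for illustration only. */
final class ToyDocNumbers implements DocNumbers {
    private final int maxDoc;
    private final BitSet deleted = new BitSet();

    ToyDocNumbers(int maxDoc, int... deletedDocs) {
        this.maxDoc = maxDoc;
        for (int d : deletedDocs) {
            deleted.set(d);
        }
    }

    public int maxDoc() {
        return maxDoc;
    }

    /** Assumption of this model: only documents not marked deleted are counted. */
    public int numDocs() {
        return maxDoc - deleted.cardinality();
    }

    public boolean isDeleted(int n) {
        return deleted.get(n);
    }
}

final class PerDocArrayExample {
    /** Allocates one slot per possible document number; deleted slots get -1, live ones 1. */
    static int[] liveDocFlags(DocNumbers reader) {
        int[] slots = new int[reader.maxDoc()];
        for (int doc = 0; doc < slots.length; doc++) {
            slots[doc] = reader.isDeleted(doc) ? -1 : 1;
        }
        return slots;
    }
}
```

Sizing by maxDoc() rather than numDocs() keeps every valid document number a legal array index, even when some numbers belong to deleted documents.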

norms

public abstract byte[] norms(String field)
Returns the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents.

See Also: Field

norms

public abstract void norms(String field, byte[] bytes, int offset)
Reads the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents.

See Also: Field

numDocs

public abstract int numDocs()
Returns the number of documents in this index.

open

public static IndexReader open(String path)
Returns an IndexReader reading the index in an FSDirectory in the named path.

open

public static IndexReader open(File path)
Returns an IndexReader reading the index in an FSDirectory in the named path.

open

public static IndexReader open(Directory directory)
Returns an IndexReader reading the index in the given Directory.

setNorm

public final void setNorm(int doc, String field, byte value)
Expert: Resets the normalization factor for the named field of the named document. The norm represents the product of the field's {@link Field#setBoost(float) boost} and its {@link Similarity#lengthNorm(String, int) length normalization}. Thus, to preserve the length normalization values when resetting this, one should base the new value upon the old.

See Also: norms Similarity
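To make the advice about basing the new value on the old one concrete, here is a small arithmetic sketch. It works on decoded float values only and leaves the byte encoding aside; NormRescaleExample and its rescale method are hypothetical helpers, not part of Lucene, and the sketch assumes the caller knows the boost that produced the old norm.

```java
final class NormRescaleExample {
    /**
     * Given the decoded old norm (boost * lengthNorm) and the boost that produced it,
     * returns the norm for a new boost with the length normalization left unchanged.
     */
    static float rescale(float oldNorm, float oldBoost, float newBoost) {
        if (oldBoost == 0f) {
            throw new IllegalArgumentException("oldBoost must be non-zero");
        }
        float lengthNorm = oldNorm / oldBoost; // recover the length-normalization part
        return lengthNorm * newBoost;          // apply the new boost to it
    }
}
```

For instance, an old norm of 0.5 produced with boost 1.0 becomes 1.0 when the boost is raised to 2.0, since the length normalization of 0.5 is preserved.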

setNorm

public void setNorm(int doc, String field, float value)
Expert: Resets the normalization factor for the named field of the named document.

See Also: norms Similarity

termDocs

public TermDocs termDocs(Term term)
Returns an enumeration of all the documents which contain term. For each document, in addition to the document number, the frequency of the term in that document is also provided, for use in search scoring. Thus, this method implements the mapping:

Term => <docNum, freq>*

The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration.
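The term-to-postings mapping described for termDocs(Term) can be pictured with a standalone sketch. PostingsSketch is a hypothetical helper, not a Lucene class: it treats each document as an array of tokens, uses the list position as the document number, and relies on a TreeMap to keep document numbers in ascending order, matching the ordering stated above.

```java
import java.util.List;
import java.util.TreeMap;

final class PostingsSketch {
    /** Builds docNum -> freq for one term; the TreeMap iterates in ascending docNum order. */
    static TreeMap<Integer, Integer> postings(String term, List<String[]> docs) {
        TreeMap<Integer, Integer> result = new TreeMap<>();
        for (int docNum = 0; docNum < docs.size(); docNum++) {
            int freq = 0;
            for (String token : docs.get(docNum)) {
                if (token.equals(term)) {
                    freq++;
                }
            }
            if (freq > 0) {
                result.put(docNum, freq); // only documents that contain the term appear
            }
        }
        return result;
    }
}
```

Documents that do not contain the term are simply absent from the result, just as they would not appear in the enumeration.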

termDocs

public abstract TermDocs termDocs()
Returns an unpositioned {@link TermDocs} enumerator.

termPositions

public TermPositions termPositions(Term term)
Returns an enumeration of all the documents which contain term. For each document, in addition to the document number and frequency of the term in that document, a list of all of the ordinal positions of the term in the document is available. Thus, this method implements the mapping:

Term => <docNum, freq, <pos_1, pos_2, ... pos_freq-1>>*

This positional information facilitates phrase and proximity searching.

The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration.

termPositions

public abstract TermPositions termPositions()
Returns an unpositioned {@link TermPositions} enumerator.

terms

public abstract TermEnum terms()
Returns an enumeration of all the terms in the index. The enumeration is ordered by Term.compareTo(). Each term is greater than all that precede it in the enumeration.

terms

public abstract TermEnum terms(Term t)
Returns an enumeration of all terms after a given term. The enumeration is ordered by Term.compareTo(). Each term is greater than all that precede it in the enumeration.

undeleteAll

public final void undeleteAll()
Undeletes all documents currently marked as deleted in this index.

unlock

public static void unlock(Directory directory)
Forcibly unlocks the index in the named directory.

Caution: this should only be used by failure recovery code, when it is known that no other process or thread is in fact currently accessing this index.

Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.