Package org.apache.lucene.index.pruning
Static Index Pruning Tools
This package provides a framework for pruning an existing index into a smaller index while preserving visible search quality as far as possible.
An index can be pruned at several levels:
- Remove stored data: see StorePruningPolicy
- Remove terms data: see TermPruningPolicy
Class Summary
- CarmelTopKTermPruningPolicy: Pruning policy with a parameterized search-quality guarantee. The policy is configured with two parameters, k and ε.
- CarmelTopKTermPruningPolicy.ByDocComparator
- CarmelUniformTermPruningPolicy: Enhanced implementation of Carmel Uniform Pruning.
- CarmelUniformTermPruningPolicy.ByDocComparator
- PruningPolicy: General definitions for index pruning, such as operations to be performed on field data.
- PruningTool: A command-line tool to configure and run a PruningReader on an input index and produce a pruned output index using IndexWriter.addIndexes(IndexReader...).
- RIDFTermPruningPolicy: Implementation of TermPruningPolicy that uses the "residual IDF" metric to determine which postings of a term to keep or remove, as defined in http://www.dc.fi.udc.es/~barreiro/publications/blanco_barreiro_ecir2007.pdf.
- StorePruningPolicy: Pruning policy for removing stored fields from documents.
- TermPruningPolicy: Policy for producing a smaller index out of an input index by examining its terms and removing from the index some or all of their data, as follows:
  - all terms of a certain field: see TermPruningPolicy.pruneAllFieldPostings(String)
  - all data of a certain term: see TermPruningPolicy.pruneTermEnum(TermEnum)
  - all positions of a certain term in a certain document: see TermPruningPolicy.pruneAllPositions(TermPositions, Term)
  - some positions of a certain term in a certain document: see TermPruningPolicy.pruneSomePositions(int, int[], Term)
- TFTermPruningPolicy: Policy for producing a smaller index out of an input index by removing postings data for those terms whose in-document frequency is below a specified threshold.
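The idea behind term-frequency-based pruning, as described for TFTermPruningPolicy, can be illustrated with a small self-contained sketch. This is not the Lucene implementation and does not use the Lucene API: the class TfThresholdPruningSketch, the prune method, and the in-memory postings map (term to document id to in-document frequency) are hypothetical names introduced only for illustration. The sketch assumes a single global threshold and drops every posting whose frequency falls below it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration only; these names are hypothetical and not part of Lucene. */
public class TfThresholdPruningSketch {

    /**
     * Returns a pruned copy of the postings in which every (docId, frequency)
     * entry with frequency below minTf has been dropped. Terms left without
     * any postings are removed entirely. The input map is not modified.
     *
     * @param postings term -> (docId -> in-document frequency)
     * @param minTf    smallest in-document frequency that is kept
     */
    public static Map<String, Map<Integer, Integer>> prune(
            Map<String, Map<Integer, Integer>> postings, int minTf) {
        Map<String, Map<Integer, Integer>> pruned = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> term : postings.entrySet()) {
            Map<Integer, Integer> kept = new TreeMap<>();
            for (Map.Entry<Integer, Integer> posting : term.getValue().entrySet()) {
                if (posting.getValue() >= minTf) {
                    kept.put(posting.getKey(), posting.getValue());
                }
            }
            if (!kept.isEmpty()) {
                pruned.put(term.getKey(), kept);
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> postings = new HashMap<>();

        Map<Integer, Integer> lucene = new HashMap<>();
        lucene.put(1, 5); // "lucene" occurs 5 times in document 1
        lucene.put(2, 1); // and once in document 2
        postings.put("lucene", lucene);

        Map<Integer, Integer> index = new HashMap<>();
        index.put(1, 1);  // "index" occurs once in document 1
        postings.put("index", index);

        // With a threshold of 2, only the posting for "lucene" in document 1 survives.
        System.out.println(prune(postings, 2)); // prints {lucene={1=5}}
    }
}
```

The sketch returns a new map rather than mutating its input, which loosely mirrors how the package produces a separate pruned output index instead of modifying the input index in place.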