Class ShingleAnalyzerWrapper
- java.lang.Object
-
- org.apache.lucene.analysis.Analyzer
-
- org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper
-
- All Implemented Interfaces:
Closeable, AutoCloseable
public final class ShingleAnalyzerWrapper extends org.apache.lucene.analysis.Analyzer
A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.
A shingle is another name for a token-based n-gram.
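To make "token-based n-gram" concrete, here is a minimal plain-Java sketch that joins every run of adjacent tokens of a fixed size. It does not use Lucene; the class `ShingleDemo` and its `shingles` method are hypothetical names chosen for illustration, and the single-space separator is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration of token n-grams (shingles); not Lucene code. */
class ShingleDemo {

    /** Returns every run of exactly {@code size} adjacent tokens, joined by a single space. */
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        // Slide a window of `size` tokens across the input, one token at a time.
        for (int start = 0; start + size <= tokens.size(); start++) {
            out.add(String.join(" ", tokens.subList(start, start + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // prints [please divide, divide this, this sentence]
        System.out.println(shingles(tokens, 2));
    }
}
```

Each output element spans whole tokens rather than characters, which is what distinguishes a shingle from a character n-gram.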
-
-
Constructor Summary
Constructors
Constructor Description
ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer)
ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer, int maxShingleSize)
ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles)
- Creates a new ShingleAnalyzerWrapper.
ShingleAnalyzerWrapper(org.apache.lucene.util.Version matchVersion)
- Wraps StandardAnalyzer.
ShingleAnalyzerWrapper(org.apache.lucene.util.Version matchVersion, int minShingleSize, int maxShingleSize)
- Wraps StandardAnalyzer.
-
Method Summary
Modifier and Type Method Description
int getMaxShingleSize()
- The max shingle (token n-gram) size.
int getMinShingleSize()
- The min shingle (token n-gram) size.
String getTokenSeparator()
boolean isOutputUnigrams()
boolean isOutputUnigramsIfNoShingles()
org.apache.lucene.analysis.TokenStream reusableTokenStream(String fieldName, Reader reader)
void setMaxShingleSize(int maxShingleSize)
- Deprecated. Setting maxShingleSize after Analyzer instantiation prevents reuse.
void setMinShingleSize(int minShingleSize)
- Deprecated. Setting minShingleSize after Analyzer instantiation prevents reuse.
void setOutputUnigrams(boolean outputUnigrams)
- Deprecated. Setting outputUnigrams after Analyzer instantiation prevents reuse.
void setOutputUnigramsIfNoShingles(boolean outputUnigramsIfNoShingles)
- Deprecated. Setting outputUnigramsIfNoShingles after Analyzer instantiation prevents reuse.
void setTokenSeparator(String tokenSeparator)
- Deprecated. Setting tokenSeparator after Analyzer instantiation prevents reuse.
org.apache.lucene.analysis.TokenStream tokenStream(String fieldName, Reader reader)
-
-
-
Constructor Detail
-
ShingleAnalyzerWrapper
public ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer)
-
ShingleAnalyzerWrapper
public ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer, int maxShingleSize)
-
ShingleAnalyzerWrapper
public ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
-
ShingleAnalyzerWrapper
public ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles)
Creates a new ShingleAnalyzerWrapper.
- Parameters:
defaultAnalyzer
- Analyzer whose TokenStream is to be filtered
minShingleSize
- Min shingle (token n-gram) size
maxShingleSize
- Max shingle (token n-gram) size
tokenSeparator
- Used to separate input stream tokens in output shingles
outputUnigrams
- Whether or not the filter shall pass the original tokens to the output stream
outputUnigramsIfNoShingles
- Overrides the behavior of outputUnigrams==false when no shingles are available (because there are fewer than minShingleSize tokens in the input stream). Note that if outputUnigrams==true, then unigrams are always output, regardless of whether any shingles are available.
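How these parameters interact can be sketched in plain Java. This is an illustration of the documented semantics only, not Lucene's implementation: the class `ShingleOptionsSketch` and its `shingle` method are hypothetical, the output order (unigram first, then shingles of increasing size at each position) is an assumption, and so is the check that minShingleSize is at least 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of the documented parameter semantics; not Lucene code. */
class ShingleOptionsSketch {

    static List<String> shingle(List<String> tokens, int minShingleSize, int maxShingleSize,
                                String tokenSeparator, boolean outputUnigrams,
                                boolean outputUnigramsIfNoShingles) {
        // Assumption of this sketch: shingles span at least two tokens and min <= max.
        if (minShingleSize < 2 || minShingleSize > maxShingleSize) {
            throw new IllegalArgumentException("need 2 <= minShingleSize <= maxShingleSize");
        }
        // No shingle can be formed when the input has fewer than minShingleSize tokens.
        boolean noShingles = tokens.size() < minShingleSize;
        // outputUnigrams==true always wins; otherwise the fallback flag applies only
        // when no shingles are available.
        boolean emitUnigrams = outputUnigrams || (outputUnigramsIfNoShingles && noShingles);

        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (emitUnigrams) {
                out.add(tokens.get(start));
            }
            // Emit every shingle of size min..max that starts at this token and fits.
            for (int size = minShingleSize;
                 size <= maxShingleSize && start + size <= tokens.size(); size++) {
                out.add(String.join(tokenSeparator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> abc = Arrays.asList("a", "b", "c");
        // prints [a, a_b, a_b_c, b, b_c, c]
        System.out.println(shingle(abc, 2, 3, "_", true, false));
        // prints [a_b, a_b_c, b_c]
        System.out.println(shingle(abc, 2, 3, "_", false, false));
        // prints [a] because one token is too few to form a shingle of size 2
        System.out.println(shingle(Arrays.asList("a"), 2, 2, "_", false, true));
    }
}
```

The last call shows the one case where outputUnigramsIfNoShingles changes the result: with both flags false, the same single-token input would produce no output at all.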
-
ShingleAnalyzerWrapper
public ShingleAnalyzerWrapper(org.apache.lucene.util.Version matchVersion)
Wraps StandardAnalyzer.
-
ShingleAnalyzerWrapper
public ShingleAnalyzerWrapper(org.apache.lucene.util.Version matchVersion, int minShingleSize, int maxShingleSize)
Wraps StandardAnalyzer.
-
-
Method Detail
-
getMaxShingleSize
public int getMaxShingleSize()
The max shingle (token n-gram) size.
- Returns:
- The max shingle (token n-gram) size
-
setMaxShingleSize
@Deprecated public void setMaxShingleSize(int maxShingleSize)
Deprecated. Setting maxShingleSize after Analyzer instantiation prevents reuse. Configure maxShingleSize during construction.
Set the maximum size of output shingles (default: 2).
- Parameters:
maxShingleSize
- max shingle size
-
getMinShingleSize
public int getMinShingleSize()
The min shingle (token n-gram) size.
- Returns:
- The min shingle (token n-gram) size
-
setMinShingleSize
@Deprecated public void setMinShingleSize(int minShingleSize)
Deprecated. Setting minShingleSize after Analyzer instantiation prevents reuse. Configure minShingleSize during construction.
Set the min shingle size (default: 2).
The minShingleSize passed in must not be greater than maxShingleSize, so make sure that maxShingleSize is set before calling this method.
- Parameters:
minShingleSize
- min size of output shingles
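The ordering requirement follows from both sizes defaulting to 2. The plain-Java sketch below models it with a hypothetical `ShingleSizesSketch` class that is not the Lucene class. Throwing `IllegalArgumentException` on violation is an assumption, since this page does not say how the real setter reports the error, and the sketch checks only the one rule stated above.

```java
/** Hypothetical stand-in for the two size settings; not the Lucene class. */
class ShingleSizesSketch {
    private int minShingleSize = 2; // default stated in the setter documentation
    private int maxShingleSize = 2; // default stated in the setter documentation

    void setMaxShingleSize(int maxShingleSize) {
        // This sketch does not validate the max; it models only the min <= max rule.
        this.maxShingleSize = maxShingleSize;
    }

    void setMinShingleSize(int minShingleSize) {
        // The new min is checked against whatever max is currently set.
        if (minShingleSize > maxShingleSize) {
            // Assumption: the exception type is chosen for illustration.
            throw new IllegalArgumentException(
                "minShingleSize " + minShingleSize + " > maxShingleSize " + maxShingleSize);
        }
        this.minShingleSize = minShingleSize;
    }

    int getMinShingleSize() { return minShingleSize; }

    int getMaxShingleSize() { return maxShingleSize; }

    public static void main(String[] args) {
        ShingleSizesSketch rightOrder = new ShingleSizesSketch();
        rightOrder.setMaxShingleSize(4); // raise the max first
        rightOrder.setMinShingleSize(3); // 3 <= 4, accepted
        // prints 3..4
        System.out.println(rightOrder.getMinShingleSize() + ".." + rightOrder.getMaxShingleSize());

        ShingleSizesSketch wrongOrder = new ShingleSizesSketch();
        try {
            wrongOrder.setMinShingleSize(3); // max is still the default 2
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Passing both sizes to the constructor avoids the ordering concern entirely, which fits the deprecation note's advice to configure them during construction.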
-
getTokenSeparator
public String getTokenSeparator()
-
setTokenSeparator
@Deprecated public void setTokenSeparator(String tokenSeparator)
Deprecated. Setting tokenSeparator after Analyzer instantiation prevents reuse. Configure tokenSeparator during construction.
Sets the string to use when joining adjacent tokens to form a shingle.
- Parameters:
tokenSeparator
- used to separate input stream tokens in output shingles
-
isOutputUnigrams
public boolean isOutputUnigrams()
-
setOutputUnigrams
@Deprecated public void setOutputUnigrams(boolean outputUnigrams)
Deprecated. Setting outputUnigrams after Analyzer instantiation prevents reuse. Configure outputUnigrams during construction.
Shall the filter pass the original tokens (the "unigrams") to the output stream?
- Parameters:
outputUnigrams
- Whether or not the filter shall pass the original tokens to the output stream
-
isOutputUnigramsIfNoShingles
public boolean isOutputUnigramsIfNoShingles()
-
setOutputUnigramsIfNoShingles
@Deprecated public void setOutputUnigramsIfNoShingles(boolean outputUnigramsIfNoShingles)
Deprecated. Setting outputUnigramsIfNoShingles after Analyzer instantiation prevents reuse. Configure outputUnigramsIfNoShingles during construction.
Shall we override the behavior of outputUnigrams==false for those times when no shingles are available (because there are fewer than minShingleSize tokens in the input stream)? (default: false)
Note that if outputUnigrams==true, then unigrams are always output, regardless of whether any shingles are available.
- Parameters:
outputUnigramsIfNoShingles
- Whether or not to output a single unigram when no shingles are available.
-
tokenStream
public org.apache.lucene.analysis.TokenStream tokenStream(String fieldName, Reader reader)
- Specified by:
tokenStream in class org.apache.lucene.analysis.Analyzer
-
reusableTokenStream
public org.apache.lucene.analysis.TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException
- Overrides:
reusableTokenStream in class org.apache.lucene.analysis.Analyzer
- Throws:
IOException
-
-