Package org.apache.lucene.analysis
Class Tokenizer
- java.lang.Object
-
- org.apache.lucene.util.AttributeSource
-
- org.apache.lucene.analysis.TokenStream
-
- org.apache.lucene.analysis.Tokenizer
-
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
CharTokenizer, ClassicTokenizer, KeywordTokenizer, StandardTokenizer, UAX29URLEmailTokenizer
public abstract class Tokenizer extends TokenStream
A Tokenizer is a TokenStream whose input is a Reader.
This is an abstract class; subclasses must override TokenStream.incrementToken().
NOTE: Subclasses overriding TokenStream.incrementToken() must call AttributeSource.clearAttributes() before setting attributes.
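The contract in the NOTE above can be illustrated with a stdlib-only model. This is a sketch, not Lucene code: `WhitespaceTokenizerSketch`, its fields, and its `tokenize` helper are hypothetical stand-ins for a Tokenizer subclass and its attributes, so that the example runs without Lucene on the classpath. The point it shows is the ordering inside `incrementToken()`: clear first, then set.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Stdlib-only model of the incrementToken() contract; these are NOT Lucene classes.
final class WhitespaceTokenizerSketch {
    private final Reader input;                              // plays the role of Tokenizer.input
    private final StringBuilder term = new StringBuilder();  // stand-in for a term attribute
    private int startOffset;                                 // stand-in for an offset attribute
    private int endOffset;
    private int position;                                    // characters consumed so far

    WhitespaceTokenizerSketch(Reader input) {
        this.input = input;
    }

    // Stand-in for AttributeSource.clearAttributes().
    private void clearAttributes() {
        term.setLength(0);
        startOffset = 0;
        endOffset = 0;
    }

    // Stand-in for TokenStream.incrementToken(): returns false at end of input.
    boolean incrementToken() throws IOException {
        clearAttributes(); // first, so nothing leaks over from the previous token
        int c;
        while ((c = input.read()) != -1) {   // skip leading whitespace
            position++;
            if (!Character.isWhitespace(c)) {
                break;
            }
        }
        if (c == -1) {
            return false;
        }
        startOffset = position - 1;
        term.append((char) c);
        while ((c = input.read()) != -1) {   // collect the rest of the token
            position++;
            if (Character.isWhitespace(c)) {
                break;
            }
            term.append((char) c);
        }
        endOffset = startOffset + term.length();
        return true;
    }

    // Hypothetical helper: renders each token as term[start,end).
    static List<String> tokenize(String text) {
        WhitespaceTokenizerSketch t = new WhitespaceTokenizerSketch(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (t.incrementToken()) {
                out.add(t.term + "[" + t.startOffset + "," + t.endOffset + ")");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

In a real subclass the attributes would be obtained through `addAttribute` and cleared by the inherited `clearAttributes()`; the private method here only mimics that step.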
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
-
-
Constructor Summary
Constructors
protected Tokenizer()
Deprecated. Use Tokenizer(Reader) instead.
protected Tokenizer(Reader input)
Construct a token stream processing the given input.
protected Tokenizer(AttributeSource source)
Deprecated. Use Tokenizer(AttributeSource, Reader) instead.
protected Tokenizer(AttributeSource.AttributeFactory factory)
Deprecated. Use Tokenizer(AttributeSource.AttributeFactory, Reader) instead.
protected Tokenizer(AttributeSource.AttributeFactory factory, Reader input)
Construct a token stream processing the given input using the given AttributeFactory.
protected Tokenizer(AttributeSource source, Reader input)
Construct a token stream processing the given input using the given AttributeSource.
-
Method Summary
void close()
By default, closes the input Reader.
protected int correctOffset(int currentOff)
Return the corrected offset.
void reset(Reader input)
Expert: Reset the tokenizer to a new reader.
-
Methods inherited from class org.apache.lucene.analysis.TokenStream
end, incrementToken, reset
-
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
-
-
-
-
Field Detail
-
input
protected Reader input
The text source for this Tokenizer.
-
-
Constructor Detail
-
Tokenizer
@Deprecated protected Tokenizer()
Deprecated. Use Tokenizer(Reader) instead.
Construct a tokenizer with null input.
-
Tokenizer
protected Tokenizer(Reader input)
Construct a token stream processing the given input.
-
Tokenizer
@Deprecated protected Tokenizer(AttributeSource.AttributeFactory factory)
Deprecated. Use Tokenizer(AttributeSource.AttributeFactory, Reader) instead.
Construct a tokenizer with null input using the given AttributeFactory.
-
Tokenizer
protected Tokenizer(AttributeSource.AttributeFactory factory, Reader input)
Construct a token stream processing the given input using the given AttributeFactory.
-
Tokenizer
@Deprecated protected Tokenizer(AttributeSource source)
Deprecated. Use Tokenizer(AttributeSource, Reader) instead.
Construct a tokenizer with null input using the given AttributeSource.
-
Tokenizer
protected Tokenizer(AttributeSource source, Reader input)
Construct a token stream processing the given input using the given AttributeSource.
-
-
Method Detail
-
close
public void close() throws IOException
By default, closes the input Reader.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class TokenStream
- Throws:
IOException
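Because the class implements Closeable and AutoCloseable, a tokenizer can be managed with try-with-resources. The following is a minimal stdlib-only sketch of the default behaviour described above, where closing the stream closes its input Reader. `CloseSketch` and `ReaderBackedStream` are hypothetical names used for illustration and are not part of Lucene.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Stdlib-only model; ReaderBackedStream is a hypothetical stand-in for a Tokenizer.
final class CloseSketch {
    static final class ReaderBackedStream implements Closeable {
        private final Reader input;

        ReaderBackedStream(Reader input) {
            this.input = input;
        }

        // Mirrors the documented default: closing the stream closes its input.
        @Override
        public void close() throws IOException {
            input.close();
        }
    }

    // Returns true if the reader refuses further reads once the stream is closed.
    static boolean inputClosedAfterUse() {
        Reader reader = new StringReader("some text");
        try (ReaderBackedStream stream = new ReaderBackedStream(reader)) {
            // tokens would be consumed here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            reader.read();
            return false;          // still readable, so it was not closed
        } catch (IOException expected) {
            return true;           // StringReader rejects reads after close()
        }
    }
}
```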
-
correctOffset
protected final int correctOffset(int currentOff)
Return the corrected offset. If input is a CharStream subclass, this method calls CharStream.correctOffset(int); otherwise it returns currentOff.
- Parameters:
currentOff - offset as seen in the output
- Returns:
corrected offset based on the input
- See Also:
CharStream.correctOffset(int)
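The dispatch described for correctOffset can be sketched without Lucene as follows. `OffsetCorrectingReader` is a hypothetical stand-in for CharStream, and `SkippedPrefixReader` is an invented example of a reader that reports offsets shifted by a fixed number of stripped characters; neither is a Lucene class.

```java
import java.io.FilterReader;
import java.io.Reader;
import java.io.StringReader;

// Stdlib-only model of the correctOffset dispatch; NOT Lucene classes.
final class CorrectOffsetSketch {
    // Hypothetical stand-in for CharStream: a Reader that can map offsets
    // in its output back to offsets in the original text.
    abstract static class OffsetCorrectingReader extends FilterReader {
        protected OffsetCorrectingReader(Reader in) {
            super(in);
        }

        abstract int correctOffset(int currentOff);
    }

    // Invented example: pretends `skipped` characters were removed before the text.
    static final class SkippedPrefixReader extends OffsetCorrectingReader {
        private final int skipped;

        SkippedPrefixReader(Reader in, int skipped) {
            super(in);
            this.skipped = skipped;
        }

        @Override
        int correctOffset(int currentOff) {
            return currentOff + skipped;
        }
    }

    // Delegates when the input knows how to correct offsets, else returns the offset unchanged.
    static int correctOffset(Reader input, int currentOff) {
        if (input instanceof OffsetCorrectingReader) {
            return ((OffsetCorrectingReader) input).correctOffset(currentOff);
        }
        return currentOff;
    }

    public static void main(String[] args) {
        Reader plain = new StringReader("text");
        Reader shifted = new SkippedPrefixReader(new StringReader("text"), 3);
        System.out.println(correctOffset(plain, 4));    // unchanged offset
        System.out.println(correctOffset(shifted, 4));  // offset shifted by the skipped prefix
    }
}
```

A subclass would typically pass its raw start and end offsets through correctOffset before storing them, so that offsets refer to the original text rather than to filtered output.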
-
reset
public void reset(Reader input) throws IOException
Expert: Reset the tokenizer to a new reader. Typically, an analyzer (in its reusableTokenStream method) will use this to re-use a previously created tokenizer.
- Throws:
IOException
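The reuse pattern mentioned for reset(Reader) might look like the stdlib-only sketch below: one tokenizer instance is created for the first text and then pointed at each new Reader instead of being reallocated. `ReusableTokenizerSketch`, `nextToken`, and `analyzeAll` are hypothetical names, and the decision to clear a counter in `reset` is an assumption of this sketch about what per-stream state a subclass might hold.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Stdlib-only model of tokenizer reuse; NOT a Lucene class.
final class ReusableTokenizerSketch {
    private Reader input;        // plays the role of Tokenizer.input
    private int tokensEmitted;   // example of per-stream state (an assumption of this sketch)

    ReusableTokenizerSketch(Reader input) {
        this.input = input;
    }

    // Stand-in for reset(Reader): swap the input and clear per-stream state.
    void reset(Reader newInput) {
        this.input = newInput;
        this.tokensEmitted = 0;
    }

    int tokensEmitted() {
        return tokensEmitted;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    String nextToken() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) {
                    break;       // token complete
                }
            } else {
                sb.append((char) c);
            }
        }
        if (sb.length() == 0) {
            return null;
        }
        tokensEmitted++;
        return sb.toString();
    }

    // Tokenizes several texts with a single instance, resetting between them.
    static List<List<String>> analyzeAll(String... texts) {
        List<List<String>> results = new ArrayList<>();
        ReusableTokenizerSketch tokenizer = null;
        try {
            for (String text : texts) {
                Reader reader = new StringReader(text);
                if (tokenizer == null) {
                    tokenizer = new ReusableTokenizerSketch(reader);
                } else {
                    tokenizer.reset(reader);   // reuse instead of reallocating
                }
                List<String> tokens = new ArrayList<>();
                for (String t = tokenizer.nextToken(); t != null; t = tokenizer.nextToken()) {
                    tokens.add(t);
                }
                results.add(tokens);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }
}
```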
-
-