Class MockTokenizer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public class MockTokenizer
    extends org.apache.lucene.analysis.Tokenizer
    Tokenizer for testing.

    This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it is a good idea to test it by wrapping this tokenizer instead, which gives you extra checks. This tokenizer has the following behavior:

    • An internal state machine checks that the consumer uses the tokenizer consistently. These checks can be disabled with setEnableChecks(boolean).
    • For convenience, optionally lowercases terms that it outputs.
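
The consumer-consistency checks described above can be pictured with a small state machine. The following is a minimal sketch, not Lucene's implementation: the class `WorkflowChecker`, its `State` enum, and the token counter are hypothetical names invented for illustration, and it uses only the JDK. It assumes the usual TokenStream consumer workflow of reset(), repeated incrementToken() calls until false is returned, end(), then close().

```java
// Hypothetical sketch: NOT Lucene's actual MockTokenizer code.
// It only illustrates how a state machine can verify that a consumer
// calls reset(), incrementToken(), end(), and close() in order.
final class WorkflowChecker {
    // Illustrative state names; the real class may use different ones.
    enum State { CREATED, RESET, EXHAUSTED, ENDED, CLOSED }

    private State state = State.CREATED;
    private boolean enableChecks = true;
    private int remaining; // number of fake tokens left to emit

    WorkflowChecker(int tokenCount) {
        this.remaining = tokenCount;
    }

    // Mirrors the idea of setEnableChecks(boolean): turn validation on or off.
    void setEnableChecks(boolean enableChecks) {
        this.enableChecks = enableChecks;
    }

    void reset() {
        check(state == State.CREATED || state == State.CLOSED, "reset()");
        state = State.RESET;
    }

    boolean incrementToken() {
        check(state == State.RESET, "incrementToken()");
        if (remaining > 0) {
            remaining--;
            return true;
        }
        state = State.EXHAUSTED; // the consumer has now seen 'false'
        return false;
    }

    void end() {
        check(state == State.EXHAUSTED, "end()");
        state = State.ENDED;
    }

    void close() {
        check(state == State.ENDED || state == State.CLOSED, "close()");
        state = State.CLOSED;
    }

    // Throws only when checks are enabled and the call is out of order.
    private void check(boolean ok, String call) {
        if (enableChecks && !ok) {
            throw new IllegalStateException(call + " called in state " + state);
        }
    }
}
```

In this sketch, a consumer that skips reset() fails fast with an IllegalStateException, while a consumer that has disabled checks is allowed through. The real tokenizer may accept or reject transitions that this simplified model does not.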
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

        org.apache.lucene.util.AttributeSource.AttributeFactory
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int DEFAULT_MAX_TOKEN_LENGTH  
      static int KEYWORD
      Acts similar to KeywordTokenizer.
      static int SIMPLE
      Acts like LetterTokenizer.
      static int WHITESPACE
      Acts similar to WhitespaceTokenizer.
      • Fields inherited from class org.apache.lucene.analysis.Tokenizer

        input
    • Method Summary

      Modifier and Type Method Description
      void close()  
      void end()  
      boolean incrementToken()  
      protected boolean isTokenChar(int c)  
      protected int normalize(int c)  
      protected int readCodePoint()  
      void reset()  
      void reset(Reader input)  
      void setEnableChecks(boolean enableChecks)
      Toggles consumer workflow checking. If your test consumes token streams normally, you should leave this enabled.
      • Methods inherited from class org.apache.lucene.analysis.Tokenizer

        correctOffset
      • Methods inherited from class org.apache.lucene.util.AttributeSource

        addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
    • Field Detail

      • WHITESPACE

        public static final int WHITESPACE
        Acts similar to WhitespaceTokenizer.
        See Also:
        Constant Field Values
      • KEYWORD

        public static final int KEYWORD
        Acts similar to KeywordTokenizer. TODO: Keyword returns an "empty" token for an empty reader...
        See Also:
        Constant Field Values
      • DEFAULT_MAX_TOKEN_LENGTH

        public static final int DEFAULT_MAX_TOKEN_LENGTH
        See Also:
        Constant Field Values
    • Constructor Detail

      • MockTokenizer

        public MockTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory,
                             Reader input,
                             int pattern,
                             boolean lowerCase,
                             int maxTokenLength)
      • MockTokenizer

        public MockTokenizer(Reader input,
                             int pattern,
                             boolean lowerCase,
                             int maxTokenLength)
      • MockTokenizer

        public MockTokenizer(Reader input,
                             int pattern,
                             boolean lowerCase)
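
To make the `pattern` and `lowerCase` constructor arguments more concrete, here is a minimal sketch of the three modes using only the JDK. It is an approximation under stated assumptions, not the class's real code: `PatternSketch`, its `Mode` enum, and `tokenize` are hypothetical names, WHITESPACE is modeled as splitting on `Character.isWhitespace`, SIMPLE as keeping only `Character.isLetter` code points (as LetterTokenizer does), and KEYWORD as emitting the whole input as one token. The `maxTokenLength` argument is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: approximates the three patterns with plain JDK calls.
// It is NOT MockTokenizer itself and ignores maxTokenLength and offsets.
final class PatternSketch {
    // Stand-ins for the WHITESPACE, SIMPLE, and KEYWORD constants.
    enum Mode { WHITESPACE, SIMPLE, KEYWORD }

    // Decides whether a code point belongs inside a token for the given mode.
    static boolean isTokenChar(Mode mode, int c) {
        switch (mode) {
            case WHITESPACE: return !Character.isWhitespace(c);
            case SIMPLE:     return Character.isLetter(c);
            default:         return true; // KEYWORD: everything is one token
        }
    }

    // Splits text into tokens, optionally lowercasing each code point.
    static List<String> tokenize(String text, Mode mode, boolean lowerCase) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int c = text.codePointAt(i);
            i += Character.charCount(c);
            if (isTokenChar(mode, c)) {
                current.appendCodePoint(lowerCase ? Character.toLowerCase(c) : c);
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // a non-token char ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the final token
        }
        return tokens;
    }
}
```

For example, `PatternSketch.tokenize("Hello, World 42", PatternSketch.Mode.SIMPLE, true)` yields the two tokens `hello` and `world`, because the comma, spaces, and digits are not letters. Note that this sketch emits no token at all for empty input in any mode.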
    • Method Detail

      • incrementToken

        public final boolean incrementToken()
                                     throws IOException
        Specified by:
        incrementToken in class org.apache.lucene.analysis.TokenStream
        Throws:
        IOException
      • isTokenChar

        protected boolean isTokenChar(int c)
      • normalize

        protected int normalize(int c)
      • reset

        public void reset()
                   throws IOException
        Overrides:
        reset in class org.apache.lucene.analysis.TokenStream
        Throws:
        IOException
      • reset

        public void reset(Reader input)
                   throws IOException
        Overrides:
        reset in class org.apache.lucene.analysis.Tokenizer
        Throws:
        IOException
      • end

        public void end()
                 throws IOException
        Overrides:
        end in class org.apache.lucene.analysis.TokenStream
        Throws:
        IOException
      • setEnableChecks

        public void setEnableChecks(boolean enableChecks)
        Toggles consumer workflow checking. If your test consumes token streams normally, you should leave this enabled.