Class ThaiWordFilter

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class ThaiWordFilter
    extends org.apache.lucene.analysis.TokenFilter
    A TokenFilter that uses a BreakIterator to split each Thai Token into separate Tokens, one per Thai word.

    Please note: As of matchVersion 3.1, this filter no longer lowercases non-Thai text. ThaiAnalyzer inserts a LowerCaseFilter before this filter, so the behaviour of the Analyzer does not change. As of version 3.1, the filter also handles position increments correctly.

    WARNING: this filter may not be supported by all JREs. It is known to work with Sun/Oracle and Harmony JREs. If your application needs to be fully portable, consider using ICUTokenizer instead, which uses an ICU Thai BreakIterator that will always be available.
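    The JRE dependence described above can be observed with the JDK alone. The following is a minimal sketch, not this filter's implementation: it asks java.text.BreakIterator for Thai word boundaries and collects the non-blank pieces. The class name ThaiBreakSketch and the method splitWords are hypothetical names chosen for illustration, and the escaped sample string spells "ภาษาไทย" ("Thai language"). How the Thai text is split depends on whether the running JRE ships a dictionary-based Thai BreakIterator.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
class ThaiBreakSketch {

    // Splits text at the word boundaries reported by the JRE's
    // BreakIterator for the Thai locale, dropping whitespace-only pieces.
    static List<String> splitWords(String text) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        words.setText(text);

        List<String> pieces = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String piece = text.substring(start, end);
            if (!piece.trim().isEmpty()) {
                pieces.add(piece);
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        // Sample Thai text written with Unicode escapes to avoid
        // source-encoding problems. The number of pieces printed
        // depends on the JRE's Thai support.
        String thai = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";
        for (String word : splitWords(thai)) {
            System.out.println(word);
        }
    }
}
```

    On a JRE without dictionary-based Thai support, the sample is likely to come back as a single piece, which is the situation the warning describes.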

    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

        org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static boolean DBBI_AVAILABLE
      True if the JRE supports a working dictionary-based BreakIterator for Thai.
      • Fields inherited from class org.apache.lucene.analysis.TokenFilter

        input
    • Constructor Summary

      Constructors 
      Constructor Description
      ThaiWordFilter(org.apache.lucene.analysis.TokenStream input)
      Deprecated.
      Use the constructor that takes a matchVersion instead.
      ThaiWordFilter(org.apache.lucene.util.Version matchVersion, org.apache.lucene.analysis.TokenStream input)
      Creates a new ThaiWordFilter with the specified match version.
    • Method Summary

      Modifier and Type Method Description
      boolean incrementToken()  
      void reset()  
      • Methods inherited from class org.apache.lucene.analysis.TokenFilter

        close, end
      • Methods inherited from class org.apache.lucene.util.AttributeSource

        addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
    • Field Detail

      • DBBI_AVAILABLE

        public static final boolean DBBI_AVAILABLE
        True if the JRE supports a working dictionary-based BreakIterator for Thai. If this is false, this filter will not work at all!
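        One way such a capability check could be written with the JDK alone is sketched below. This is an assumption about the general approach, not a statement of how DBBI_AVAILABLE is actually computed. The probe gives the Thai BreakIterator the two-word string "ภาษาไทย" (written with Unicode escapes) and tests for a boundary after the first four characters; an iterator without a Thai dictionary would be expected to treat the string as one unbroken run. The class name ThaiBreakSupportCheck and the method hasThaiDictionaryBreaks are hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical probe for illustration only; not part of Lucene.
class ThaiBreakSupportCheck {

    // Returns true if the JRE's Thai word BreakIterator reports a boundary
    // between the two words of the probe string (offset 4).
    static boolean hasThaiDictionaryBreaks() {
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        // Seven characters: a four-character word followed by a three-character word.
        words.setText("\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22");
        return words.isBoundary(4);
    }

    public static void main(String[] args) {
        // The result depends on the running JRE.
        System.out.println("Thai dictionary-based breaks: " + hasThaiDictionaryBreaks());
    }
}
```

        An application could consult DBBI_AVAILABLE, or a probe like this one, before building an analysis chain that relies on this filter, and fall back to ICUTokenizer otherwise.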
    • Constructor Detail

      • ThaiWordFilter

        @Deprecated
        public ThaiWordFilter(org.apache.lucene.analysis.TokenStream input)
        Deprecated.
        Use the constructor that takes a matchVersion instead.
        Creates a new ThaiWordFilter that also lowercases non-Thai text.
      • ThaiWordFilter

        public ThaiWordFilter(org.apache.lucene.util.Version matchVersion,
                              org.apache.lucene.analysis.TokenStream input)
        Creates a new ThaiWordFilter with the specified match version.
    • Method Detail

      • incrementToken

        public boolean incrementToken()
                               throws IOException
        Specified by:
        incrementToken in class org.apache.lucene.analysis.TokenStream
        Throws:
        IOException
      • reset

        public void reset()
                   throws IOException
        Overrides:
        reset in class org.apache.lucene.analysis.TokenFilter
        Throws:
        IOException