Class LaoBreakIterator
java.lang.Object
  com.ibm.icu.text.BreakIterator
    org.apache.lucene.analysis.icu.segmentation.LaoBreakIterator

All Implemented Interfaces:
Cloneable
public class LaoBreakIterator extends com.ibm.icu.text.BreakIterator
Syllable iterator for Lao text. It breaks Lao text into syllables according to "Syllabification of Lao Script for Line Breaking" by Phonpasit Phissamay, Valaxay Dalolay, Chitaphone Chanhsililath, Oulaiphone Silimasak, Sarmad Hussain and Nadir Durrani (Science Technology and Environment Agency; CRULP):
- http://www.panl10n.net/english/final%20reports/pdf%20files/Laos/LAO06.pdf
- http://www.panl10n.net/Presentations/Cambodia/Phonpassit/LineBreakingAlgo.pdf
Most of the work is done with RBBI (rule-based break iterator) rules; however, some special logic cannot be expressed in a grammar, and that logic is implemented here.
For example, what appears to be a final consonant might instead begin the next syllable. Because the rules match greedily, they can leave behind an illegal sequence that matches no rule.
Take, for instance, the text ກວ່າດອກ. The first rule greedily matches ກວ່າດ, but the remaining ອກ is illegal. According to the paper, LaoBreakIterator then does the following:
1. Backtrack: remove the ດ from the previous syllable and place it on the current syllable.
2. Verify that the modified previous syllable (ກວ່າ) is still legal.
3. Verify that the modified current syllable (ດອກ) is now legal.
4. If step 2 or 3 fails, restore the ດ to the previous syllable and skip the current character.
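The four steps can be sketched in Java as below. This is a hypothetical, minimal illustration rather than Lucene's implementation: the real iterator consults ICU RBBI rules, while this sketch assumes a tiny hard-coded set of legal syllables (`LEGAL`) as a stand-in, finds the greedy match with a hypothetical helper `longestMatch`, and models "skip the current character" by emitting that character as its own one-character segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of greedy matching plus one-character backtracking.
// LEGAL is an assumed stand-in for the RBBI rules; it is NOT a real Lao syllable grammar.
final class LaoBacktrackSketch {
    static final String KWA = "\u0E81\u0EA7\u0EC8\u0EB2";   // ກວ່າ
    static final String KWAD = KWA + "\u0E94";              // ກວ່າດ
    static final String DOK = "\u0E94\u0EAD\u0E81";         // ດອກ
    static final Set<String> LEGAL = Set.of(KWA, KWAD, DOK);

    static boolean isLegal(String s) {
        return LEGAL.contains(s);
    }

    // Greedy match: length of the longest legal syllable starting at pos, or 0 if none.
    static int longestMatch(String text, int pos) {
        for (int end = text.length(); end > pos; end--) {
            if (isLegal(text.substring(pos, end))) {
                return end - pos;
            }
        }
        return 0;
    }

    static List<String> segment(String text) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int len = longestMatch(text, pos);
            if (len > 0) {
                out.add(text.substring(pos, pos + len));
                pos += len;
                continue;
            }
            if (!out.isEmpty()) {
                // Step 1: tentatively move the last character of the previous syllable
                // onto the current one (it sits at pos - 1, since syllables are contiguous).
                String prev = out.get(out.size() - 1);
                String shortened = prev.substring(0, prev.length() - 1);
                int retry = longestMatch(text, pos - 1);
                // Steps 2 and 3: the shortened previous syllable and the new current one must both be legal.
                if (!shortened.isEmpty() && isLegal(shortened) && retry > 0) {
                    out.set(out.size() - 1, shortened);
                    out.add(text.substring(pos - 1, pos - 1 + retry));
                    pos = pos - 1 + retry;
                    continue;
                }
            }
            // Step 4: nothing was modified, so the previous syllable is already "restored";
            // skip the current character by emitting it on its own (an assumption of this sketch).
            out.add(text.substring(pos, pos + 1));
            pos++;
        }
        return out;
    }
}
```

Under these assumptions, segmenting ກວ່າດອກ first matches ກວ່າດ greedily, finds no legal syllable starting at ອກ, moves ດ forward, and returns the two syllables ກວ່າ and ດອກ.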
Finally, LaoBreakIterator also handles the second concern raised in the paper: combining marks typed in the wrong order (typos).
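One way to picture the mark-order problem is a pass that swaps a tone mark typed before an above or below vowel sign. The sketch below is hypothetical: the class `LaoMarkOrderSketch`, its code-point ranges, and the assumed canonical order (vowel sign first, then tone mark) are illustrative assumptions, and the description above does not say how LaoBreakIterator actually corrects such typos.

```java
// Hypothetical sketch: move a Lao tone mark that was typed before an above/below
// vowel sign back after it. Ranges and ordering are assumptions, not Lucene's logic.
final class LaoMarkOrderSketch {
    // Lao tone marks U+0EC8..U+0ECB.
    static boolean isToneMark(char c) {
        return c >= '\u0EC8' && c <= '\u0ECB';
    }

    // Vowel signs written above or below the consonant: U+0EB1, U+0EB4..U+0EB9, U+0EBB, U+0ECD.
    static boolean isAboveOrBelowVowel(char c) {
        return c == '\u0EB1' || (c >= '\u0EB4' && c <= '\u0EB9') || c == '\u0EBB' || c == '\u0ECD';
    }

    static String reorderMarks(String s) {
        char[] a = s.toCharArray();
        for (int i = 0; i + 1 < a.length; i++) {
            if (isToneMark(a[i]) && isAboveOrBelowVowel(a[i + 1])) {
                char tone = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tone;
                i++; // skip past the pair that was just swapped
            }
        }
        return new String(a);
    }
}
```

A string already in the assumed order passes through unchanged, so the pass can run on all input before syllable matching.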
WARNING: This API is experimental and might change in incompatible ways in the next release.
Constructor Summary
Constructor
LaoBreakIterator(com.ibm.icu.text.RuleBasedBreakIterator rules)
Method Summary
Modifier and Type    Method                             Description
Object               clone()                            Clone method.
int                  current()
int                  first()
int                  following(int offset)
CharacterIterator    getText()
int                  last()
int                  next()
int                  next(int n)
int                  previous()
void                 setText(String newText)
void                 setText(CharacterIterator text)
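The methods listed above appear to correspond to the abstract methods of BreakIterator plus setText(String) and clone(), which is the shape of a wrapper around a rule-based iterator like the one passed to the constructor. The sketch below illustrates that delegation pattern with the JDK's java.text.BreakIterator, which has the same method names, so that it runs without ICU. The class DelegatingBreakIterator is hypothetical and performs no Lao-specific processing.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;

// Hypothetical decorator on java.text.BreakIterator (the JDK analog of
// com.ibm.icu.text.BreakIterator). It forwards to a wrapped iterator; next() is
// the hook where a subclass could adjust boundaries.
class DelegatingBreakIterator extends BreakIterator {
    private BreakIterator rules;

    DelegatingBreakIterator(BreakIterator rules) {
        this.rules = rules;
    }

    @Override public int current() { return rules.current(); }
    @Override public int first() { return rules.first(); }
    @Override public int last() { return rules.last(); }
    @Override public int previous() { return rules.previous(); }
    @Override public int following(int offset) { return rules.following(offset); }
    @Override public CharacterIterator getText() { return rules.getText(); }
    @Override public void setText(CharacterIterator text) { rules.setText(text); }
    @Override public void setText(String newText) { rules.setText(newText); }

    @Override
    public int next() {
        // Post-processing (for example, backtracking) would go here; this sketch just forwards.
        return rules.next();
    }

    @Override
    public int next(int n) {
        // Built on next()/previous() so any adjustment made in next() is honored.
        int result = current();
        for (; n > 0 && result != DONE; n--) result = next();
        for (; n < 0 && result != DONE; n++) result = previous();
        return result;
    }

    @Override
    public Object clone() {
        DelegatingBreakIterator copy = (DelegatingBreakIterator) super.clone();
        copy.rules = (BreakIterator) rules.clone(); // the clone gets its own wrapped iterator
        return copy;
    }
}
```

Routing next(int n) through next() keeps a single place for boundary adjustments, at the cost of stepping one boundary at a time.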
Methods inherited from class com.ibm.icu.text.BreakIterator
getAvailableLocales, getAvailableULocales, getBreakInstance, getCharacterInstance, getCharacterInstance, getCharacterInstance, getLineInstance, getLineInstance, getLineInstance, getLocale, getSentenceInstance, getSentenceInstance, getSentenceInstance, getTitleInstance, getTitleInstance, getTitleInstance, getWordInstance, getWordInstance, getWordInstance, isBoundary, preceding, registerInstance, registerInstance, unregister
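preceding and isBoundary are inherited rather than overridden, which works because a base class can express them through the abstract methods a subclass implements. The hypothetical helpers below sketch that idea against java.text.BreakIterator; they are not the ICU source, and they assume the iterated text starts at index 0.

```java
import java.text.BreakIterator;

// Hypothetical helpers expressing "inherited" methods through following(), last()
// and previous(). Assumes the iterated text begins at index 0.
final class InheritedMethodsSketch {
    // Last boundary strictly before offset, or DONE if there is none.
    static int preceding(BreakIterator it, int offset) {
        int pos = it.following(offset);
        if (pos == BreakIterator.DONE) {
            pos = it.last(); // no boundary after offset: start from the end instead
        }
        while (pos != BreakIterator.DONE && pos >= offset) {
            pos = it.previous();
        }
        return pos;
    }

    // offset is a boundary exactly when the first boundary after offset - 1 is offset itself.
    static boolean isBoundary(BreakIterator it, int offset) {
        if (offset == 0) {
            return true;
        }
        return it.following(offset - 1) == offset;
    }
}
```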
Method Detail
current
public int current()
Specified by: current in class com.ibm.icu.text.BreakIterator

first
public int first()
Specified by: first in class com.ibm.icu.text.BreakIterator

following
public int following(int offset)
Specified by: following in class com.ibm.icu.text.BreakIterator

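following(offset) returns the first boundary strictly after offset. A common idiom pairs it with previous() to find the segment that contains a given offset. The hypothetical helper below shows this with the JDK word iterator; the same calls exist on com.ibm.icu.text.BreakIterator, so a LaoBreakIterator could be used the same way to find the syllable around an offset.

```java
import java.text.BreakIterator;

// Hypothetical helper: returns the segment containing offset, using following()
// to find its end boundary and previous() to step back to its start boundary.
final class SegmentAtSketch {
    static String segmentAt(BreakIterator it, String text, int offset) {
        it.setText(text);
        int end = it.following(offset);   // first boundary strictly after offset
        if (end == BreakIterator.DONE) {
            return "";                     // no boundary after offset
        }
        int start = it.previous();         // boundary at or before offset
        return text.substring(start, end);
    }
}
```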
getText
public CharacterIterator getText()
Specified by: getText in class com.ibm.icu.text.BreakIterator

last
public int last()
Specified by: last in class com.ibm.icu.text.BreakIterator

next
public int next()
Specified by: next in class com.ibm.icu.text.BreakIterator

next
public int next(int n)
Specified by: next in class com.ibm.icu.text.BreakIterator

previous
public int previous()
Specified by: previous in class com.ibm.icu.text.BreakIterator

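The usual traversal loops call first() and then next() until DONE, or last() and then previous() until DONE. The sketch below uses java.text.BreakIterator so it runs without ICU; with LaoBreakIterator the same loops would be written against com.ibm.icu.text.BreakIterator, which defines the same methods and DONE constant. TraversalSketch is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Forward and backward traversal using the standard BreakIterator idioms.
final class TraversalSketch {
    // Collects the text between consecutive boundaries, front to back.
    static List<String> forwardSegments(BreakIterator it, String text) {
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    // Collects boundary offsets from the end of the text back to the beginning.
    static List<Integer> backwardBoundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> out = new ArrayList<>();
        for (int b = it.last(); b != BreakIterator.DONE; b = it.previous()) {
            out.add(b);
        }
        return out;
    }
}
```

With the JDK word iterator, "hello world" yields the segments "hello", " " and "world", and the boundaries 11, 6, 5, 0 when walked backward.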
setText
public void setText(CharacterIterator text)
Specified by: setText in class com.ibm.icu.text.BreakIterator

setText
public void setText(String newText)
Overrides: setText in class com.ibm.icu.text.BreakIterator

clone
public Object clone()
Clone method. Creates another LaoBreakIterator with the same behavior and current state as this one.
Overrides: clone in class com.ibm.icu.text.BreakIterator
Returns: the clone.