Class TruncateTokenFilter

All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>

public final class TruncateTokenFilter extends TokenFilter
A token filter that truncates terms to a specific length (number of code points). Fixed-prefix truncation, used as a stemming method, produces good results for Turkish. It is reported that F5, which keeps the first 5 characters, produced the best results in Information Retrieval on Turkish Texts.

Since Lucene 10.5, the filter handles code points correctly: it truncates after the given number of code points and no longer produces incomplete surrogate pairs. Use the factory method truncateAfterCodePoints(TokenStream, int) to enable this mode. The legacy behaviour remains available through truncateAfterChars(TokenStream, int).
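
The difference between the two modes can be illustrated with plain JDK string operations. The sketch below is not the Lucene implementation and does not use the Lucene API; the class TruncateSketch and its methods firstChars and firstCodePoints are hypothetical names introduced only for this illustration. It assumes that the legacy mode counts UTF-16 chars and that the newer mode counts Unicode code points, as described above.

```java
// Plain-JDK illustration only; hypothetical names, not part of Lucene.
final class TruncateSketch {

    // Char-based truncation: keeps at most maxChars UTF-16 code units.
    // This can split a surrogate pair and leave a dangling high surrogate.
    static String firstChars(String term, int maxChars) {
        return term.length() <= maxChars ? term : term.substring(0, maxChars);
    }

    // Code-point-based truncation: keeps at most maxCodePoints code points,
    // so a surrogate pair is always kept or dropped as a whole.
    static String firstCodePoints(String term, int maxCodePoints) {
        if (term.codePointCount(0, term.length()) <= maxCodePoints) {
            return term;
        }
        int end = term.offsetByCodePoints(0, maxCodePoints);
        return term.substring(0, end);
    }

    public static void main(String[] args) {
        // F5-style prefix of a Turkish word ("kitaplar" = "books").
        System.out.println(firstCodePoints("kitaplar", 5)); // prints "kitap"

        // Two copies of U+1D11E (musical G clef): 2 code points, 4 chars.
        String clefs = "\uD834\uDD1E\uD834\uDD1E";

        // Cutting after 3 chars ends on an unpaired high surrogate.
        String byChars = firstChars(clefs, 3);
        System.out.println(Character.isHighSurrogate(byChars.charAt(2))); // prints "true"

        // Cutting after 1 code point keeps one complete pair (2 chars).
        String byCodePoints = firstCodePoints(clefs, 1);
        System.out.println(byCodePoints.length()); // prints "2"
    }
}
```

For terms that contain only characters from the Basic Multilingual Plane, which covers the Turkish alphabet, both functions return the same prefix. The results differ only when a term contains supplementary characters such as emoji or rare CJK ideographs.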