FTS5WrapperTokenizer

The protocol for custom FTS5 tokenizers that wrap another tokenizer.

Types that adopt FTS5WrapperTokenizer don’t have to implement the low-level FTS5Tokenizer.tokenize(context:flags:pText:nText:tokenCallback:).

Instead, they process regular Swift strings.

Here is the implementation for a trivial tokenizer that wraps the built-in ascii tokenizer without any custom processing:

class TrivialAsciiTokenizer : FTS5WrapperTokenizer {
    static let name = "trivial"
    let wrappedTokenizer: FTS5Tokenizer

    init(db: Database, arguments: [String]) throws {
        wrappedTokenizer = try db.makeTokenizer(.ascii())
    }

    func accept(
        token: String,
        flags: FTS5TokenFlags,
        for tokenization: FTS5Tokenization,
        tokenCallback: FTS5WrapperTokenCallback)
        throws
    {
        try tokenCallback(token, flags)
    }
}
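A wrapper tokenizer such as this one has to be registered on the database connection before an FTS5 table can use it. The following is a minimal sketch, assuming GRDB's Configuration.prepareDatabase, Database.add(tokenizer:), and tokenizerDescriptor() APIs. The database path, the "documents" table, and the "content" column are hypothetical names chosen for illustration.

```swift
import GRDB

// Register the tokenizer on each connection, before any FTS5 table uses it.
var config = Configuration()
config.prepareDatabase { db in
    db.add(tokenizer: TrivialAsciiTokenizer.self)
}

// Hypothetical database path, for illustration only.
let dbQueue = try DatabaseQueue(
    path: "/path/to/database.sqlite",
    configuration: config)

try dbQueue.write { db in
    // "documents" and "content" are hypothetical names.
    try db.create(virtualTable: "documents", using: FTS5()) { t in
        t.tokenizer = TrivialAsciiTokenizer.tokenizerDescriptor()
        t.column("content")
    }
}
```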
  • wrappedTokenizer: The wrapped tokenizer.

  • accept(token:flags:for:tokenCallback:): Given a token produced by the wrapped tokenizer, passes customized tokens to the tokenCallback function.

    For example:

    func accept(
        token: String,
        flags: FTS5TokenFlags,
        for tokenization: FTS5Tokenization,
        tokenCallback: FTS5WrapperTokenCallback)
        throws
    {
        // pass through:
        try tokenCallback(token, flags)
    }
    

    When implementing the accept method, there are two rules to observe:

    1. Errors thrown by the tokenCallback function must not be caught.

    2. The input flags should be passed unmodified to the tokenCallback function. The one exception is synonyms: when the tokenizer produces synonyms, form the union of the input flags and the .colocated flag (see https://www.sqlite.org/fts5.html#synonym_support).
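
Both rules can be illustrated with a tokenizer that emits synonyms. This is a sketch under stated assumptions rather than a definitive implementation: the SynonymsTokenizer class, its "synonyms" name, and the hard-coded synonym table are hypothetical, and it wraps the built-in unicode61 tokenizer. Every synonym after the first is passed with the .colocated flag added, which tells FTS5 that the token sits at the same position as the previous one. Errors thrown by tokenCallback are never caught.

```swift
import GRDB

final class SynonymsTokenizer: FTS5WrapperTokenizer {
    static let name = "synonyms"
    let wrappedTokenizer: FTS5Tokenizer

    // Hypothetical synonym table, for illustration only.
    let synonymGroups: [[String]] = [["first", "1st"]]

    init(db: Database, arguments: [String]) throws {
        wrappedTokenizer = try db.makeTokenizer(.unicode61())
    }

    func accept(
        token: String,
        flags: FTS5TokenFlags,
        for tokenization: FTS5Tokenization,
        tokenCallback: FTS5WrapperTokenCallback)
        throws
    {
        // Only expand synonyms when indexing documents, not when
        // tokenizing queries.
        if tokenization.contains(.query) {
            try tokenCallback(token, flags)
            return
        }

        guard let group = synonymGroups.first(where: { $0.contains(token) }) else {
            // No synonym: pass the token through with unmodified flags.
            try tokenCallback(token, flags)
            return
        }

        for (index, synonym) in group.enumerated() {
            // The first synonym keeps the input flags; the following ones
            // are marked as colocated with it.
            let synonymFlags = (index == 0) ? flags : flags.union(.colocated)
            // Rule 1: tokenCallback errors propagate, they are not caught.
            try tokenCallback(synonym, synonymFlags)
        }
    }
}
```

With this design, a document containing either "first" or "1st" is indexed under both terms, so a query for one of them is expected to match documents that contain the other.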