Tokenizers

WhitespaceTokenizer

  • Splits text on whitespace only

  • Words with internal punctuation such as "today's" are not split further, because they contain no whitespace (see the sketch below)

MitieTokenizer

  • Creates tokens for the MITIE entity extractor (see the sketch below)
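
A rough sketch of calling the underlying MITIE tokenizer directly, assuming the `mitie` Python bindings are installed and expose `tokenize`:

```python
from mitie import tokenize

text = "Show me restaurants in Berlin"
tokens = tokenize(text)
# Depending on the build, tokens may come back as byte strings; decode for readability.
tokens = [t.decode("utf-8") if isinstance(t, bytes) else t for t in tokens]
print(tokens)  # e.g. ['Show', 'me', 'restaurants', 'in', 'Berlin']
```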

SpacyTokenizer

  • Requires the SpacyNLP component

  • Uses spaCy's tokenization, which also splits off punctuation such as quotation marks (see the sketch below)
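
A small sketch of spaCy's own tokenization, which this component relies on, assuming the en_core_web_sm model is installed; note how the quotation marks become separate tokens:

```python
import spacy

# Assumes: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp('She said "hello" and left')
print([token.text for token in doc])
# ['She', 'said', '"', 'hello', '"', 'and', 'left']
```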

ConveRTTokenizer

  • Creates tokens for the ConveRTFeaturizer
