Text Featurizers

  • Transform tokenized text into numeric feature vectors that a machine learning model can use

  • 2 Categories: Sparse, Dense

    • Sparse: returns feature vectors in which most entries are zero

    • Dense: returns feature vectors in which most entries are non-zero
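
The difference between the two categories can be seen in a small sketch. The vocabulary, the message, and the dense values below are made up for illustration, and the helper names are hypothetical; this is not how any particular featurizer is implemented.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Sparse-style features: one count per vocabulary word, mostly zeros."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def zero_fraction(vector):
    """Fraction of entries that are exactly zero."""
    return sum(1 for value in vector if value == 0) / len(vector)


# Hypothetical vocabulary and message, chosen only for illustration.
vocabulary = ["book", "cancel", "flight", "hello", "hotel", "refund", "table", "taxi"]
tokens = ["book", "flight"]

sparse_vector = bag_of_words(tokens, vocabulary)

# Made-up dense embedding of the same message; real values would come
# from a pretrained model.
dense_vector = [0.12, -0.48, 0.33, 0.07]

print(sparse_vector)                 # [1, 0, 1, 0, 0, 0, 0, 0]
print(zero_fraction(sparse_vector))  # 0.75
print(zero_fraction(dense_vector))   # 0.0
```

With a realistic vocabulary of thousands of words, the sparse vector would be almost entirely zeros, while the dense vector keeps a small fixed length.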

MitieFeaturizer

  • Requires: MitieNLP

  • Dense featurizer

  • Only SklearnIntentClassifier can use its features

  • Lets you use word vectors you pre-train yourself (this needs a very large corpus)

SpacyFeaturizer

  • Requires: SpacyNLP
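
A "Requires" entry is taken here to mean that the named component has to appear earlier in the pipeline than the featurizer. The following is a minimal sketch of that ordering check, assuming a pipeline is just a list of component names; the `REQUIRES` table and `missing_requirements` are hypothetical helpers, not part of any library.

```python
# Requirements as listed in the sections above; the dict and function
# names are hypothetical and exist only for this illustration.
REQUIRES = {
    "MitieFeaturizer": "MitieNLP",
    "SpacyFeaturizer": "SpacyNLP",
}


def missing_requirements(pipeline):
    """Return (component, requirement) pairs whose requirement does not
    appear earlier in the pipeline."""
    seen = set()
    problems = []
    for name in pipeline:
        required = REQUIRES.get(name)
        if required is not None and required not in seen:
            problems.append((name, required))
        seen.add(name)
    return problems


good = ["SpacyNLP", "SpacyTokenizer", "SpacyFeaturizer", "SklearnIntentClassifier"]
bad = ["SpacyFeaturizer", "SpacyNLP"]

print(missing_requirements(good))  # []
print(missing_requirements(bad))   # [('SpacyFeaturizer', 'SpacyNLP')]
```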

ConveRTFeaturizer

  • Requires: ConveRTTokenizer

  • Dense featurizer

  • Short training time

  • The pretrained model's parameters are not fine-tuned during training

  • Used by components that need intent and response features

    • e.g. EmbeddingIntentClassifier

    • e.g. ResponseSelector

RegexFeaturizer

  • No requirements

  • Its features are only supported by CRFEntityExtractor
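
The general idea of regex features can be sketched as one boolean per pattern for each token. This is a simplified illustration, not the component's actual implementation: the patterns are hypothetical, and whole-token matching is an assumption made to keep the example short.

```python
import re

# Hypothetical patterns, for illustration only.
PATTERNS = {
    "zipcode": re.compile(r"\d{5}"),
    "greeting": re.compile(r"hey|hello|hi", re.IGNORECASE),
}


def regex_features(tokens):
    """For each token, map every pattern name to whether the whole token matches."""
    return [
        {name: pattern.fullmatch(token) is not None for name, pattern in PATTERNS.items()}
        for token in tokens
    ]


features = regex_features(["Hello", "from", "10115"])
for token_features in features:
    print(token_features)
# {'zipcode': False, 'greeting': True}
# {'zipcode': False, 'greeting': False}
# {'zipcode': True, 'greeting': False}
```

An entity extractor can then use these per-token flags as extra inputs alongside its other features.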

CountVectorsFeaturizer
