
Text Featurizers

Transform tokenized text into numeric feature vectors that a machine learning model can consume
2 categories: sparse and dense
- Sparse: returns feature vectors in which most entries are empty (i.e. 0)
- Dense: returns feature vectors in which most entries are non-zero
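
To make the sparse/dense distinction concrete, here is a minimal pure-Python sketch. The vocabulary, the toy embedding values, and the function names are all made up for illustration; they are not taken from any library.

```python
# Hypothetical vocabulary and toy 3-dimensional word embeddings (illustrative values only).
VOCAB = ["a", "book", "cancel", "flight", "hotel", "order", "pizza", "room", "taxi", "weather"]
EMBED = {
    "book": [0.2, -0.1, 0.7],
    "a": [0.1, 0.3, -0.2],
    "flight": [-0.4, 0.5, 0.6],
}

def sparse_features(tokens):
    """One count per vocabulary word: most entries stay 0."""
    return [tokens.count(word) for word in VOCAB]

def dense_features(tokens):
    """Average of the token embeddings: every dimension carries a value."""
    vectors = [EMBED[token] for token in tokens]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

tokens = ["book", "a", "flight"]
sparse = sparse_features(tokens)
dense = dense_features(tokens)
print(sparse)  # 10 entries, 7 of them are 0
print(dense)   # 3 entries, none of them 0
```

The sparse vector grows with the vocabulary size, while the dense vector keeps the fixed size of the embedding.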

MitieFeaturizer

Requires: MitieNLP
Dense featurizer
Only SklearnIntentClassifier can use these features
Suited to setups where you pre-train your own word vectors (this needs a very large corpus up front)

SpacyFeaturizer

Requires: SpacyNLP

ConveRTFeaturizer

Requires: ConveRTTokenizer
Dense featurizer
Short training time
The pretrained ConveRT parameters are not fine-tuned during training
Intended for components that need both intent and response features
- eg. EmbeddingIntentClassifier
- eg. ResponseSelector
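
The "Requires" lines for MitieFeaturizer, SpacyFeaturizer and ConveRTFeaturizer all mean the same thing: the required component has to appear earlier in the pipeline. The sketch below is a hypothetical helper (not part of Rasa) that checks this ordering on a plain list of component names; the example pipelines are assumptions for illustration.

```python
# Requirements as listed in these notes (featurizer -> component it needs).
REQUIRES = {
    "MitieFeaturizer": "MitieNLP",
    "SpacyFeaturizer": "SpacyNLP",
    "ConveRTFeaturizer": "ConveRTTokenizer",
}

def missing_requirements(pipeline):
    """Return (featurizer, required component) pairs where the requirement
    does not appear earlier in the pipeline."""
    problems = []
    seen = set()
    for name in pipeline:
        required = REQUIRES.get(name)
        if required is not None and required not in seen:
            problems.append((name, required))
        seen.add(name)
    return problems

# Hypothetical pipelines, written as ordered lists of component names.
ok_pipeline = ["SpacyNLP", "SpacyTokenizer", "SpacyFeaturizer", "SklearnIntentClassifier"]
bad_pipeline = ["WhitespaceTokenizer", "ConveRTFeaturizer", "EmbeddingIntentClassifier"]

print(missing_requirements(ok_pipeline))
print(missing_requirements(bad_pipeline))
```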

RegexFeaturizer

No requirements
Its regex features are only supported by CRFEntityExtractor
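
The general idea of regex features can be sketched as one 0/1 flag per pattern for every token, set when a pattern match overlaps that token. This is only an approximation of the concept using Python's `re` module, not the Rasa implementation; the pattern names, the patterns themselves, and the whitespace tokenization are assumptions.

```python
import re

# Hypothetical patterns; a real project would define its own.
PATTERNS = {
    "greeting": r"\bh(?:i|ello)\b",
    "zipcode": r"\b\d{5}\b",
}

def regex_features(text):
    """Return (token, row) pairs, where row holds one 0/1 flag per pattern
    (patterns taken in alphabetical order of their names)."""
    names = sorted(PATTERNS)
    rows = []
    for token in re.finditer(r"\S+", text):  # naive whitespace tokenization
        row = []
        for name in names:
            # The flag is 1 if any match of this pattern overlaps the token span.
            hit = any(
                match.start() < token.end() and match.end() > token.start()
                for match in re.finditer(PATTERNS[name], text, flags=re.IGNORECASE)
            )
            row.append(1 if hit else 0)
        rows.append((token.group(), row))
    return rows

features = regex_features("Hello my zip is 10115")
for token, row in features:
    print(token, row)
```

An entity extractor can then use these flags as extra per-token signals alongside its other features.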

CountVectorsFeaturizer

No requirements
Sparse featurizer
Creates bag-of-words representations of intents and responses
To fine-tune: see the parameters of CountVectorizer in sklearn.feature_extraction.text
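
As a rough illustration of what the bag-of-words step does and which knobs can be tuned, the sketch below calls scikit-learn's `CountVectorizer` directly. It assumes scikit-learn is installed; the example sentences are made up, and how these parameters are exposed in a Rasa pipeline config is not shown here.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training examples.
texts = ["book a flight", "book a hotel room"]

# Word-level bag of words. The default token pattern drops
# single-character tokens such as "a".
word_vectorizer = CountVectorizer(analyzer="word", ngram_range=(1, 1))
word_counts = word_vectorizer.fit_transform(texts)  # scipy sparse matrix

print(sorted(word_vectorizer.vocabulary_))
print(word_counts.toarray())

# Character n-grams inside word boundaries, which can be more robust to typos.
char_vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=(2, 3))
char_counts = char_vectorizer.fit_transform(texts)
print(char_counts.shape)
```

Switching `analyzer` and `ngram_range` changes the vocabulary, and therefore the length and sparsity of each feature vector.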


