Improving NLU Model

Pipelines

1. supervised_embeddings
2. pretrained_embeddings_convert
3. pretrained_embeddings_spacy

supervised_embeddings

  • Uses whitespace for tokenization

Default Components:

language: "en"

pipeline:
- name: "WhitespaceTokenizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
- name: "CountVectorsFeaturizer"
  analyzer: "char_wb"
  min_ngram: 1
  max_ngram: 4
- name: "EmbeddingIntentClassifier"
  • e.g. if the chosen language is not whitespace-tokenized, replace the WhitespaceTokenizer with your own tokenizer

  • Note: uses two CountVectorsFeaturizer components

    • 1st one: featurizes text based on words

    • 2nd one: featurizes text based on character n-grams, preserving word boundaries
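
Instead of listing every component, this template can also be selected by name; a minimal config sketch, assuming the Rasa 1.x "supervised_embeddings" shorthand:

language: "en"

pipeline: "supervised_embeddings"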

pretrained_embeddings_convert

  • Uses the pretrained sentence encoding model ConveRT to extract a vector representation of the complete user utterance as a whole
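
A minimal config sketch for this template, assuming the Rasa 1.x component names (ConveRTTokenizer, ConveRTFeaturizer); verify against your installed version:

language: "en"

pipeline:
- name: "ConveRTTokenizer"
- name: "ConveRTFeaturizer"
- name: "EmbeddingIntentClassifier"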

pretrained_embeddings_spacy

  • Uses pre-trained word vectors from either GloVe or fastText
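
A config sketch of the default components for this template, again assuming Rasa 1.x naming:

language: "en"

pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"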

More about spaCy Models
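
The SpacyNLP component can be pointed at a specific spaCy model; a hedged sketch, assuming en_core_web_md has been installed (e.g. via python -m spacy download en_core_web_md) and that the model/case_sensitive parameters match your Rasa version:

pipeline:
- name: "SpacyNLP"
  # assumed parameters; check against your Rasa version
  model: "en_core_web_md"
  case_sensitive: false
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"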

MITIE
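
MITIE pipelines require a pre-trained total word feature extractor file; a config sketch, assuming Rasa 1.x MITIE component names and a locally downloaded feature extractor at the path shown (hypothetical path):

language: "en"

pipeline:
- name: "MitieNLP"
  model: "data/total_word_feature_extractor.dat"
- name: "MitieTokenizer"
- name: "MitieEntityExtractor"
- name: "EntitySynonymMapper"
- name: "RegexFeaturizer"
- name: "MitieFeaturizer"
- name: "SklearnIntentClassifier"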
