Selecting Data For Modeling

Two Approaches

  • Dot notation

    • Select the "prediction target"

  • Selecting with a column list

    • Select the "features"

Dot Notation (choosing the prediction target)

  • Select the column we want to predict (the prediction target)

# Here `model` is the pandas DataFrame holding the data.
# By convention, the prediction target is named y
y = model.Column
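
As a rough illustration of dot notation, here is a minimal sketch on a small made-up table. The DataFrame name `data` and the columns `price`, `rooms`, and `area` are hypothetical and not part of these notes. Dot notation only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute; bracket notation selects the same column without that restriction.

```python
import pandas as pd

# Hypothetical toy table; the column names are assumptions for illustration.
data = pd.DataFrame({
    "price": [200_000, 350_000, 150_000],
    "rooms": [3, 5, 2],
    "area": [90, 160, 60],
})

# Dot notation pulls out a single column as a pandas Series.
# By convention, the prediction target is named y.
y = data.price

# Bracket notation selects the same column and also works for names
# that contain spaces or clash with DataFrame attributes.
y_alt = data["price"]
```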

Choosing "features"

  • Features are the columns the model takes as input and uses to make predictions

# Column names must be given as strings
features = ['price', 'name', 'clothes']
# By convention, this is named X
X = model[features]
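
A minimal sketch of selecting features with a column list, again on a made-up table (the name `data` and its columns are assumptions for illustration). Passing a list of column names in brackets returns a DataFrame that contains only those columns, in the order listed.

```python
import pandas as pd

# Hypothetical toy table; the column names are assumptions for illustration.
data = pd.DataFrame({
    "price": [200_000, 350_000, 150_000],
    "rooms": [3, 5, 2],
    "area": [90, 160, 60],
})

# The list holds column names as strings.
features = ["rooms", "area"]

# Indexing with a list returns a DataFrame with just those columns.
# By convention, the feature table is named X.
X = data[features]
```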

Steps to building and using a model

  • Define

    • What type of model? (e.g., decision tree)

  • Fit

    • Capture patterns from the provided data

  • Predict

  • Evaluate

    • Determine accuracy of model's predictions

Example using scikit-learn
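
The four steps above (define, fit, predict, evaluate) can be sketched roughly as follows. This is one possible illustration, not the notes' original example: the toy data, the variable name `tree_model`, and the choice of mean absolute error as the accuracy measure are all assumptions.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data; values and column names are made up.
data = pd.DataFrame({
    "rooms": [2, 3, 3, 4, 5, 6],
    "area": [60, 80, 95, 120, 160, 210],
    "price": [150_000, 200_000, 230_000, 290_000, 350_000, 450_000],
})
y = data.price                # prediction target
X = data[["rooms", "area"]]   # features

# Define: choose the type of model (here, a decision tree).
tree_model = DecisionTreeRegressor(random_state=1)

# Fit: capture patterns from the provided data.
tree_model.fit(X, y)

# Predict: ask the fitted model for predictions.
predictions = tree_model.predict(X)

# Evaluate: measure how far the predictions are from the true values.
mae = mean_absolute_error(y, predictions)
```

This sketch evaluates on the same rows the model was trained on, which makes the error look better than it would on unseen data; a held-out set gives a more honest estimate.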

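If you do not specify a random_state number, model training involves some randomness, so results can differ from run to run. Setting random_state to any fixed number makes the results reproducible.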
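
A small sketch of the effect of random_state, using randomly generated data (the arrays and the value 0 are arbitrary choices for illustration). Two models trained on the same data with the same random_state produce identical predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up data: 50 training rows with 3 features, plus 10 new rows.
rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = rng.random(50)
X_new = rng.random((10, 3))

# Same data and same random_state: training is repeatable.
first = DecisionTreeRegressor(random_state=0).fit(X, y).predict(X_new)
second = DecisionTreeRegressor(random_state=0).fit(X, y).predict(X_new)

# True when both runs give exactly the same predictions.
same = np.array_equal(first, second)
```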
