Feature Engineering

Supervised

Difficulty: 6 | Problem written by ankita

Problem reported in interviews at

Amazon
Apple
Facebook
Google
Netflix

Some datasets do not have a linear decision boundary, so a linear model trained on the raw features may give poor results. A linear model can still produce complex decision boundaries, such as circular ones, if we enrich the dataset with new features calculated from the existing ones, e.g., polynomial features.
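As a small illustration of this idea, the sketch below builds hypothetical toy data (not the Iris sample used in this problem) whose two classes are separated by the unit circle, then checks that a rule that is linear in the squared features separates them. The data, the weights w, and the bias b are assumptions chosen only for the demonstration.

```python
import numpy as np

# Hypothetical toy data: 200 points in the square [-2, 2] x [-2, 2].
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))

# Class 1 lies inside the unit circle x1^2 + x2^2 < 1, class 0 outside.
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)

# Engineered features: z1 = x1^2, z2 = x2^2.
Z = X ** 2

# In (z1, z2) space the circle becomes the straight line z1 + z2 = 1,
# so a linear rule w . z + b < 0 reproduces the labels.
w = np.array([1.0, 1.0])
b = -1.0
pred = (Z @ w + b < 0).astype(int)

accuracy = float((pred == y).mean())
print("accuracy of the linear rule on squared features:", accuracy)
```

No straight line in the original (x1, x2) plane can separate these two classes, which is why the extra columns matter for a linear model.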

Feature engineering is an important tool in a data scientist's toolkit. Engineering valid, useful features usually requires some domain knowledge. Here, we will engineer features by applying a few mathematical functions to the existing ones.

Apply the following feature engineering to both X_train and X_test:

Add two columns containing the squares of the two features

Add two columns containing the log of the two features

Add two columns containing the exp of the two features

The final feature matrix should be the concatenation of the original features with the square, log, and exp features, in that order. This gives 8 columns in total, and the same layout applies to both X_train and X_test.
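The transformation described above can be sketched as a single helper. The name add_engineered_features is hypothetical, and the sketch assumes that "log" means the natural logarithm, since the base is not stated.

```python
import numpy as np

def add_engineered_features(X):
    """Return [X, X^2, log(X), exp(X)] stacked column-wise (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # np.log is the natural logarithm (an assumption about the intended base).
    # It needs strictly positive inputs, which holds for lengths in cm.
    return np.hstack([X, X ** 2, np.log(X), np.exp(X)])

features = add_engineered_features([[5.1, 3.5], [4.9, 3.0]])
print(features.shape)  # (2, 8)
```

Applying the same helper to X_train and X_test keeps the column order identical for training and prediction.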

Input:

The data set is a sample of the Iris dataset.

You are given as input:

X_train: Two numerical features (sepal length and sepal width in cm)

Y_train: labels for X_train (3 classes)

X_test: Two numerical features (sepal length and sepal width in cm)

Output:

Y_test: predictions for X_test after applying the above-mentioned feature engineering

Complete the function Prediction(X_train, Y_train, X_test), which returns Y_test as a NumPy array for the given X_test.

Hints

Use LogisticRegression(solver='liblinear') to train the model on X_train with engineered features.

The output is a NumPy array.
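Putting the hints together, one possible sketch of the function is shown below. It is not the reference solution. The helper _engineer is a hypothetical name, np.log is assumed to be the intended logarithm, and the OneVsRestClassifier wrapper is an addition made here because some recent scikit-learn releases warn about or reject the liblinear solver when it is given more than two classes directly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

def _engineer(X):
    # Hypothetical helper: original, squared, log, and exp columns, in that order.
    X = np.asarray(X, dtype=float)
    return np.hstack([X, X ** 2, np.log(X), np.exp(X)])

def Prediction(X_train, Y_train, X_test):
    # liblinear handles binary problems; the wrapper trains one
    # binary classifier per class (one-vs-rest) for the 3 labels.
    base = LogisticRegression(solver='liblinear')
    model = OneVsRestClassifier(base)
    model.fit(_engineer(X_train), np.asarray(Y_train))
    # predict returns a NumPy array with one label per test row.
    return model.predict(_engineer(X_test))

X_train = [[5.1, 3.5], [4.9, 3.0], [4.7, 3.2], [4.6, 3.1], [5.0, 3.6],
           [7.0, 3.2], [6.4, 3.2], [6.9, 3.1], [5.5, 2.3], [6.5, 2.8],
           [6.3, 3.3], [5.8, 2.7], [7.1, 3.0], [6.3, 2.9], [6.5, 3.0]]
Y_train = [0.0] * 5 + [1.0] * 5 + [2.0] * 5
X_test = [[5.4, 3.9], [4.6, 3.4], [5.7, 2.8], [6.3, 3.3], [7.6, 3.0], [4.9, 2.5]]

Y_test = Prediction(X_train, Y_train, X_test)
print(Y_test)
```

If the installed scikit-learn version accepts it, calling LogisticRegression(solver='liblinear') directly on the three-class labels, as the hint suggests, is likely the closer match to the intended solution.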

Sample Input:
X_train: [[5.1, 3.5], [4.9, 3.0], [4.7, 3.2], [4.6, 3.1], [5.0, 3.6], [7.0, 3.2], [6.4, 3.2], [6.9, 3.1], [5.5, 2.3], [6.5, 2.8], [6.3, 3.3], [5.8, 2.7], [7.1, 3.0], [6.3, 2.9], [6.5, 3.0]]
Y_train: [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0]
X_test: [[5.4, 3.9], [4.6, 3.4], [5.7, 2.8], [6.3, 3.3], [7.6, 3.0], [4.9, 2.5]]

Expected Output:
[0. 0. 2. 2. 1. 0.]

NumPy has already been imported as np (import numpy as np).