In this tutorial, we will write a machine learning program that uses our own classifier.
1. Steps
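First, we need to create a classifier class, ScrappyKNN.
Inside the ScrappyKNN class, we will have a fit function and two predict functions.
In the example, naive_predict makes predictions by random guessing: it picks a random label from the training labels. closet_predict makes predictions with the closest-distance approach: it returns the label of the training point nearest to the test point. (The name "closet" in the code means "closest".)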
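The two approaches can be sketched on a toy dataset before moving to the full program. This is a minimal illustration, not the tutorial's code: the points, the labels, and the helper names random_guess and nearest_label are made up for the example, and it assumes Python 3.8+ so that math.dist from the standard library can stand in for SciPy.

```python
import math
import random

# Hypothetical toy training data: two clusters of 2-D points.
train_points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
train_labels = ["small", "small", "large", "large"]


def random_guess(labels):
    """Random-guess approach: ignore the input and pick any training label."""
    return random.choice(labels)


def nearest_label(point, points, labels):
    """Closest-distance approach: return the label of the nearest training point."""
    distances = [math.dist(point, p) for p in points]
    best_index = distances.index(min(distances))
    return labels[best_index]


print(random_guess(train_labels))
print(nearest_label((2.0, 1.5), train_points, train_labels))
print(nearest_label((7.0, 9.0), train_points, train_labels))
```

The random guess never looks at the test point, so it is only useful as a baseline. The nearest-label function is a one-neighbor version of k-nearest neighbors.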
2. Coding
Let’s head into Python for a programmatic example.
Create a Python file named classifier.py and add the following code.
Please read the comments carefully to understand what the code does.
"""
GoodTecher Machine Learning Coding Tutorial
http://72.44.43.28
Machine Learning Coding Tutorial 5. Build Our Own Classifier
The program demonstrates how to build our own classifier
"""
import random
from scipy.spatial import distance
def euc(a, b):
    """
    Helper function to get the Euclidean distance between two points
    """
    return distance.euclidean(a, b)
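# class of our own k-neighbors classifier (it uses a single nearest neighbor)
class ScrappyKNN():
    def fit(self, x_train, y_train):
        # "training" only stores the data; all the work happens at prediction time
        self.x_train = x_train
        self.y_train = y_train

    def naive_predict(self, x_test):
        # random guess: pick any training label for each test row
        predictions = []
        for row in x_test:
            label = random.choice(self.y_train)
            predictions.append(label)
        return predictions

    def closet_predict(self, x_test):
        # closest distance: label each test row like its nearest training point
        predictions = []
        for row in x_test:
            label = self.closet(row)
            predictions.append(label)
        return predictions

    def closet(self, row):
        # start with the first training point as the best candidate so far
        best_dist = euc(row, self.x_train[0])
        best_index = 0
        # scan the remaining training points and keep the closest one
        for i in range(1, len(self.x_train)):
            dist = euc(row, self.x_train[i])
            if dist < best_dist:
                best_dist = dist
                best_index = i
        return self.y_train[best_index]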
# import Iris dataset
from sklearn import datasets
iris = datasets.load_iris()
# x is data, y is true label
x = iris.data
y = iris.target
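# split the data: half for testing, half for training
# (train_test_split lives in sklearn.model_selection; the old
# sklearn.cross_validation module was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.5)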
# use data to train Decision Tree classifier
from sklearn import tree
tree_classifier = tree.DecisionTreeClassifier()
tree_classifier.fit(x_train, y_train)
# use data to train kNeighbors classifier
from sklearn.neighbors import KNeighborsClassifier
kNeighbors_classifier = KNeighborsClassifier()
kNeighbors_classifier.fit(x_train, y_train)
# use data to train our own kNeighbors classifier
my_classifier = ScrappyKNN()
my_classifier.fit(x_train, y_train)
# predict
tree_predictions = tree_classifier.predict(x_test)
kNeighbors_predictions = kNeighbors_classifier.predict(x_test)
my_classifier_naive_predictions = my_classifier.naive_predict(x_test)
my_classifier_closet_predictions = my_classifier.closet_predict(x_test)
# compare true labels with prediction values to get accuracy score
from sklearn.metrics import accuracy_score
print("tree classifier accuracy: ", accuracy_score(y_test, tree_predictions))
print("k neighbors classifier accuracy: ", accuracy_score(y_test, kNeighbors_predictions))
print("naive classifier accuracy: ", accuracy_score(y_test, my_classifier_naive_predictions))
print("closest classifier accuracy: ", accuracy_score(y_test, my_classifier_closet_predictions))
Run the program from the command line:
python classifier.py
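When reading the output, it helps to know what random guessing should score. Iris has three classes with 50 samples each, so picking labels at random should land near 1/3 accuracy, while the other three classifiers usually score well above that on this dataset. The sketch below simulates the random-guess baseline with the standard library only; the stand-in label lists, the sample count, and the seed are illustrative assumptions and are not part of classifier.py.

```python
import random

random.seed(0)  # arbitrary seed, only to make the simulation repeatable

# Stand-in labels: three balanced classes, like the Iris targets 0, 1, 2.
train_labels = [0, 1, 2] * 25
true_labels = [random.choice([0, 1, 2]) for _ in range(3000)]

# Random-guess predictions, drawn from the training labels as naive_predict does.
guesses = [random.choice(train_labels) for _ in true_labels]

# Fraction of guesses that happen to match the true label.
correct = sum(1 for g, t in zip(guesses, true_labels) if g == t)
accuracy = correct / len(true_labels)
print("simulated random-guess accuracy:", round(accuracy, 3))
```

The real test set has only 75 samples and the split is random, so the naive score printed by classifier.py will fluctuate noticeably from run to run.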