In this tutorial, we are going to write a machine learning program with our own classifier.
1. Steps
First, we need to create a classifier class ScrappyKNN.
Inside the ScrappyKNN class, we will implement fit and predict methods.
In the example, naive_predict makes predictions by picking a random training label, while closest_predict predicts the label of the closest training point.
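Before the full program, here is a minimal, self-contained sketch of the closest-distance idea on a tiny hand-made dataset (the points and labels below are hypothetical, for illustration only):

```python
from scipy.spatial import distance

# Hypothetical toy training data: two points with known labels
x_train = [[0.0, 0.0], [5.0, 5.0]]
y_train = ["a", "b"]

test_point = [1.0, 1.0]

# Compute the distance to every training point and
# predict the label of the nearest one
dists = [distance.euclidean(test_point, p) for p in x_train]
prediction = y_train[dists.index(min(dists))]
print(prediction)  # "a", since test_point is nearer to [0.0, 0.0]
```

This is exactly what the classifier below does for every row of the test set.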
2. Coding
Let’s head into Python for a programmatic example.
Create a Python file classifier.py and add the following code.
Please read the comments carefully to understand what the code does.
""" GoodTecher Machine Learning Coding Tutorial http://72.44.43.28 Machine Learning Coding Tutorial 5. Build Our Own Classifier The program demonstrate how to build our own classifier """ import random from scipy.spatial import distance def euc(a, b): """ Helper funciton to get distance between two points """ return distance.euclidean(a, b) # class of our own Kneighbors classifier class ScrappyKNN(): def fit(self, x_train, y_train): self.x_train = x_train self.y_train = y_train def naive_predict(self, x_test): predictions = [] for row in x_test: label = random.choice(self.y_train) predictions.append(label) return predictions def closet_predict(self, x_test): predictions = [] for row in x_test: label = self.closet(row) predictions.append(label) return predictions def closet(self, row): best_dist = euc(row, self.x_train[0]) best_index = 0 for i in range(1, len(self.x_train)): dist = euc(row, self.x_train[i]) if dist < best_dist: best_dist = dist best_index = i return self.y_train[best_index] # import Iris dataset from sklearn import datasets iris = datasets.load_iris() # x is data, y is true label x = iris.data y = iris.target # split half data as test data, half data as training data from sklearn.cross_validation import train_test_split x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.5) # use data to train Decision Tree classifier from sklearn import tree tree_classifier = tree.DecisionTreeClassifier() tree_classifier.fit(x_train, y_train) # use data to train kNeighbors classifier from sklearn.neighbors import KNeighborsClassifier kNeighbors_classifier = KNeighborsClassifier() kNeighbors_classifier.fit(x_train, y_train) # use data to train our own kNeighbors classifier my_classifier = ScrappyKNN() my_classifier.fit(x_train, y_train) my_classifier.fit(x_train, y_train) # predict tree_predictions = tree_classifier.predict(x_test) kNeighbors_predictions = kNeighbors_classifier.predict(x_test) my_classifier_naive_predictions = 
my_classifier.naive_predict(x_test) my_classifier_closet_predictions = my_classifier.closet_predict(x_test) # compare true labels with prediction values to get accuracy score from sklearn.metrics import accuracy_score print("tree classifier accuracy: ", accuracy_score(y_test, tree_predictions)) print("k neighbors classifier accuracy: ", accuracy_score(y_test, kNeighbors_predictions)) print("naive classifier accuracy: ", accuracy_score(y_test, my_classifier_naive_predictions)) print("closet classifier accuracy: ", accuracy_score(y_test, my_classifier_closet_predictions))
Run the program from the command line:

python classifier.py
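The custom classifier above predicts with the single nearest neighbor (k = 1). As an optional extension, here is a hedged sketch of a majority vote over the k nearest neighbors; knn_predict_one and the toy data are hypothetical names for illustration, not part of the tutorial code:

```python
from collections import Counter
from scipy.spatial import distance

def knn_predict_one(row, x_train, y_train, k=3):
    # Indices of training points, sorted by distance to row
    order = sorted(range(len(x_train)),
                   key=lambda i: distance.euclidean(row, x_train[i]))
    # Majority vote among the labels of the k nearest points
    votes = [y_train[i] for i in order[:k]]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: two class-0 points near the query, one class-1 far away
x_train = [[0, 0], [1, 0], [10, 10]]
y_train = [0, 0, 1]
print(knn_predict_one([0.5, 0.0], x_train, y_train, k=3))  # 0 wins the vote 2-1
```

Voting over several neighbors usually makes predictions less sensitive to a single noisy training point, which is the same idea behind the default n_neighbors parameter of scikit-learn's KNeighborsClassifier.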