[{"data":1,"prerenderedAt":1178},["ShallowReactive",2],{"article-ml-knn-image-classification":3,"surround":1172},{"id":4,"title":5,"author":6,"body":7,"createdAt":1157,"description":1158,"extension":1159,"meta":1160,"navigation":1161,"path":1162,"published":147,"seo":1163,"stem":1164,"tags":1165,"thumbnail":1170,"updatedAt":1157,"__hash__":1171},"article/article/ml-knn-image-classification.md","K-NN Image Classification Using Scikit-Learn","John Kosmetos",{"type":8,"value":9,"toc":1141},"minimark",[10,15,36,40,43,46,49,53,64,69,72,82,86,89,93,96,100,103,107,110,124,128,131,705,709,715,796,800,803,812,852,855,899,902,936,943,946,950,953,975,981,984,1113,1119,1126,1137],[11,12,14],"h2",{"id":13},"overview","Overview",[16,17,18,19,25,26,30,31,35],"p",{},"In this tutorial I'm going to go over the basics of image classification using a very popular ML algorithm, namely: K-Nearest Neighbour. I'm going to use ",[20,21,24],"a",{"href":22,"target":23},"https://scikit-learn.org/stable/index.html","_blank","Scikit-Learn","'s classification implementation, and train it on ",[20,27,29],{"href":28,"target":23},"https://www.openml.org/search?type=data&status=active&id=554","MNIST"," (Handwritten digits) data downloaded from ",[20,32,34],{"href":33,"target":23},"https://www.openml.org/","OpenML",", after which we'll check its accuracy and spot-check a few classifications to see if it works.",[11,37,39],{"id":38},"what-even-is-k-nn","What Even Is K-NN?!",[16,41,42],{},"Good question! 
K-NN is a type of supervised machine learning algorithm that assumes \"sameness\" based on proximity of data points in a given feature space, which is just a fancy way of saying that it groups data that sits close together.",[16,44,45],{},"It is considered a lazy algorithm, in that it doesn't produce a model per se, but instead stores the training data in memory and computes distances on the fly at prediction time (good for small datasets, bad for big ones).",[16,47,48],{},"It is usually one of the first models you learn when stepping into the world of data science due to its simplicity. There are a few different \"flavours\" of Nearest Neighbour algorithm, but for the purposes of this post, I'm going to focus on the most straightforward of the bunch, plain old Standard K-NN. Another widely used variant is Weighted K-NN, which I'll cover separately.",[11,50,52],{"id":51},"how-it-works","How It Works",[16,54,55,56,60,61,63],{},"In its simplest form, a K-NN model needs only two parameters to function: the ",[57,58,59],"code",{},"k"," value, and the distance metric. The ",[57,62,59],{}," value represents the number of neighbours to take into account when a new data point is added to the feature space, and the distance metric stipulates which measure should be used to calculate the distance between data points. Euclidean distance is used the vast majority of the time, but Manhattan and Minkowski distance are also widely used.",[65,66,68],"h3",{"id":67},"for-classification","For Classification",[16,70,71],{},"In a classification task, the training data will consist of features accompanied by labels. 
K-NN can handle both single-label and multi-label classification, but for the sake of simplicity, we'll only focus on single-label data.",[16,73,74,75,78,79,81],{},"Let's say that ",[57,76,77],{},"k=3",", and the distance metric is set to Euclidean distance; when a new data point is added to the feature space, the algorithm first calculates the Euclidean distance from the new unseen point to every other point present in the training data, after which it grabs the ",[57,80,59],{}," closest ones, which in this case is 3, checks their labels, assigns the most prevalent one to the new point, and voila, it is done!",[65,83,85],{"id":84},"for-regression","For Regression",[16,87,88],{},"With regression tasks, instead of assigning a label, training feature values are used to calculate the median or mean (depending on the task at hand) value to \"predict\" the outcome of an unseen data point. So as above, assuming a simple implementation, the values of the 3 closest neighbours are merely averaged and returned, easy peasy!",[11,90,92],{"id":91},"common-use-cases","Common Use-Cases",[16,94,95],{},"Nearest neighbour algorithms are particularly useful for image recognition, handwriting recognition, spam detection, and even financial forecasts (regression). 
In this post I'm going to use image recognition as an example, specifically handwritten digit recognition.",[11,97,99],{"id":98},"example","Example",[16,101,102],{},"Time to get our hands dirty; let's jump straight into an example.",[65,104,106],{"id":105},"requirements","Requirements",[16,108,109],{},"Before we start, make sure you have your Python environment set up and ready to go with the following libraries:",[111,112,113,118],"ul",{},[114,115,116],"li",{},[20,117,24],{"href":22,"target":23},[114,119,120],{},[20,121,123],{"href":122,"target":23},"https://matplotlib.org/","Matplotlib",[65,125,127],{"id":126},"utility-functions-imports","Utility Functions & Imports",[16,129,130],{},"To make life a little easier, here are a few utility functions to help with plotting and displaying data, along with all the necessary imports.",[132,133,138],"pre",{"className":134,"code":135,"language":136,"meta":137,"style":137},"language-python shiki shiki-themes plastic","\nimport matplotlib.pyplot as plt\nfrom sklearn.neighbors import KNeighborsClassifier\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.datasets import fetch_openml\nfrom sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay\n\ndef plot_digits(digits, predictions, classifier_name=\"\"):\n    \n    # Generate subplots\n    _, axes = plt.subplots(nrows=1, ncols=8, figsize=(15, 2))\n    \n    for ax, image, prediction in zip(axes, digits, predictions):\n        \n        # Turn off the axis display\n        ax.set_axis_off()\n        \n        # Reconstitute the image\n        image = image.reshape(28, 28)\n        \n        # Display image\n        ax.imshow(image, cmap=plt.cm.gray_r, interpolation=\"nearest\")\n        \n        # Set the title\n        ax.set_title(f\"Prediction: {prediction}\")\n    \n    plt.suptitle(f'First 8 {classifier_name} Predictions', fontsize=14)\n\n    plt.tight_layout(rect=[0, 0.03, 1, 0.95]) # Adjust layout to prevent title overlap\n\n    # Show the plot 
   \n    plt.show()\n\ndef generate_confusion_matrix(true_labels, predicted_labels, labels):\n    \n    # calculate the confusion matrix\n    cm = confusion_matrix(true_labels, predicted_labels, labels=labels)\n    cm_display = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)\n    cm_display.plot()\n\n    # Set the title\n    plt.title('Confusion Matrix')\n\n    # Show the plot\n    plt.show()\n","python","",[57,139,140,149,166,180,193,206,219,224,262,268,275,325,330,348,354,360,366,371,377,398,403,409,439,444,450,475,480,512,517,554,559,565,571,576,601,606,612,630,656,662,667,673,689,694,700],{"__ignoreMap":137},[141,142,145],"span",{"class":143,"line":144},"line",1,[141,146,148],{"emptyLinePlaceholder":147},true,"\n",[141,150,152,156,160,163],{"class":143,"line":151},2,[141,153,155],{"class":154},"sVyAn","import",[141,157,159],{"class":158},"sGSqi"," matplotlib.pyplot ",[141,161,162],{"class":154},"as",[141,164,165],{"class":158}," plt\n",[141,167,169,172,175,177],{"class":143,"line":168},3,[141,170,171],{"class":154},"from",[141,173,174],{"class":158}," sklearn.neighbors ",[141,176,155],{"class":154},[141,178,179],{"class":158}," KNeighborsClassifier\n",[141,181,183,185,188,190],{"class":143,"line":182},4,[141,184,171],{"class":154},[141,186,187],{"class":158}," sklearn.model_selection ",[141,189,155],{"class":154},[141,191,192],{"class":158}," train_test_split\n",[141,194,196,198,201,203],{"class":143,"line":195},5,[141,197,171],{"class":154},[141,199,200],{"class":158}," sklearn.datasets ",[141,202,155],{"class":154},[141,204,205],{"class":158}," fetch_openml\n",[141,207,209,211,214,216],{"class":143,"line":208},6,[141,210,171],{"class":154},[141,212,213],{"class":158}," sklearn.metrics ",[141,215,155],{"class":154},[141,217,218],{"class":158}," confusion_matrix, 
ConfusionMatrixDisplay\n",[141,220,222],{"class":143,"line":221},7,[141,223,148],{"emptyLinePlaceholder":147},[141,225,227,231,235,238,242,245,248,250,253,256,259],{"class":143,"line":226},8,[141,228,230],{"class":229},"sVbv2","def",[141,232,234],{"class":233},"sJix2"," plot_digits",[141,236,237],{"class":158},"(",[141,239,241],{"class":240},"sVs6v","digits",[141,243,244],{"class":158},", ",[141,246,247],{"class":240},"predictions",[141,249,244],{"class":158},[141,251,252],{"class":240},"classifier_name",[141,254,255],{"class":154},"=",[141,257,258],{"class":158},"\"\"",[141,260,261],{"class":158},"):\n",[141,263,265],{"class":143,"line":264},9,[141,266,267],{"class":158},"    \n",[141,269,271],{"class":143,"line":270},10,[141,272,274],{"class":273},"ssUfO","    # Generate subplots\n",[141,276,278,281,283,286,289,291,295,297,300,302,305,307,310,312,314,317,319,322],{"class":143,"line":277},11,[141,279,280],{"class":158},"    _, axes ",[141,282,255],{"class":154},[141,284,285],{"class":158}," plt.subplots(",[141,287,288],{"class":240},"nrows",[141,290,255],{"class":154},[141,292,294],{"class":293},"sjrmR","1",[141,296,244],{"class":158},[141,298,299],{"class":240},"ncols",[141,301,255],{"class":154},[141,303,304],{"class":293},"8",[141,306,244],{"class":158},[141,308,309],{"class":240},"figsize",[141,311,255],{"class":154},[141,313,237],{"class":158},[141,315,316],{"class":293},"15",[141,318,244],{"class":158},[141,320,321],{"class":293},"2",[141,323,324],{"class":158},"))\n",[141,326,328],{"class":143,"line":327},12,[141,329,267],{"class":158},[141,331,333,336,339,342,345],{"class":143,"line":332},13,[141,334,335],{"class":154},"    for",[141,337,338],{"class":158}," ax, image, prediction ",[141,340,341],{"class":154},"in",[141,343,344],{"class":233}," zip",[141,346,347],{"class":158},"(axes, digits, predictions):\n",[141,349,351],{"class":143,"line":350},14,[141,352,353],{"class":158},"        
\n",[141,355,357],{"class":143,"line":356},15,[141,358,359],{"class":273},"        # Turn off the axis display\n",[141,361,363],{"class":143,"line":362},16,[141,364,365],{"class":158},"        ax.set_axis_off()\n",[141,367,369],{"class":143,"line":368},17,[141,370,353],{"class":158},[141,372,374],{"class":143,"line":373},18,[141,375,376],{"class":273},"        # Reconstitute the image\n",[141,378,380,383,385,388,391,393,395],{"class":143,"line":379},19,[141,381,382],{"class":158},"        image ",[141,384,255],{"class":154},[141,386,387],{"class":158}," image.reshape(",[141,389,390],{"class":293},"28",[141,392,244],{"class":158},[141,394,390],{"class":293},[141,396,397],{"class":158},")\n",[141,399,401],{"class":143,"line":400},20,[141,402,353],{"class":158},[141,404,406],{"class":143,"line":405},21,[141,407,408],{"class":273},"        # Display image\n",[141,410,412,415,418,420,423,426,428,431,435,437],{"class":143,"line":411},22,[141,413,414],{"class":158},"        ax.imshow(image, ",[141,416,417],{"class":240},"cmap",[141,419,255],{"class":154},[141,421,422],{"class":158},"plt.cm.gray_r, ",[141,424,425],{"class":240},"interpolation",[141,427,255],{"class":154},[141,429,430],{"class":158},"\"",[141,432,434],{"class":433},"subq3","nearest",[141,436,430],{"class":158},[141,438,397],{"class":158},[141,440,442],{"class":143,"line":441},23,[141,443,353],{"class":158},[141,445,447],{"class":143,"line":446},24,[141,448,449],{"class":273},"        # Set the title\n",[141,451,453,456,459,462,465,468,471,473],{"class":143,"line":452},25,[141,454,455],{"class":158},"        ax.set_title(",[141,457,458],{"class":229},"f",[141,460,461],{"class":433},"\"Prediction: 
",[141,463,464],{"class":293},"{",[141,466,467],{"class":158},"prediction",[141,469,470],{"class":293},"}",[141,472,430],{"class":433},[141,474,397],{"class":158},[141,476,478],{"class":143,"line":477},26,[141,479,267],{"class":158},[141,481,483,486,488,491,493,495,497,500,502,505,507,510],{"class":143,"line":482},27,[141,484,485],{"class":158},"    plt.suptitle(",[141,487,458],{"class":229},[141,489,490],{"class":433},"'First 8 ",[141,492,464],{"class":293},[141,494,252],{"class":158},[141,496,470],{"class":293},[141,498,499],{"class":433}," Predictions'",[141,501,244],{"class":158},[141,503,504],{"class":240},"fontsize",[141,506,255],{"class":154},[141,508,509],{"class":293},"14",[141,511,397],{"class":158},[141,513,515],{"class":143,"line":514},28,[141,516,148],{"emptyLinePlaceholder":147},[141,518,520,523,526,528,531,534,536,539,541,543,545,548,551],{"class":143,"line":519},29,[141,521,522],{"class":158},"    plt.tight_layout(",[141,524,525],{"class":240},"rect",[141,527,255],{"class":154},[141,529,530],{"class":158},"[",[141,532,533],{"class":293},"0",[141,535,244],{"class":158},[141,537,538],{"class":293},"0.03",[141,540,244],{"class":158},[141,542,294],{"class":293},[141,544,244],{"class":158},[141,546,547],{"class":293},"0.95",[141,549,550],{"class":158},"]) ",[141,552,553],{"class":273},"# Adjust layout to prevent title overlap\n",[141,555,557],{"class":143,"line":556},30,[141,558,148],{"emptyLinePlaceholder":147},[141,560,562],{"class":143,"line":561},31,[141,563,564],{"class":273},"    # Show the plot    \n",[141,566,568],{"class":143,"line":567},32,[141,569,570],{"class":158},"    plt.show()\n",[141,572,574],{"class":143,"line":573},33,[141,575,148],{"emptyLinePlaceholder":147},[141,577,579,581,584,586,589,591,594,596,599],{"class":143,"line":578},34,[141,580,230],{"class":229},[141,582,583],{"class":233}," 
generate_confusion_matrix",[141,585,237],{"class":158},[141,587,588],{"class":240},"true_labels",[141,590,244],{"class":158},[141,592,593],{"class":240},"predicted_labels",[141,595,244],{"class":158},[141,597,598],{"class":240},"labels",[141,600,261],{"class":158},[141,602,604],{"class":143,"line":603},35,[141,605,267],{"class":158},[141,607,609],{"class":143,"line":608},36,[141,610,611],{"class":273},"    # calculate the confusion matrix\n",[141,613,615,618,620,623,625,627],{"class":143,"line":614},37,[141,616,617],{"class":158},"    cm ",[141,619,255],{"class":154},[141,621,622],{"class":158}," confusion_matrix(true_labels, predicted_labels, ",[141,624,598],{"class":240},[141,626,255],{"class":154},[141,628,629],{"class":158},"labels)\n",[141,631,633,636,638,641,644,646,649,652,654],{"class":143,"line":632},38,[141,634,635],{"class":158},"    cm_display ",[141,637,255],{"class":154},[141,639,640],{"class":158}," ConfusionMatrixDisplay(",[141,642,643],{"class":240},"confusion_matrix",[141,645,255],{"class":154},[141,647,648],{"class":158},"cm, ",[141,650,651],{"class":240},"display_labels",[141,653,255],{"class":154},[141,655,629],{"class":158},[141,657,659],{"class":143,"line":658},39,[141,660,661],{"class":158},"    cm_display.plot()\n",[141,663,665],{"class":143,"line":664},40,[141,666,148],{"emptyLinePlaceholder":147},[141,668,670],{"class":143,"line":669},41,[141,671,672],{"class":273},"    # Set the title\n",[141,674,676,679,682,685,687],{"class":143,"line":675},42,[141,677,678],{"class":158},"    plt.title(",[141,680,681],{"class":158},"'",[141,683,684],{"class":433},"Confusion Matrix",[141,686,681],{"class":158},[141,688,397],{"class":158},[141,690,692],{"class":143,"line":691},43,[141,693,148],{"emptyLinePlaceholder":147},[141,695,697],{"class":143,"line":696},44,[141,698,699],{"class":273},"    # Show the plot\n",[141,701,703],{"class":143,"line":702},45,[141,704,570],{"class":158},[65,706,708],{"id":707},"data-pre-processing","Data 
Pre-processing",[16,710,711,712,714],{},"We're going to download our dataset from ",[20,713,34],{"href":33,"target":23}," and do some basic pre-processing, as follows:",[132,716,718],{"className":134,"code":717,"language":136,"meta":137,"style":137},"# Load the MNIST dataset from OpenML\nmnist = fetch_openml('mnist_784')\n\n# Get the first 1000 records for now\nall_features = mnist.data.to_numpy()[:1000] # The image data\nall_labels = mnist.target.astype(int).to_numpy()[:1000] # The corresponding labels (0-9)\n",[57,719,720,725,744,748,753,772],{"__ignoreMap":137},[141,721,722],{"class":143,"line":144},[141,723,724],{"class":273},"# Load the MNIST dataset from OpenML\n",[141,726,727,730,732,735,737,740,742],{"class":143,"line":151},[141,728,729],{"class":158},"mnist ",[141,731,255],{"class":154},[141,733,734],{"class":158}," fetch_openml(",[141,736,681],{"class":158},[141,738,739],{"class":433},"mnist_784",[141,741,681],{"class":158},[141,743,397],{"class":158},[141,745,746],{"class":143,"line":168},[141,747,148],{"emptyLinePlaceholder":147},[141,749,750],{"class":143,"line":182},[141,751,752],{"class":273},"# Get the first 1000 records for now\n",[141,754,755,758,760,763,766,769],{"class":143,"line":195},[141,756,757],{"class":158},"all_features ",[141,759,255],{"class":154},[141,761,762],{"class":158}," mnist.data.to_numpy()[:",[141,764,765],{"class":293},"1000",[141,767,768],{"class":158},"] ",[141,770,771],{"class":273},"# The image data\n",[141,773,774,777,779,782,786,789,791,793],{"class":143,"line":208},[141,775,776],{"class":158},"all_labels ",[141,778,255],{"class":154},[141,780,781],{"class":158}," mnist.target.astype(",[141,783,785],{"class":784},"sU0A5","int",[141,787,788],{"class":158},").to_numpy()[:",[141,790,765],{"class":293},[141,792,768],{"class":158},[141,794,795],{"class":273},"# The corresponding labels (0-9)\n",[65,797,799],{"id":798},"training-classification","Training & Classification",[16,801,802],{},"Before training 
the classifier, it's a common practice to split your data into training and testing sets. This is done to ensure that the model tests against \"unseen\" instances, and not on data it's just been trained on, which would lead to unreliable accuracy scores.",[16,804,805,807,808,811],{},[20,806,24],{"href":22,"target":23}," has a built-in method to do just that called ",[57,809,810],{},"train_test_split",", so let's segment the data into an 80%/20% train/test set as follows.",[132,813,815],{"className":134,"code":814,"language":136,"meta":137,"style":137},"# Split the dataset into training and testing sets\ntraining_features, testing_features, training_labels, testing_labels = train_test_split(all_features, all_labels, test_size=0.2, random_state=42)\n",[57,816,817,822],{"__ignoreMap":137},[141,818,819],{"class":143,"line":144},[141,820,821],{"class":273},"# Split the dataset into training and testing sets\n",[141,823,824,827,829,832,835,837,840,842,845,847,850],{"class":143,"line":151},[141,825,826],{"class":158},"training_features, testing_features, training_labels, testing_labels ",[141,828,255],{"class":154},[141,830,831],{"class":158}," train_test_split(all_features, all_labels, ",[141,833,834],{"class":240},"test_size",[141,836,255],{"class":154},[141,838,839],{"class":293},"0.2",[141,841,244],{"class":158},[141,843,844],{"class":240},"random_state",[141,846,255],{"class":154},[141,848,849],{"class":293},"42",[141,851,397],{"class":158},[16,853,854],{},"Once the data's been split, we're ready to instantiate our model, and start the fitting (training) process.",[132,856,858],{"className":134,"code":857,"language":136,"meta":137,"style":137},"# Instantiate KNN classifier\nknn_model = KNeighborsClassifier(n_neighbors=3)\n\n# Train (fit) the model\nknn_model.fit(training_features, training_labels)\n",[57,859,860,865,885,889,894],{"__ignoreMap":137},[141,861,862],{"class":143,"line":144},[141,863,864],{"class":273},"# Instantiate KNN 
classifier\n",[141,866,867,870,872,875,878,880,883],{"class":143,"line":151},[141,868,869],{"class":158},"knn_model ",[141,871,255],{"class":154},[141,873,874],{"class":158}," KNeighborsClassifier(",[141,876,877],{"class":240},"n_neighbors",[141,879,255],{"class":154},[141,881,882],{"class":293},"3",[141,884,397],{"class":158},[141,886,887],{"class":143,"line":168},[141,888,148],{"emptyLinePlaceholder":147},[141,890,891],{"class":143,"line":182},[141,892,893],{"class":273},"# Train (fit) the model\n",[141,895,896],{"class":143,"line":195},[141,897,898],{"class":158},"knn_model.fit(training_features, training_labels)\n",[16,900,901],{},"Depending on the amount of data and the beefiness of the machine you're training on, this might take some time, but when it's done, the only thing left to do is to test your newly trained KNN model on some unseen data, so let's do it:",[132,903,905],{"className":134,"code":904,"language":136,"meta":137,"style":137},"# Make predictions\npredicted_labels = knn_model.predict(testing_features)\n\n# Spot check a few\nplot_digits(testing_features, predicted_labels)\n",[57,906,907,912,922,926,931],{"__ignoreMap":137},[141,908,909],{"class":143,"line":144},[141,910,911],{"class":273},"# Make predictions\n",[141,913,914,917,919],{"class":143,"line":151},[141,915,916],{"class":158},"predicted_labels ",[141,918,255],{"class":154},[141,920,921],{"class":158}," knn_model.predict(testing_features)\n",[141,923,924],{"class":143,"line":168},[141,925,148],{"emptyLinePlaceholder":147},[141,927,928],{"class":143,"line":182},[141,929,930],{"class":273},"# Spot check a few\n",[141,932,933],{"class":143,"line":195},[141,934,935],{"class":158},"plot_digits(testing_features, predicted_labels)\n",[16,937,938],{},[939,940],"img",{"alt":941,"src":942},"MNIST K-NN Predictions","/images/knn-mnist-predictions.png",[16,944,945],{},"Looks like it correctly classified 75% of the unseen spot-checked digits, and considering we only used 800 training records, that's 
not bad at all! Some more training, and this model will be humming along nicely.",[65,947,949],{"id":948},"evaluation","Evaluation",[16,951,952],{},"We're going to use a confusion matrix to check the model's predicted vs true values, to ascertain whether it's any good at generalising to unseen data.",[132,954,956],{"className":134,"code":955,"language":136,"meta":137,"style":137},"# Generate the confusion matrix using the utility function included above\ngenerate_confusion_matrix(testing_labels, predicted_labels, labels=knn_model.classes_)\n",[57,957,958,963],{"__ignoreMap":137},[141,959,960],{"class":143,"line":144},[141,961,962],{"class":273},"# Generate the confusion matrix using the utility function included above\n",[141,964,965,968,970,972],{"class":143,"line":151},[141,966,967],{"class":158},"generate_confusion_matrix(testing_labels, predicted_labels, ",[141,969,598],{"class":240},[141,971,255],{"class":154},[141,973,974],{"class":158},"knn_model.classes_)\n",[16,976,977],{},[939,978],{"alt":979,"src":980},"MNIST K-NN Confusion Matrix","/images/knn-mnist-confusion-matrix.png",[16,982,983],{},"The actual label is denoted via the y-axis, and our model's predicted label via the x-axis. 
By the looks of it, the majority of the predictions are correct, but let's check one last thing, the overall accuracy.",[132,985,987],{"className":134,"code":986,"language":136,"meta":137,"style":137},"# Manually calculate accuracy\ncorrect_predictions = 0\nfor i in range(len(testing_labels)):\n    if predicted_labels[i] == testing_labels[i]:\n        correct_predictions += 1\n\n# Correct predictions over total number of testing labels\naccuracy = correct_predictions / len(testing_labels)\n\n# Print the output\nprint(f\"KNN Accuracy: {accuracy:.2f}\")\n",[57,988,989,994,1004,1025,1039,1050,1054,1059,1078,1082,1087],{"__ignoreMap":137},[141,990,991],{"class":143,"line":144},[141,992,993],{"class":273},"# Manually calculate accuracy\n",[141,995,996,999,1001],{"class":143,"line":151},[141,997,998],{"class":158},"correct_predictions ",[141,1000,255],{"class":154},[141,1002,1003],{"class":293}," 0\n",[141,1005,1006,1009,1012,1014,1017,1019,1022],{"class":143,"line":168},[141,1007,1008],{"class":154},"for",[141,1010,1011],{"class":158}," i ",[141,1013,341],{"class":154},[141,1015,1016],{"class":233}," range",[141,1018,237],{"class":158},[141,1020,1021],{"class":233},"len",[141,1023,1024],{"class":158},"(testing_labels)):\n",[141,1026,1027,1030,1033,1036],{"class":143,"line":182},[141,1028,1029],{"class":154},"    if",[141,1031,1032],{"class":158}," predicted_labels[i] ",[141,1034,1035],{"class":154},"==",[141,1037,1038],{"class":158}," testing_labels[i]:\n",[141,1040,1041,1044,1047],{"class":143,"line":195},[141,1042,1043],{"class":158},"        correct_predictions ",[141,1045,1046],{"class":154},"+=",[141,1048,1049],{"class":293}," 1\n",[141,1051,1052],{"class":143,"line":208},[141,1053,148],{"emptyLinePlaceholder":147},[141,1055,1056],{"class":143,"line":221},[141,1057,1058],{"class":273},"# Correct predictions over total number of testing labels\n",[141,1060,1061,1064,1066,1069,1072,1075],{"class":143,"line":226},[141,1062,1063],{"class":158},"accuracy 
",[141,1065,255],{"class":154},[141,1067,1068],{"class":158}," correct_predictions ",[141,1070,1071],{"class":154},"/",[141,1073,1074],{"class":233}," len",[141,1076,1077],{"class":158},"(testing_labels)\n",[141,1079,1080],{"class":143,"line":264},[141,1081,148],{"emptyLinePlaceholder":147},[141,1083,1084],{"class":143,"line":270},[141,1085,1086],{"class":273},"# Print the output\n",[141,1088,1089,1092,1094,1096,1099,1101,1104,1107,1109,1111],{"class":143,"line":277},[141,1090,1091],{"class":233},"print",[141,1093,237],{"class":158},[141,1095,458],{"class":229},[141,1097,1098],{"class":433},"\"KNN Accuracy: ",[141,1100,464],{"class":293},[141,1102,1103],{"class":158},"accuracy",[141,1105,1106],{"class":229},":.2f",[141,1108,470],{"class":293},[141,1110,430],{"class":433},[141,1112,397],{"class":158},[16,1114,1115,1116,1118],{},"As shown above, we loop through and compare the predicted and true labels for all test data, then calculate the proportion that match. Although ",[20,1117,24],{"href":22,"target":23}," has built-in functionality to calculate this automatically, I wanted to do it manually to show you how easy this is to work with.",[16,1120,1121,1122,1125],{},"When all's said and done, we end up with an 86% accurate model trained on just 800 records, and tested against 200 unseen digits: ",[57,1123,1124],{},"KNN Accuracy: 0.86",".",[16,1127,1128,1129,1132,1133,1136],{},"Accuracy is the simplest of all the classification metrics, but ",[57,1130,1131],{},"recall"," and ",[57,1134,1135],{},"f1-score"," should be used as well to get a holistic view of your classifier's performance.",[1138,1139,1140],"style",{},"html pre.shiki code .sVyAn, html code.shiki .sVyAn{--shiki-default:#E06C75}html pre.shiki code .sGSqi, html code.shiki .sGSqi{--shiki-default:#A9B2C3}html pre.shiki code .sVbv2, html code.shiki .sVbv2{--shiki-default:#61AFEF}html pre.shiki code .sJix2, html code.shiki .sJix2{--shiki-default:#B57EDC}html pre.shiki code .sVs6v, html code.shiki 
.sVs6v{--shiki-default:#C6CCD7}html pre.shiki code .ssUfO, html code.shiki .ssUfO{--shiki-default:#5F6672;--shiki-default-font-style:italic}html pre.shiki code .sjrmR, html code.shiki .sjrmR{--shiki-default:#56B6C2}html pre.shiki code .subq3, html code.shiki .subq3{--shiki-default:#98C379}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html pre.shiki code .sU0A5, html code.shiki .sU0A5{--shiki-default:#E5C07B}",{"title":137,"searchDepth":151,"depth":151,"links":1142},[1143,1144,1145,1149,1150],{"id":13,"depth":151,"text":14},{"id":38,"depth":151,"text":39},{"id":51,"depth":151,"text":52,"children":1146},[1147,1148],{"id":67,"depth":168,"text":68},{"id":84,"depth":168,"text":85},{"id":91,"depth":151,"text":92},{"id":98,"depth":151,"text":99,"children":1151},[1152,1153,1154,1155,1156],{"id":105,"depth":168,"text":106},{"id":126,"depth":168,"text":127},{"id":707,"depth":168,"text":708},{"id":798,"depth":168,"text":799},{"id":948,"depth":168,"text":949},"2025-07-07T00:00:00.000Z","A basic introduction to using Scikit-Learn's K-Nearest Neighbour machine learning algorithm for image classification with the MNIST dataset.","md",{},false,"/article/ml-knn-image-classification",{"title":5,"description":1158},"article/ml-knn-image-classification",[1166,57,1167,1168,1169,136],"learning","machine-learning","KNN","k-nearest-neighbour","/images/article/StockCake-Digital Light Tunnel_1752061655.jpg","TPvr4aGlsfll8NuBEI4Sn6jdsXzHpwzSKmVClgNZuhg",[1173,1177],{"title":1174,"path":1175,"stem":1176,"children":-1},"Locally Signed Certificates With 
Mkcert","/article/locally-signed-certificates-with-mkcert","article/locally-signed-certificates-with-mkcert",null,1752065530256]