Task 4 | K-Means Clustering Implementation

K-Means Clustering

K-Means Clustering is an algorithm that groups objects into K clusters (partitions) based on their attributes (features). K is a positive integer that gives the number of clusters. The data are partitioned by assigning each data point to the centroid at minimum distance. The initial centroids can be chosen at random, supplied as an initial set of centroids, or taken from K consecutive objects in the data.

A centroid is the arithmetic mean of all points that belong to a cluster. K-Means Clustering can be applied with the following step-by-step procedure:

  • Prepare the training data as vectors.

  • Set the number of clusters, K.

  • Set the initial centroids.

  • Compute the distance between each data point and each centroid using the Euclidean distance.

    Distance formula:

    \(d(p,q) = \sqrt{\sum_{i=1}^{n}(q_i - p_i)^2}\)

    where:
      p, q      = two points in Euclidean n-space
      q_i, p_i  = the i-th coordinates of q and p
      n         = the number of dimensions (features)
    
    
  • Assign each data point to the cluster whose centroid is nearest (minimum distance).

  • Repeat as long as any data point still moves to another cluster: update the centroids (step 3, now using the mean of each cluster's members) and recompute the distances.

  • When the current grouping is identical to the previous one, stop iterating.

  • The data are now partitioned according to the final centroids.
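
The procedure above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the scikit-learn implementation used later on this page; the function name `kmeans_sketch`, its parameters, and the six sample points are assumptions made for the example.

```python
import numpy as np

def kmeans_sketch(data, k, max_iter=100, seed=0):
    """Plain K-Means returning (labels, centroids). Hypothetical helper for illustration."""
    rng = np.random.default_rng(seed)
    # Initial centroids: k distinct samples chosen at random
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Euclidean distance from every sample to every centroid, shape (n, k)
        dist = np.sqrt(((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2))
        new_labels = dist.argmin(axis=1)
        # Stop when no sample changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Move each centroid to the mean of its members (left unchanged if empty)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Illustrative data: two small, well-separated groups
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
labels, centroids = kmeans_sketch(points, k=2)
print(labels)
print(centroids)
```

The first three points should end up in one cluster and the last three in the other, although which group receives the number 0 depends on the random initial centroids.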

Implementation in Python

Data preparation

The data used is the Iris dataset, which can be obtained from the UCI Machine Learning Repository.

import numpy as np  
import matplotlib.pyplot as plt  
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# The raw file has no header row, so supply the column names explicitly
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
irisdata = pd.read_csv(url, names=colnames)
irisdata
     sepal-length  sepal-width  petal-length  petal-width           Class
  0           5.1          3.5           1.4          0.2     Iris-setosa
  1           4.9          3.0           1.4          0.2     Iris-setosa
  2           4.7          3.2           1.3          0.2     Iris-setosa
  3           4.6          3.1           1.5          0.2     Iris-setosa
  4           5.0          3.6           1.4          0.2     Iris-setosa
...           ...          ...           ...          ...             ...
145           6.7          3.0           5.2          2.3  Iris-virginica
146           6.3          2.5           5.0          1.9  Iris-virginica
147           6.5          3.0           5.2          2.0  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica
149           5.9          3.0           5.1          1.8  Iris-virginica

150 rows × 5 columns

Converting the labels to numbers

irisdata['Class'] = pd.Categorical(irisdata["Class"])
irisdata["Class"] = irisdata["Class"].cat.codes
irisdata
     sepal-length  sepal-width  petal-length  petal-width  Class
  0           5.1          3.5           1.4          0.2      0
  1           4.9          3.0           1.4          0.2      0
  2           4.7          3.2           1.3          0.2      0
  3           4.6          3.1           1.5          0.2      0
  4           5.0          3.6           1.4          0.2      0
...           ...          ...           ...          ...    ...
145           6.7          3.0           5.2          2.3      2
146           6.3          2.5           5.0          1.9      2
147           6.5          3.0           5.2          2.0      2
148           6.2          3.4           5.4          2.3      2
149           5.9          3.0           5.1          1.8      2

150 rows × 5 columns
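
The codes produced by `cat.codes` follow the sorted order of the category names, so it can help to keep the code-to-name mapping at hand. A minimal sketch on a small hand-made Series (the `species` and `code_to_name` names are illustrative and not part of the original notebook):

```python
import pandas as pd

# Illustrative data, deliberately out of order
species = pd.Series(["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"])
cat = pd.Categorical(species)

# Categories are sorted; each code is an index into that sorted list
code_to_name = dict(enumerate(cat.categories))
print(code_to_name)
print(cat.codes.tolist())
```

Under this ordering, Iris-setosa maps to 0, Iris-versicolor to 1, and Iris-virginica to 2, which matches the encoded table above.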

Selecting the attributes to use

In this case, the attributes in columns 0 through 3 are used as features (the slice 0:4 excludes index 4), and column 4 (Class) is kept as the label.

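# Features: the four measurement columns (indices 0-3)
x = irisdata.values[:, 0:4]
# Labels: the encoded Class column (index 4), cast back to int
y = irisdata.values[:, 4].astype(int)
# Drop the 'Class' column to view the features on their own
df_without_label = irisdata.drop(columns=["Class"])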
df_without_label
     sepal-length  sepal-width  petal-length  petal-width
  0           5.1          3.5           1.4          0.2
  1           4.9          3.0           1.4          0.2
  2           4.7          3.2           1.3          0.2
  3           4.6          3.1           1.5          0.2
  4           5.0          3.6           1.4          0.2
...           ...          ...           ...          ...
145           6.7          3.0           5.2          2.3
146           6.3          2.5           5.0          1.9
147           6.5          3.0           5.2          2.0
148           6.2          3.4           5.4          2.3
149           5.9          3.0           5.1          1.8

150 rows × 4 columns

Setting the number of clusters and computing distances

In this case, the number of clusters is set to 3.

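from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

# Number of clusters (no random_state is set, so cluster numbering can differ between runs)
kmeans = KMeans(n_clusters=3)
# Fit the model to the input data
kmeans = kmeans.fit(x)
# Cluster label assigned to each sample
labels = kmeans.predict(x)
# Centroid coordinates
centroids = kmeans.cluster_centers_
# Distance from each sample to each centroid; transform() reuses the fitted
# model, whereas fit_transform() would refit it and could renumber the clusters
hasil = kmeans.transform(x)
hasil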
array([[3.41925061, 0.14694217, 5.0595416 ],
       [3.39857426, 0.43816892, 5.11494335],
       [3.56935666, 0.41230086, 5.27935534],
       [3.42240962, 0.51883716, 5.15358977],
       [3.46726403, 0.19796969, 5.10433388],
       [3.14673162, 0.68380699, 4.68148797],
       [3.51650264, 0.41520116, 5.21147652],
       [3.33654987, 0.0599333 , 5.00252706],
       [3.57233779, 0.80099438, 5.32798107],
       [3.3583767 , 0.36659514, 5.06790865],
       [3.32449131, 0.48784424, 4.89806763],
       [3.31126872, 0.25138019, 4.9966845 ],
       [3.46661272, 0.49192682, 5.19103612],
       [3.90578362, 0.90906105, 5.65173594],
       [3.646649  , 1.02019214, 5.10804455],
       [3.49427881, 1.21309192, 4.88564095],
       [3.495248  , 0.66241377, 5.03090587],
       [3.38444981, 0.1509702 , 5.02342022],
       [3.11245944, 0.82848778, 4.61792995],
       [3.37738931, 0.39898872, 4.97213426],
       [3.07471224, 0.46172719, 4.6955761 ],
       [3.31506588, 0.33762701, 4.9236821 ],
       [3.93167253, 0.64435394, 5.59713396],
       [3.01233762, 0.37946278, 4.68193765],
       [3.06241269, 0.4845534 , 4.75095704],
       [3.19414543, 0.44180539, 4.90772894],
       [3.17967089, 0.20782685, 4.84545508],
       [3.30941724, 0.21815591, 4.93969029],
       [3.37648183, 0.2097427 , 5.01833618],
       [3.31272968, 0.40198507, 5.02954567],
       [3.26550651, 0.40495926, 4.98608729],
       [3.18083736, 0.42566654, 4.79550372],
       [3.53142353, 0.72442529, 5.06520776],
       [3.57102821, 0.9282198 , 5.04438334],
       [3.3583767 , 0.36659514, 5.06790865],
       [3.56904033, 0.34524194, 5.25071556],
       [3.43783276, 0.5287646 , 5.02368214],
       [3.3583767 , 0.36659514, 5.06790865],
       [3.66205264, 0.75550778, 5.40750095],
       [3.31092773, 0.11131936, 4.9664149 ],
       [3.49764675, 0.19181241, 5.14520862],
       [3.60850034, 1.23935144, 5.38423754],
       [3.68120561, 0.66602703, 5.40847417],
       [3.14278239, 0.38986151, 4.78803478],
       [3.00585191, 0.60761172, 4.59828494],
       [3.39468045, 0.47370033, 5.11844067],
       [3.32788568, 0.41855943, 4.92421655],
       [3.51879523, 0.4673243 , 5.23766854],
       [3.34104251, 0.41132955, 4.92859681],
       [3.40601705, 0.14139307, 5.08216833],
       [1.22697525, 3.97889331, 1.25489071],
       [0.684141  , 3.57569462, 1.44477759],
       [1.17527644, 4.13182671, 1.01903626],
       [0.73153652, 3.00672446, 2.45978458],
       [0.63853451, 3.7451291 , 1.3520017 ],
       [0.26937898, 3.34604124, 1.88009327],
       [0.76452634, 3.74149596, 1.28902785],
       [1.58388575, 2.233829  , 3.37155487],
       [0.75582717, 3.70928457, 1.41123804],
       [0.85984838, 2.79706847, 2.58955659],
       [1.53611907, 2.5937602 , 3.27864111],
       [0.32426175, 3.16815277, 1.90055758],
       [0.80841374, 3.07805003, 2.38073698],
       [0.39674141, 3.64323922, 1.45909603],
       [0.87269542, 2.50973943, 2.60303733],
       [0.87306498, 3.59544045, 1.50822767],
       [0.41229163, 3.36487622, 1.85387593],
       [0.53579956, 2.9438057 , 2.25517257],
       [0.6367639 , 3.70189033, 1.74778451],
       [0.71254917, 2.80399572, 2.49557781],
       [0.7093731 , 3.79431048, 1.37094403],
       [0.46349013, 3.02079327, 2.06563694],
       [0.69373966, 3.98757972, 1.29106776],
       [0.43661144, 3.60060995, 1.57547425],
       [0.54593856, 3.37188256, 1.70495043],
       [0.74313017, 3.55977415, 1.52298639],
       [0.98798453, 4.00819061, 1.18965415],
       [1.06739835, 4.20328348, 0.84636259],
       [0.21993519, 3.47148268, 1.61913335],
       [1.0243726 , 2.42231129, 2.77868071],
       [0.86396528, 2.73312861, 2.6440625 ],
       [0.97566381, 2.61755458, 2.75566654],
       [0.55763082, 2.82736485, 2.32254696],
       [0.73395781, 4.06974102, 1.22324554],
       [0.57500396, 3.33538484, 1.9942056 ],
       [0.68790275, 3.47050313, 1.61049622],
       [0.92700552, 3.87556344, 1.19803047],
       [0.61459444, 3.55803204, 1.81572464],
       [0.50830256, 2.93107352, 2.20430516],
       [0.6291191 , 2.9382294 , 2.40438484],
       [0.48790256, 3.23221163, 2.14635877],
       [0.38266958, 3.54152397, 1.52402278],
       [0.49185351, 2.94020271, 2.26286106],
       [1.5485635 , 2.27868208, 3.33648305],
       [0.3856087 , 3.07720523, 2.16211718],
       [0.44284695, 3.00931753, 2.11299567],
       [0.3449879 , 3.05790647, 2.07973003],
       [0.37241653, 3.29423618, 1.76829182],
       [1.66064034, 1.98584793, 3.44291999],
       [0.38393196, 2.98784069, 2.16527941],
       [2.0445799 , 5.23002792, 0.77731871],
       [0.85382472, 4.13627755, 1.29757391],
       [2.05245342, 5.2614059 , 0.30610139],
       [1.33089245, 4.63361544, 0.65293923],
       [1.72813078, 5.00335807, 0.38458885],
       [2.87401886, 6.06026336, 1.14225684],
       [1.07101875, 3.49158875, 2.4108337 ],
       [2.39730707, 5.59810611, 0.78573677],
       [1.67668563, 4.99343489, 0.65454939],
       [2.54158648, 5.60613878, 0.8435596 ],
       [1.17541367, 4.31086905, 0.74552218],
       [1.13563278, 4.46273369, 0.75289837],
       [1.59322675, 4.80907392, 0.25958095],
       [0.88917352, 4.11232197, 1.48572618],
       [1.20227628, 4.34524936, 1.30303821],
       [1.42273608, 4.57523682, 0.68288333],
       [1.33403966, 4.5953446 , 0.50991553],
       [3.20105585, 6.21652572, 1.47791217],
       [3.20759942, 6.4578628 , 1.52971038],
       [0.82617494, 4.0684631 , 1.53708992],
       [1.91251832, 5.07992047, 0.26952816],
       [0.81891975, 3.95277017, 1.5334904 ],
       [2.9794431 , 6.17566126, 1.31149299],
       [0.74269596, 4.05181342, 1.10668455],
       [1.75847731, 4.92666134, 0.27627819],
       [2.14580999, 5.27802918, 0.52766931],
       [0.62526165, 3.91887637, 1.20765678],
       [0.70228926, 3.94953061, 1.16212743],
       [1.4663925 , 4.78292714, 0.54629196],
       [1.93773659, 5.0624097 , 0.59428255],
       [2.31885342, 5.50890116, 0.7312665 ],
       [3.07340053, 5.99739877, 1.43802246],
       [1.51444141, 4.82261257, 0.5605572 ],
       [0.81536685, 4.10541009, 1.05631592],
       [1.23209127, 4.50652771, 1.12133058],
       [2.6381171 , 5.75777665, 0.95311851],
       [1.72401927, 4.84041238, 0.73306362],
       [1.31541133, 4.55574275, 0.57903109],
       [0.61011676, 3.83572575, 1.29960041],
       [1.60532899, 4.75659458, 0.34794609],
       [1.77481954, 4.97248348, 0.3893492 ],
       [1.53937059, 4.59738969, 0.68403844],
       [0.85382472, 4.13627755, 1.29757391],
       [2.00764279, 5.21259935, 0.30952112],
       [1.94554509, 5.09085376, 0.50939919],
       [1.44957743, 4.60751473, 0.61173881],
       [0.89747884, 4.21459274, 1.10072376],
       [1.17993324, 4.40998776, 0.65334214],
       [1.50889317, 4.59839015, 0.83572418],
       [0.83452741, 4.07622276, 1.1805499 ]])
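
Each row of the array above holds the distance from one sample to each of the three centroids, and the sample's cluster is the column with the smallest value. The sketch below checks that relationship on a small synthetic dataset; the data, the `demo` and `model` names, and the fixed `random_state` are assumptions for the example, not part of the original notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs (illustrative data, not the Iris set)
demo = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
                  rng.normal(5.0, 0.3, size=(20, 2))])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(demo)
distances = model.transform(demo)      # shape (40, 2): distance to each centroid
nearest = distances.argmin(axis=1)     # index of the closest centroid per row

# The closest centroid should coincide with the predicted cluster label
print(np.array_equal(nearest, model.predict(demo)))
```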

Classifying the results

labels
# K-Means numbers its clusters arbitrarily, so checking a few hand-picked
# indices is fragile. Instead, map each cluster id to the class that occurs
# most often among its members.
y_int = np.asarray(y).astype(int)
mapping = {c: int(np.bincount(y_int[labels == c]).argmax()) for c in np.unique(labels)}
a = [mapping[c] for c in labels]
print(a, end='')
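
Because the numbers K-Means gives to its clusters are arbitrary, any mapping from cluster ids to class labels has to be derived from the data. One general way to do this is an optimal one-to-one assignment on the confusion matrix. The sketch below uses `scipy.optimize.linear_sum_assignment` for that; the helper name `align_clusters` and the toy arrays are hypothetical, and the sketch assumes class labels and cluster ids both run from 0 to K-1.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix

def align_clusters(y_true, cluster_ids):
    """Relabel cluster ids to agree with y_true as often as possible (hypothetical helper)."""
    cm = confusion_matrix(y_true, cluster_ids)   # rows: true class, columns: cluster id
    rows, cols = linear_sum_assignment(-cm)      # negate to maximise matched counts
    mapping = {int(c): int(r) for r, c in zip(rows, cols)}
    return np.array([mapping[int(c)] for c in cluster_ids])

# Toy example: almost the same grouping as y_demo, but numbered differently
y_demo = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
clusters_demo = np.array([2, 2, 2, 0, 0, 1, 1, 1, 1])
print(align_clusters(y_demo, clusters_demo))
```

Unlike a per-cluster majority vote, this assignment guarantees that no two clusters are mapped to the same class.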

Computing accuracy

Accuracy formula:

\(\text{Accuracy} = \frac{TP + TN}{\text{Total Data}} \times 100\%\)

  where:
  TP          = number of samples classified as True Positive
  TN          = number of samples classified as True Negative
  Total Data  = total number of samples
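
With more than two classes, the numerator amounts to the number of samples whose predicted label equals the true label. A small worked example, using made-up label arrays, compares the manual calculation with `accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels for illustration only
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 0, 1, 2, 2, 2, 1, 1])

correct = int((y_true == y_pred).sum())   # correctly assigned samples
manual = correct / len(y_true) * 100      # percentage, as in the formula
print(correct, manual)

# accuracy_score returns the same ratio as a fraction between 0 and 1
print(accuracy_score(y_true, y_pred))
```

Here 6 of the 8 samples match, so the formula gives 75%, while `accuracy_score` reports 0.75.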

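The accuracy obtained from K-Means clustering in this case, computed over all 150 samples, is calculated as follows. Note that accuracy_score returns a fraction between 0 and 1 rather than a percentage: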

# Accuracy: map each cluster id to its majority class, then compare with y
y_int = np.asarray(y).astype(int)
mapping = {c: int(np.bincount(y_int[labels == c]).argmax()) for c in np.unique(labels)}
accuracy = accuracy_score(y_int, [mapping[c] for c in labels])
accuracy