Assignment 6 | Diabetic Retinopathy Case Study#

Implementation using Naive Bayes and K-NN

Naive Bayes is a method well suited to both binary and multiclass classification. Also known as the Naive Bayes Classifier, it is a supervised technique that classifies future objects by assigning a class label to each instance/record using conditional probability. Conditional probability is a measure of the likelihood of an event occurring given that another event has (by assumption, presumption, assertion, or evidence) already occurred.
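
As a minimal illustration (not part of the original assignment), the conditional-probability idea can be sketched in Python; all the counts below are hypothetical, chosen only to make the arithmetic visible:

```python
# Conditional probability: P(A|B) = n(A and B) / n(B).
def conditional_probability(n_joint, n_given):
    """Probability of A given that B has occurred."""
    return n_joint / n_given

# Hypothetical counts: out of 100 retina images, 40 show some sign X,
# and 30 of those 40 are labeled as diabetic retinopathy (class 1).
p_dr_given_x = conditional_probability(30, 40)
print(p_dr_given_x)  # 0.75
```

Naive Bayes combines such conditional probabilities (one per feature, assumed independent) to pick the most likely class label.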

Reading and Preparing the Data#

from scipy.io import arff
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import joblib

data = arff.loadarff('/content/drive/MyDrive/datamining/tugas/messidor_features.arff')
df = pd.DataFrame(data[0])
df.head()
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Class
0 1.0 1.0 22.0 22.0 22.0 19.0 18.0 14.0 49.895756 17.775994 5.270920 0.771761 0.018632 0.006864 0.003923 0.003923 0.486903 0.100025 1.0 b'0'
1 1.0 1.0 24.0 24.0 22.0 18.0 16.0 13.0 57.709936 23.799994 3.325423 0.234185 0.003903 0.003903 0.003903 0.003903 0.520908 0.144414 0.0 b'0'
2 1.0 1.0 62.0 60.0 59.0 54.0 47.0 33.0 55.831441 27.993933 12.687485 4.852282 1.393889 0.373252 0.041817 0.007744 0.530904 0.128548 0.0 b'1'
3 1.0 1.0 55.0 53.0 53.0 50.0 43.0 31.0 40.467228 18.445954 9.118901 3.079428 0.840261 0.272434 0.007653 0.001531 0.483284 0.114790 0.0 b'0'
4 1.0 1.0 44.0 44.0 44.0 41.0 39.0 27.0 18.026254 8.570709 0.410381 0.000000 0.000000 0.000000 0.000000 0.000000 0.475935 0.123572 0.0 b'1'

Label Encoder#

Converting the labels from byte strings to integers, e.g. b'0' to 0

Before the conversion

y = df['Class'].values
y[0:5]
array([b'0', b'0', b'1', b'0', b'1'], dtype=object)

After the conversion

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(y)
y = le.transform(y)
y
array([0, 0, 1, ..., 0, 1, 0])

Renaming the Columns#

From numeric indices to descriptive names

col_names = []
for i in range(20):
    if i == 0:
        col_names.append('quality')
    elif i == 1:
        col_names.append('prescreen')
    elif 2 <= i <= 7:
        col_names.append('ma' + str(i))
    elif 8 <= i <= 15:
        col_names.append('exudate' + str(i))
    elif i == 16:
        col_names.append('euDist')
    elif i == 17:
        col_names.append('diameter')
    elif i == 18:
        col_names.append('amfm_class')
    else:  # i == 19
        col_names.append('label')
# Assign the list directly; wrapping it in another list (df.columns = [col_names])
# would create a MultiIndex of tuples and trigger pandas/sklearn warnings later on.
df.columns = col_names
df['label'] = y
df
df
quality prescreen ma2 ma3 ma4 ma5 ma6 ma7 exudate8 exudate9 exudate10 exudate11 exudate12 exudate13 exudate14 exudate15 euDist diameter amfm_class label
0 1.0 1.0 22.0 22.0 22.0 19.0 18.0 14.0 49.895756 17.775994 5.270920 0.771761 0.018632 0.006864 0.003923 0.003923 0.486903 0.100025 1.0 0
1 1.0 1.0 24.0 24.0 22.0 18.0 16.0 13.0 57.709936 23.799994 3.325423 0.234185 0.003903 0.003903 0.003903 0.003903 0.520908 0.144414 0.0 0
2 1.0 1.0 62.0 60.0 59.0 54.0 47.0 33.0 55.831441 27.993933 12.687485 4.852282 1.393889 0.373252 0.041817 0.007744 0.530904 0.128548 0.0 1
3 1.0 1.0 55.0 53.0 53.0 50.0 43.0 31.0 40.467228 18.445954 9.118901 3.079428 0.840261 0.272434 0.007653 0.001531 0.483284 0.114790 0.0 0
4 1.0 1.0 44.0 44.0 44.0 41.0 39.0 27.0 18.026254 8.570709 0.410381 0.000000 0.000000 0.000000 0.000000 0.000000 0.475935 0.123572 0.0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1146 1.0 1.0 34.0 34.0 34.0 33.0 31.0 24.0 6.071765 0.937472 0.031145 0.003115 0.000000 0.000000 0.000000 0.000000 0.537470 0.116795 0.0 0
1147 1.0 1.0 49.0 49.0 49.0 49.0 45.0 37.0 63.197145 27.377668 8.067688 0.979548 0.001552 0.000000 0.000000 0.000000 0.516733 0.124190 0.0 0
1148 1.0 0.0 49.0 48.0 48.0 45.0 43.0 33.0 30.461898 13.966980 1.763305 0.137858 0.011221 0.000000 0.000000 0.000000 0.560632 0.129843 0.0 0
1149 1.0 1.0 39.0 36.0 29.0 23.0 13.0 7.0 40.525739 12.604947 4.740919 1.077570 0.563518 0.326860 0.239568 0.174584 0.485972 0.106690 1.0 1
1150 1.0 1.0 7.0 7.0 7.0 7.0 7.0 5.0 69.423565 7.031843 1.750548 0.046597 0.021180 0.008472 0.000000 0.000000 0.556192 0.088957 0.0 0

1151 rows × 20 columns

X = df.drop(columns=['label'])
X
quality prescreen ma2 ma3 ma4 ma5 ma6 ma7 exudate8 exudate9 exudate10 exudate11 exudate12 exudate13 exudate14 exudate15 euDist diameter amfm_class
0 1.0 1.0 22.0 22.0 22.0 19.0 18.0 14.0 49.895756 17.775994 5.270920 0.771761 0.018632 0.006864 0.003923 0.003923 0.486903 0.100025 1.0
1 1.0 1.0 24.0 24.0 22.0 18.0 16.0 13.0 57.709936 23.799994 3.325423 0.234185 0.003903 0.003903 0.003903 0.003903 0.520908 0.144414 0.0
2 1.0 1.0 62.0 60.0 59.0 54.0 47.0 33.0 55.831441 27.993933 12.687485 4.852282 1.393889 0.373252 0.041817 0.007744 0.530904 0.128548 0.0
3 1.0 1.0 55.0 53.0 53.0 50.0 43.0 31.0 40.467228 18.445954 9.118901 3.079428 0.840261 0.272434 0.007653 0.001531 0.483284 0.114790 0.0
4 1.0 1.0 44.0 44.0 44.0 41.0 39.0 27.0 18.026254 8.570709 0.410381 0.000000 0.000000 0.000000 0.000000 0.000000 0.475935 0.123572 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1146 1.0 1.0 34.0 34.0 34.0 33.0 31.0 24.0 6.071765 0.937472 0.031145 0.003115 0.000000 0.000000 0.000000 0.000000 0.537470 0.116795 0.0
1147 1.0 1.0 49.0 49.0 49.0 49.0 45.0 37.0 63.197145 27.377668 8.067688 0.979548 0.001552 0.000000 0.000000 0.000000 0.516733 0.124190 0.0
1148 1.0 0.0 49.0 48.0 48.0 45.0 43.0 33.0 30.461898 13.966980 1.763305 0.137858 0.011221 0.000000 0.000000 0.000000 0.560632 0.129843 0.0
1149 1.0 1.0 39.0 36.0 29.0 23.0 13.0 7.0 40.525739 12.604947 4.740919 1.077570 0.563518 0.326860 0.239568 0.174584 0.485972 0.106690 1.0
1150 1.0 1.0 7.0 7.0 7.0 7.0 7.0 5.0 69.423565 7.031843 1.750548 0.046597 0.021180 0.008472 0.000000 0.000000 0.556192 0.088957 0.0

1151 rows × 19 columns

Data Normalization#

The data is normalized using Min-Max normalization

The calculation uses the Min-Max formula:

\(X_{new} = \frac{X - X_{min}}{X_{max} - X_{min}}\)

  where:
  X    = the data value
  Xmin = the smallest value in the column
  Xmax = the largest value in the column
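
The formula can be checked by hand against scikit-learn's `MinMaxScaler`; a small sketch with a made-up single column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A made-up single-column example used only to verify the Min-Max formula.
x = np.array([[2.0], [4.0], [10.0]])

# The formula applied directly: (X - Xmin) / (Xmax - Xmin)
manual = (x - x.min()) / (x.max() - x.min())

# The same result via scikit-learn
scaled = MinMaxScaler().fit_transform(x)
print(np.allclose(manual, scaled))  # True
```
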
scaler = MinMaxScaler()
scaled = scaler.fit_transform(X)
features_names = X.columns.copy()
scaled_features = pd.DataFrame(scaled, columns=features_names)
scaled_features.head(10)

# Save the fitted scaler for later reuse
# scaler_filename = "scaled.save"
# joblib.dump(scaler, scaler_filename)
# scaler = joblib.load(scaler_filename)
quality prescreen ma2 ma3 ma4 ma5 ma6 ma7 exudate8 exudate9 exudate10 exudate11 exudate12 exudate13 exudate14 exudate15 euDist diameter amfm_class
0 1.0 1.0 0.140000 0.160305 0.176471 0.173077 0.177083 0.147727 0.122764 0.106359 0.049693 0.012913 0.000362 0.000342 0.000661 0.001271 0.530801 0.261133 1.0
1 1.0 1.0 0.153333 0.175573 0.176471 0.163462 0.156250 0.136364 0.142126 0.142403 0.031351 0.003918 0.000076 0.000194 0.000657 0.001264 0.682302 0.536341 0.0
2 1.0 1.0 0.406667 0.450382 0.487395 0.509615 0.479167 0.363636 0.137472 0.167497 0.119614 0.081188 0.027106 0.018571 0.007043 0.002509 0.726836 0.437973 0.0
3 1.0 1.0 0.360000 0.396947 0.436975 0.471154 0.437500 0.340909 0.099403 0.110368 0.085971 0.051525 0.016340 0.013555 0.001289 0.000496 0.514678 0.352675 0.0
4 1.0 1.0 0.286667 0.328244 0.361345 0.384615 0.395833 0.295455 0.043799 0.051281 0.003869 0.000000 0.000000 0.000000 0.000000 0.000000 0.481936 0.407122 0.0
5 1.0 1.0 0.286667 0.320611 0.336134 0.384615 0.375000 0.318182 0.069395 0.041498 0.021738 0.005417 0.000000 0.000000 0.000000 0.000000 0.601764 0.426770 0.0
6 1.0 0.0 0.186667 0.213740 0.235294 0.250000 0.250000 0.170455 0.037412 0.054531 0.015400 0.000000 0.000000 0.000000 0.000000 0.000000 0.775126 0.506339 0.0
7 1.0 1.0 0.033333 0.038168 0.042017 0.048077 0.010417 0.000000 0.050374 0.056828 0.011536 0.002516 0.000000 0.000000 0.000000 0.000000 0.929166 0.081622 1.0
8 1.0 1.0 0.140000 0.152672 0.142857 0.134615 0.125000 0.102273 0.164381 0.140880 0.057991 0.008305 0.000000 0.000000 0.000000 0.000000 0.589477 0.365093 0.0
9 1.0 1.0 0.520000 0.564885 0.605042 0.673077 0.656250 0.522727 0.053997 0.060159 0.008246 0.001670 0.000455 0.000000 0.000000 0.000000 0.860738 0.317608 0.0

New DataFrame#

The converted labels are then appended to the scaled data, producing a new dataframe

scaled_features['label'] = y
scaled_features
quality prescreen ma2 ma3 ma4 ma5 ma6 ma7 exudate8 exudate9 exudate10 exudate11 exudate12 exudate13 exudate14 exudate15 euDist diameter amfm_class label
0 1.0 1.0 0.140000 0.160305 0.176471 0.173077 0.177083 0.147727 0.122764 0.106359 0.049693 0.012913 0.000362 0.000342 0.000661 0.001271 0.530801 0.261133 1.0 0
1 1.0 1.0 0.153333 0.175573 0.176471 0.163462 0.156250 0.136364 0.142126 0.142403 0.031351 0.003918 0.000076 0.000194 0.000657 0.001264 0.682302 0.536341 0.0 0
2 1.0 1.0 0.406667 0.450382 0.487395 0.509615 0.479167 0.363636 0.137472 0.167497 0.119614 0.081188 0.027106 0.018571 0.007043 0.002509 0.726836 0.437973 0.0 1
3 1.0 1.0 0.360000 0.396947 0.436975 0.471154 0.437500 0.340909 0.099403 0.110368 0.085971 0.051525 0.016340 0.013555 0.001289 0.000496 0.514678 0.352675 0.0 0
4 1.0 1.0 0.286667 0.328244 0.361345 0.384615 0.395833 0.295455 0.043799 0.051281 0.003869 0.000000 0.000000 0.000000 0.000000 0.000000 0.481936 0.407122 0.0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1146 1.0 1.0 0.220000 0.251908 0.277311 0.307692 0.312500 0.261364 0.014179 0.005609 0.000294 0.000052 0.000000 0.000000 0.000000 0.000000 0.756089 0.365106 0.0 0
1147 1.0 1.0 0.320000 0.366412 0.403361 0.461538 0.458333 0.409091 0.155722 0.163809 0.076060 0.016390 0.000030 0.000000 0.000000 0.000000 0.663701 0.410954 0.0 0
1148 1.0 0.0 0.320000 0.358779 0.394958 0.423077 0.437500 0.363636 0.074612 0.083569 0.016624 0.002307 0.000218 0.000000 0.000000 0.000000 0.859281 0.446002 0.0 0
1149 1.0 1.0 0.253333 0.267176 0.235294 0.211538 0.125000 0.068182 0.099548 0.075419 0.044696 0.018030 0.010958 0.016263 0.040346 0.056559 0.526653 0.302456 1.0 1
1150 1.0 1.0 0.040000 0.045802 0.050420 0.057692 0.062500 0.045455 0.171150 0.042074 0.016504 0.000780 0.000412 0.000422 0.000000 0.000000 0.839500 0.192513 0.0 0

1151 rows × 20 columns

GaussianNB#

NAIVE BAYES USING THE NORMALIZED DATA#

Library Preparation#

In the step below the data is split with a 0.2 test fraction, then predictions are made and the class probabilities are computed

from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix


# splitting X and y in a single call keeps the rows aligned
X_train, X_test, y_train, y_test = train_test_split(
    scaled, y, train_size=0.8, test_size=0.2, shuffle=False)

clf2 = GaussianNB()
clf2.fit(X_train, y_train)
y_pred = clf2.predict(X_test)
probas = clf2.predict_proba(X_test)[:,1]
y_test.shape + probas.shape  # concatenates the two shape tuples: both have 231 rows
(231, 231)

Probability Results#

`predict_proba` above applies Bayes' theorem; the underlying classical probability of an event is:

P(A) = \(\frac{n(A)}{n(S)}\)

  where:
  P(A) = the probability of event "A"
  n(A) = the number of favorable outcomes
  n(S) = the total number of outcomes in the sample space
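
A quick numeric sketch of this definition; the class count below is made up for illustration, while 231 is the size of the test split above:

```python
# Classical probability: P(A) = n(A) / n(S).
def probability(n_a, n_s):
    return n_a / n_s

# Hypothetical example: suppose 116 of the 231 test records belong to class 1.
print(round(probability(116, 231), 4))  # 0.5022
```
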
probas
array([0.75620474, 0.11189161, 1.        , 0.10486252, 0.14252695,
       1.        , 0.87186459, 0.08938164, 0.06195771, 0.07016774,
       1.        , 0.07309769, 0.99377827, 0.08398282, 0.99999961,
       0.85347824, 0.91850879, 0.07858852, 0.64476537, 0.21435453,
       0.09334844, 0.9769378 , 0.95011339, 0.68211181, 0.96900188,
       0.99890913, 0.99724613, 0.16434853, 0.86018358, 0.99999363,
       0.53634466, 0.08895374, 0.98313892, 0.29158109, 0.25465845,
       1.        , 0.42762289, 0.88627287, 0.48478246, 0.68057536,
       0.09285228, 0.92485924, 0.12602622, 0.99996256, 0.67208144,
       0.95535994, 0.08212779, 0.92138625, 0.14110054, 0.29346465,
       0.99957587, 0.09783595, 0.07974339, 0.63233074, 0.2814182 ,
       1.        , 0.14248748, 1.        , 1.        , 0.99977481,
       0.94905966, 0.32335402, 0.66732001, 0.98470723, 0.99924866,
       0.91560627, 0.09860317, 0.99956381, 0.13165869, 0.93304628,
       1.        , 0.1097288 , 0.336975  , 0.67880805, 0.07551187,
       0.15046226, 0.06612315, 0.07050754, 0.06605462, 0.88089312,
       1.        , 0.99908009, 0.65508392, 0.85384983, 0.99997971,
       0.85982459, 1.        , 0.22228045, 1.        , 0.51881583,
       1.        , 0.1601833 , 0.56489793, 1.        , 0.95912747,
       0.99390307, 0.06669617, 0.99994093, 0.95697036, 0.66801819,
       0.86976186, 0.06706977, 0.07242654, 0.47046825, 0.84920321,
       1.        , 0.05787343, 0.99982102, 0.08012137, 0.99891305,
       0.99999133, 0.18732054, 0.83260846, 0.16077613, 0.799558  ,
       0.42904874, 0.98720071, 1.        , 0.96964046, 0.08441927,
       0.12885481, 0.94033065, 0.67612727, 0.19183544, 0.07696444,
       0.26696759, 0.17817935, 0.16196547, 1.        , 0.86166374,
       0.44565553, 0.99998881, 0.99228367, 1.        , 0.99985484,
       1.        , 0.99999999, 0.11737421, 0.07561375, 0.8821287 ,
       0.22906703, 0.55149864, 1.        , 0.39346773, 0.12066409,
       0.51639957, 1.        , 1.        , 0.29719598, 0.15344951,
       0.0863146 , 0.29259063, 0.41884135, 0.05694947, 0.08461161,
       1.        , 0.7729152 , 1.        , 0.97079294, 0.54664137,
       1.        , 0.11629832, 0.08089507, 0.07805169, 0.99948837,
       0.18394124, 0.99996964, 0.58947713, 0.08030054, 0.43904287,
       0.66311756, 0.07841724, 0.98371956, 0.55877777, 0.96593647,
       0.55264935, 0.9813472 , 0.9862656 , 0.99975574, 0.36828985,
       0.7337582 , 0.05763816, 0.11919843, 0.48677852, 0.15423286,
       0.56479739, 0.9246502 , 0.07800551, 0.88966201, 0.35381276,
       0.92915329, 0.72146675, 1.        , 0.99847688, 1.        ,
       1.        , 0.94776044, 0.16369382, 0.06581339, 0.95954445,
       0.2764462 , 0.9983774 , 0.49497013, 0.05286252, 0.14214012,
       0.07261336, 0.08716287, 0.9895085 , 0.12310251, 0.78514343,
       0.22433016, 0.65895065, 0.99994182, 0.89992995, 1.        ,
       0.47476407, 0.99978951, 0.21982009, 0.07868117, 0.0742566 ,
       0.18242771, 0.99999934, 0.79139724, 0.61353135, 0.36884766,
       0.99927186, 0.33597509, 0.78636633, 0.99509548, 1.        ,
       0.07830934])

Rounding#

Rounding the probabilities above to 0 and 1, i.e. thresholding at 0.5
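
A quick check on a few of the values from the probability array above (note that NumPy rounds ties to even, so a value of exactly 0.5 would round to 0):

```python
import numpy as np

# Rounding class-1 probabilities acts as a decision threshold at 0.5:
# values above 0.5 become 1, values below become 0.
p = np.array([0.75620474, 0.11189161, 0.14252695, 0.87186459])
print(p.round())  # [1. 0. 0. 1.]
```
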

probas = probas.round()
probas
array([1., 0., 1., 0., 0., 1., 1., 0., 0., 0., 1., 0., 1., 0., 1., 1., 1.,
       0., 1., 0., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1., 0.,
       0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 1., 1., 0., 1., 0., 0., 1.,
       0., 0., 1., 0., 1., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 1.,
       0., 1., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.,
       1., 1., 0., 1., 1., 1., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0.,
       0., 0., 1., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 1., 1., 1.,
       0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 1., 1.,
       1., 0., 0., 1., 0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0.,
       0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0., 1., 0., 1., 1., 0., 0.,
       1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 0., 0., 0., 0., 1., 1.,
       0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 1., 0., 0.,
       0., 0., 0., 1., 0., 1., 0., 1., 1., 1., 1., 0., 1., 0., 0., 0., 0.,
       1., 1., 1., 0., 1., 0., 1., 1., 1., 0.])
y_test
array([0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1,
       1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1,
       0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0,
       1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0,
       1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0,
       1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,
       0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0])

Computing the Final Results#

Computing precision_score, accuracy_score, recall_score, and f1_score

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix

cm = confusion_matrix(y_test, y_pred)
precision = round(precision_score(y_test, y_pred, average="macro")*100, 2)
acc_nb = round(accuracy_score(y_test, y_pred)*100, 2)
recall = round(recall_score(y_test, y_pred, average="macro")*100, 2)
f1score = round(f1_score(y_test, y_pred, average="macro")*100, 2)
print('Confusion Matrix\n', cm)
print('precision: {}'.format(precision))
print('recall: {}'.format(recall))
print('fscore: {}'.format(f1score))
print('accuracy: {}'.format(acc_nb))
Confusion Matrix
 [[65 50]
 [34 82]]
precision: 63.89
recall: 63.61
fscore: 63.44
accuracy: 63.64

NAIVE BAYES USING THE UNNORMALIZED DATA#

Library Preparation#

Computing the probabilities

X_train_2, X_test_2, y_train_2, y_test_2 = train_test_split(
    X, y, train_size=0.8, test_size=0.2, shuffle=False)

nb = GaussianNB()
nb.fit(X_train_2, y_train_2)
y_pred_2 = nb.predict(X_test_2)
probas_2 = nb.predict_proba(X_test_2)[:, 1]
y_test_2.shape + probas_2.shape
(231, 231)

Probability Results#

probas_2
array([2.44675761e-02, 1.01773885e-03, 1.00000000e+00, 9.47815666e-04,
       1.34612803e-03, 1.00000000e+00, 5.22355851e-02, 7.94813794e-04,
       5.35188240e-04, 6.11399680e-04, 9.99999432e-01, 6.38903416e-04,
       5.64178101e-01, 7.40947996e-04, 9.99951663e-01, 4.50590733e-02,
       8.34303868e-02, 6.90239788e-04, 1.44660822e-02, 2.17747148e-03,
       8.32397496e-04, 2.55027023e-01, 1.33668119e-01, 1.70935484e-02,
       2.02105520e-01, 8.81208832e-01, 7.45754095e-01, 1.58269133e-03,
       4.74006385e-02, 9.99098165e-01, 9.29907570e-03, 7.90665894e-04,
       3.20652663e-01, 3.32866709e-03, 2.76083452e-03, 1.00000000e+00,
       5.96968366e-03, 5.94110417e-02, 7.56560778e-03, 1.69729481e-02,
       8.27793664e-04, 9.03347700e-02, 1.16745907e-03, 9.95375504e-01,
       1.63561902e-02, 1.47791344e-01, 7.24905010e-04, 8.67967501e-02,
       1.32947541e-03, 3.35089635e-03, 9.50193633e-01, 8.78572937e-04,
       7.01652166e-04, 1.37358778e-02, 3.14435173e-03, 9.99999999e-01,
       1.34427434e-03, 1.00000000e+00, 1.00000000e+00, 9.72920955e-01,
       1.30918012e-01, 3.85025743e-03, 1.59931116e-02, 3.42788773e-01,
       9.15196947e-01, 8.06778812e-02, 8.85919633e-04, 9.48915433e-01,
       1.22725175e-03, 1.01586080e-01, 1.00000000e+00, 9.97014480e-04,
       4.10523312e-03, 1.68334678e-02, 6.61775129e-04, 1.43184342e-03,
       5.73620325e-04, 6.13857457e-04, 5.73105244e-04, 5.65418440e-02,
       1.00000000e+00, 8.97929611e-01, 1.51712402e-02, 4.52022695e-02,
       9.97489656e-01, 4.73673741e-02, 1.00000000e+00, 2.31495254e-03,
       1.00000000e+00, 8.64325067e-03, 1.00000000e+00, 1.54346853e-03,
       1.03998310e-02, 1.00000000e+00, 1.59856728e-01, 5.68862498e-01,
       5.78870215e-04, 9.92355622e-01, 1.52617570e-01, 1.60731299e-02,
       5.13056173e-02, 5.82336764e-04, 6.31896182e-04, 7.15995799e-03,
       4.35584870e-02, 9.99999999e-01, 4.98332077e-04, 9.78264469e-01,
       7.05055454e-04, 8.81391356e-01, 9.98913097e-01, 1.82598399e-03,
       3.86635697e-02, 1.54749301e-03, 3.13351187e-02, 6.04598181e-03,
       3.84555614e-01, 1.00000000e+00, 2.05544383e-01, 7.46174201e-04,
       1.20223520e-03, 1.12516941e-01, 1.66317431e-02, 1.91938764e-03,
       6.75484142e-04, 2.93869551e-03, 1.75468756e-03, 1.56243702e-03,
       1.00000000e+00, 4.80938887e-02, 6.46394318e-03, 9.98517406e-01,
       5.09533942e-01, 1.00000000e+00, 9.82421991e-01, 1.00000000e+00,
       9.99998419e-01, 1.07598699e-03, 6.62285702e-04, 5.71101358e-02,
       2.40186886e-03, 9.83334604e-03, 1.00000000e+00, 5.23496849e-03,
       1.10951131e-03, 8.57545944e-03, 1.00000000e+00, 1.00000000e+00,
       3.40220861e-03, 1.46764182e-03, 7.64649819e-04, 3.34099911e-03,
       5.79728248e-03, 4.89839530e-04, 7.47998334e-04, 1.00000000e+00,
       2.66942924e-02, 9.99999832e-01, 2.12168956e-01, 9.63262309e-03,
       1.00000000e+00, 1.06619487e-03, 7.12604227e-04, 6.84127339e-04,
       9.40579172e-01, 1.82079480e-03, 9.96262033e-01, 1.11494340e-02,
       7.07306176e-04, 6.29958669e-03, 1.56448325e-02, 6.87227330e-04,
       3.28325232e-01, 1.01569603e-02, 1.86529948e-01, 9.88952443e-03,
       2.98841332e-01, 3.67585329e-01, 9.70379034e-01, 4.71158116e-03,
       2.18164680e-02, 4.96201304e-04, 1.09137635e-03, 7.62767464e-03,
       1.46743413e-03, 1.03924942e-02, 9.03396799e-02, 6.83172305e-04,
       6.12074732e-02, 4.42492741e-03, 9.60173292e-02, 2.04200456e-02,
       1.00000000e+00, 8.41506342e-01, 1.00000000e+00, 1.00000000e+00,
       1.28134600e-01, 1.58320241e-03, 5.71202886e-04, 1.61299147e-01,
       3.07943522e-03, 8.32857193e-01, 7.87722475e-03, 4.52806026e-04,
       1.34049551e-03, 6.33896620e-04, 7.73407297e-04, 4.28987476e-01,
       1.13283472e-03, 2.84555651e-02, 2.33574217e-03, 1.54021134e-02,
       9.92867699e-01, 6.78937673e-02, 1.00000000e+00, 7.24986327e-03,
       9.72849314e-01, 2.27608368e-03, 6.91472414e-04, 6.49263216e-04,
       1.80605877e-03, 9.99906011e-01, 2.97679734e-02, 1.26767112e-02,
       4.70776922e-03, 9.17498607e-01, 4.08160984e-03, 2.89283876e-02,
       6.21468999e-01, 1.00000000e+00, 6.87419130e-04])
probas_2 = probas_2.round()
probas_2
array([0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0., 0., 0.,
       0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1.,
       0., 0., 0., 0., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 0., 0., 1.,
       0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 1.,
       0., 1., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0.,
       0., 0., 0., 1., 0., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 1., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 1., 1., 1., 1.,
       1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0.,
       0., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 1., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 1., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0.,
       1., 0., 0., 0., 1., 0., 0., 1., 1., 0.])
y_test_2
array([0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1,
       1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1,
       0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0,
       1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0,
       1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0,
       1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,
       0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0])

Computing the Final Results#

cm_2 = confusion_matrix(y_test_2, probas_2)
precision_2 = round(precision_score(y_test_2, probas_2, average="macro")*100, 2)
acc_nb_2 = round(accuracy_score(y_test_2, probas_2)*100, 2)
recall_2 = round(recall_score(y_test_2, probas_2, average="macro")*100, 2)
f1score_2 = round(f1_score(y_test_2, probas_2, average="macro")*100, 2)
print('Confusion Matrix\n', cm_2)
print('precision: {}'.format(precision_2))
print('recall: {}'.format(recall_2))
print('fscore: {}'.format(f1score_2))
print('accuracy: {}'.format(acc_nb_2))
Confusion Matrix
 [[102  13]
 [ 69  47]]
precision: 68.99
recall: 64.61
fscore: 62.37
accuracy: 64.5

K-NN#

K-NN USING THE UNNORMALIZED DATA#

How the K-Nearest Neighbor Algorithm Works#

Choosing the Value of K#

The value of k in the KNN algorithm defines how many neighbors are examined to determine the classification of a given query point. For example, if k=1 the instance is assigned the same class as its single nearest neighbor.

Computing the Distances#

Compute the distance between the query point and the training points (any distance metric may be used, e.g. Euclidean distance)

Formula:

\(d(x,y) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}\)
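
Translated into code, the formula looks like this (a standalone helper written for illustration, not taken from the assignment):

```python
import math

# Euclidean distance between two 2-D points, matching the formula above.
def euclidean(p1, p2):
    return math.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)

print(euclidean((1, 2), (4, 6)))  # 5.0 (a 3-4-5 right triangle)
```
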

Sorting the Distances#

Once all distances between the training data and the test point have been computed, sort them in ascending order

Taking the K Nearest Neighbors#

Take the K nearest neighbors according to the computed distances.

Determining the Majority#

Among the K neighbors taken, determine the majority class; the test point is then assigned to that majority class.
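
The steps above can be sketched as a tiny KNN classifier; the toy data below is invented for illustration and is not the Messidor features:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: compute the distance from the query to every training point
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Step 3: sort the distances in ascending order
    dists.sort(key=lambda t: t[0])
    # Step 4: take the k nearest neighbors
    nearest = [y for _, y in dists[:k]]
    # Step 5: the majority class among them wins
    return Counter(nearest).most_common(1)[0][0]

# Toy data: two clusters, around (0,0) for class 0 and (5,5) for class 1.
train_X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (4.5, 5.2), k=3))  # 1
```
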

Splitting the Data#

In this case, the data is split into training and test sets (80:20)

Formula:

Number of test records: \(\frac{\text{test percentage}}{100} \times \text{number of samples}\)
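
Applying the formula to this dataset: scikit-learn rounds the fractional test share up, which is why 20% of 1151 rows gives the 231 test records seen below:

```python
import math

# Test-set size = (test percentage / 100) * number of samples;
# scikit-learn takes the ceiling of the fractional result.
n_samples = 1151
test_fraction = 0.2
print(math.ceil(n_samples * test_fraction))  # 231
```
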

X_train_knn,X_test_knn,y_train_knn,y_test_knn = train_test_split(X,y,test_size=0.2,random_state=4)
#shape of train and test objects
print(X_train_knn.shape)
print(X_test_knn.shape)
(920, 19)
(231, 19)

Finding the Accuracy for K = 1 to 25#

Searching for the highest accuracy using the training and test data split above. Before the accuracy can be computed, this step also computes the distances between points, using the Euclidean formula:

\(d(x,y) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}\)

# import the KNeighborsClassifier class from sklearn
from sklearn.neighbors import KNeighborsClassifier

# import the metrics module to check the accuracy
from sklearn import metrics

# try running from k=1 through 25 and record the testing accuracy
k_range = range(1, 26)
scores = {}
scores_list = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train_knn, y_train_knn)
    y_pred_knn = knn.predict(X_test_knn)
    scores[k] = metrics.accuracy_score(y_test_knn, y_pred_knn)
    scores_list.append(scores[k])
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
scores_list
[0.6536796536796536,
 0.6233766233766234,
 0.6666666666666666,
 0.6406926406926406,
 0.6277056277056277,
 0.6536796536796536,
 0.6493506493506493,
 0.6493506493506493,
 0.6493506493506493,
 0.658008658008658,
 0.670995670995671,
 0.6623376623376623,
 0.6536796536796536,
 0.658008658008658,
 0.6536796536796536,
 0.670995670995671,
 0.658008658008658,
 0.6493506493506493,
 0.6493506493506493,
 0.6363636363636364,
 0.6406926406926406,
 0.658008658008658,
 0.6493506493506493,
 0.6536796536796536,
 0.645021645021645]
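Dari daftar akurasi di atas, nilai K terbaik dapat dicari langsung dari list hasil. Sketsa mandiri berikut menyalin keluaran di atas ke dalam list, lalu mengambil posisi akurasi tertinggi (ingat K dimulai dari 1, bukan 0):

```python
# Akurasi pengujian untuk K = 1..25 (disalin dari keluaran di atas)
scores_list = [0.6536796536796536, 0.6233766233766234, 0.6666666666666666,
               0.6406926406926406, 0.6277056277056277, 0.6536796536796536,
               0.6493506493506493, 0.6493506493506493, 0.6493506493506493,
               0.658008658008658, 0.670995670995671, 0.6623376623376623,
               0.6536796536796536, 0.658008658008658, 0.6536796536796536,
               0.670995670995671, 0.658008658008658, 0.6493506493506493,
               0.6493506493506493, 0.6363636363636364, 0.6406926406926406,
               0.658008658008658, 0.6493506493506493, 0.6536796536796536,
               0.645021645021645]

# index() mengembalikan posisi pertama dari nilai maksimum; +1 karena K mulai dari 1
best_k = scores_list.index(max(scores_list)) + 1
print(best_k, round(max(scores_list) * 100, 2))  # 11 67.1
```

Akurasi tertinggi (sekitar 67,1%) dicapai pertama kali pada K = 11 (nilai yang sama muncul lagi pada K = 16 dan K = 20).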

Visualisasi Hasil Akurasi#

Menggunakan matplotlib

%matplotlib inline
import matplotlib.pyplot as plt

#plot the relationship between K and the testing accuracy
plt.plot(k_range,scores_list)
plt.xlabel('Value of K for KNN')
plt.ylabel('Testing Accuracy')
Text(0, 0.5, 'Testing Accuracy')
_images/notebooks7_71_1.png

Membangun Model dengan n_neighbors = 5#

knn = KNeighborsClassifier(n_neighbors = 5)

Fitting the kNN Model

knn.fit(X_train_knn,y_train_knn)
KNeighborsClassifier()

Predict the test set Result

yy_pred = knn.predict(X_test_knn)
yy_pred
array([0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0,
       1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1,
       1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1,
       1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0,
       1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1,
       1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1])

The final result

cm_knn = confusion_matrix(y_test_knn,yy_pred)
precision_knn = round(precision_score(y_test_knn,yy_pred, average="macro")*100,2)
acc_nb_knn = round(accuracy_score(y_test_knn,yy_pred)*100,2)
recall_knn = round(recall_score(y_test_knn,yy_pred, average="macro")*100,2)
f1score_knn = round(f1_score(y_test_knn, yy_pred, average="macro")*100,2)
print('Konfusi Matrix\n',cm_knn)
print('precision: {}'.format(precision_knn))
print('recall: {}'.format(recall_knn))
print('fscore: {}'.format(f1score_knn))
print('accuracy: {}'.format(acc_nb_knn))
Konfusi Matrix
 [[78 31]
 [55 67]]
precision: 63.51
recall: 63.24
fscore: 62.69
accuracy: 62.77
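Angka precision, recall, dan accuracy di atas dapat diverifikasi langsung dari confusion matrix (baris = label sebenarnya, kolom = prediksi). Sketsa mandiri berikut menyalin nilai matriks dari keluaran di atas:

```python
import numpy as np

# Confusion matrix K-NN (K = 5) dari keluaran di atas
cm = np.array([[78, 31],
               [55, 67]])

# precision per kelas = diagonal / jumlah kolom; recall = diagonal / jumlah baris
prec = cm.diagonal() / cm.sum(axis=0)
rec  = cm.diagonal() / cm.sum(axis=1)

precision_macro = round(prec.mean() * 100, 2)            # 63.51
recall_macro    = round(rec.mean() * 100, 2)             # 63.24
accuracy        = round(cm.trace() / cm.sum() * 100, 2)  # 62.77
print(precision_macro, recall_macro, accuracy)
```

Hasil perhitungan manual ini cocok dengan keluaran `precision_score`, `recall_score`, dan `accuracy_score` di atas.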

Proses pada K-NN Menggunakan Data yang Sudah Dinormalisasi#

Split Data#

X_train_knn2,X_test_knn2,y_train_knn2,y_test_knn2 = train_test_split(scaled,y,test_size=0.2,random_state=4)
#shape of train and test objects
print(X_train_knn2.shape)
print(X_test_knn2.shape)
(920, 19)
(231, 19)
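Variabel `scaled` pada split di atas diasumsikan berasal dari normalisasi Min-Max di bagian sebelumnya (`MinMaxScaler` sudah diimpor di awal). Sketsa kecil (dengan data contoh hipotetis) memperlihatkan cara kerjanya:

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Data contoh (hipotetis) dengan skala kolom yang berbeda
X = np.array([[22.0, 49.895756],
              [62.0, 55.831441],
              [44.0, 18.026254]])

scaler = MinMaxScaler()                   # memetakan tiap kolom ke rentang [0, 1]
scaled_contoh = scaler.fit_transform(X)

print(scaled_contoh.min(), scaled_contoh.max())  # 0.0 1.0
```

Normalisasi penting untuk K-NN karena jarak Euclidean sensitif terhadap perbedaan skala antarfitur.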

Mencari Akurasi dengan K = 1 sampai 25#

Mencari akurasi terbaik menggunakan data latih dan data uji yang sudah di-split di atas. Sebelum menghitung akurasi, proses ini terlebih dahulu menghitung jarak antar data untuk menentukan tetangga terdekat.

#import the KNeighborsClassifier class from sklearn
from sklearn.neighbors import KNeighborsClassifier

#import metrics model to check the accuracy 
from sklearn import metrics
#Try running from k=1 through 25 and record testing accuracy
k_range = range(1,26)
scores = {}
scores_list = []
for k in k_range:
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X_train_knn2,y_train_knn2)
        y_pred_knn2=knn.predict(X_test_knn2)
        scores[k] = metrics.accuracy_score(y_test_knn2,y_pred_knn2)
        scores_list.append(metrics.accuracy_score(y_test_knn2,y_pred_knn2))
scores_list
[0.6017316017316018,
 0.5800865800865801,
 0.5930735930735931,
 0.6103896103896104,
 0.6277056277056277,
 0.6623376623376623,
 0.6666666666666666,
 0.6753246753246753,
 0.6277056277056277,
 0.6190476190476191,
 0.6233766233766234,
 0.6320346320346321,
 0.658008658008658,
 0.6277056277056277,
 0.658008658008658,
 0.6493506493506493,
 0.6666666666666666,
 0.670995670995671,
 0.683982683982684,
 0.6753246753246753,
 0.6883116883116883,
 0.6883116883116883,
 0.6623376623376623,
 0.6753246753246753,
 0.670995670995671]
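Seperti pada data tanpa normalisasi, nilai K terbaik dapat dicari dari list hasil. Sketsa mandiri berikut menyalin keluaran di atas:

```python
# Akurasi pengujian K = 1..25 pada data ternormalisasi (disalin dari keluaran di atas)
scores_list2 = [0.6017316017316018, 0.5800865800865801, 0.5930735930735931,
                0.6103896103896104, 0.6277056277056277, 0.6623376623376623,
                0.6666666666666666, 0.6753246753246753, 0.6277056277056277,
                0.6190476190476191, 0.6233766233766234, 0.6320346320346321,
                0.658008658008658, 0.6277056277056277, 0.658008658008658,
                0.6493506493506493, 0.6666666666666666, 0.670995670995671,
                0.683982683982684, 0.6753246753246753, 0.6883116883116883,
                0.6883116883116883, 0.6623376623376623, 0.6753246753246753,
                0.670995670995671]

# Posisi pertama akurasi maksimum; +1 karena K mulai dari 1
best_k2 = scores_list2.index(max(scores_list2)) + 1
print(best_k2, round(max(scores_list2) * 100, 2))  # 21 68.83
```

Pada data ternormalisasi, akurasi tertinggi (sekitar 68,83%) dicapai pada K = 21, lebih tinggi daripada akurasi terbaik data tanpa normalisasi (sekitar 67,1%).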

Visualisasi Hasil Akurasi#

Menggunakan matplotlib

%matplotlib inline
import matplotlib.pyplot as plt

#plot the relationship between K and the testing accuracy
plt.plot(k_range,scores_list)
plt.xlabel('Value of K for KNN')
plt.ylabel('Testing Accuracy')
Text(0, 0.5, 'Testing Accuracy')
_images/notebooks7_89_1.png

Membangun Model dengan n_neighbors = 5#

knn = KNeighborsClassifier(n_neighbors = 5)

Fitting the kNN Model

knn.fit(X_train_knn2,y_train_knn2)
KNeighborsClassifier()

Predict the test set Result

yy_pred2 = knn.predict(X_test_knn2)
yy_pred2
array([0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
       1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1,
       1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0,
       1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0,
       1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1,
       0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1])

The final result

cm_knn2 = confusion_matrix(y_test_knn2,yy_pred2)
precision_knn2 = round(precision_score(y_test_knn2,yy_pred2, average="macro")*100,2)
acc_nb_knn2 = round(accuracy_score(y_test_knn2,yy_pred2)*100,2)
recall_knn2 = round(recall_score(y_test_knn2,yy_pred2, average="macro")*100,2)
f1score_knn2 = round(f1_score(y_test_knn2, yy_pred2, average="macro")*100,2)
print('Konfusi Matrix\n',cm_knn2)
print('precision: {}'.format(precision_knn2))
print('recall: {}'.format(recall_knn2))
print('fscore: {}'.format(f1score_knn2))
print('accuracy: {}'.format(acc_nb_knn2))
Konfusi Matrix
 [[74 35]
 [51 71]]
precision: 63.09
recall: 63.04
fscore: 62.76
accuracy: 62.77
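Sebagai penutup, kedua confusion matrix K-NN (K = 5) dapat dibandingkan langsung. Sketsa mandiri berikut menyalin nilai matriks dari keluaran masing-masing:

```python
import numpy as np

cm_knn  = np.array([[78, 31], [55, 67]])   # K-NN tanpa normalisasi (K = 5)
cm_knn2 = np.array([[74, 35], [51, 71]])   # K-NN dengan normalisasi (K = 5)

def akurasi(cm):
    # akurasi = jumlah diagonal (prediksi benar) / total sampel
    return round(cm.trace() / cm.sum() * 100, 2)

print(akurasi(cm_knn), akurasi(cm_knn2))  # 62.77 62.77
```

Pada K = 5 akurasi keduanya kebetulan sama (62,77%), meskipun sebaran kesalahannya berbeda; keunggulan normalisasi baru terlihat pada K terbaiknya (68,83% pada K = 21 vs 67,1% pada K = 11).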