Assignment 6 | Diabetic Retinopathy Case Study#

Implementation using Naive Bayes and K-NN

Naive Bayes is a method well suited to both binary and multiclass classification. Also known as the Naive Bayes Classifier, it is a supervised technique that classifies future objects by assigning a class label to each instance/record using conditional probability. Conditional probability is a measure of the likelihood of an event occurring given that another event has (by assumption, presumption, assertion, or evidence) already occurred.
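
As a minimal illustration (not part of the original assignment), the conditional-probability idea can be sketched in Python; all the counts below are hypothetical, chosen only to make the arithmetic visible:

```python
# Conditional probability: P(A|B) = n(A and B) / n(B).
def conditional_probability(n_joint, n_given):
    """Probability of A given that B has occurred."""
    return n_joint / n_given

# Hypothetical counts: out of 100 retina images, 40 show some sign X,
# and 30 of those 40 are labeled as diabetic retinopathy (class 1).
p_dr_given_x = conditional_probability(30, 40)
print(p_dr_given_x)  # 0.75
```

Naive Bayes combines such conditional probabilities (one per feature, assumed independent) to pick the most likely class label.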

Reading and Preparing the Data#

from scipy.io import arff
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import joblib

data = arff.loadarff('/content/drive/MyDrive/datamining/tugas/messidor_features.arff')
df = pd.DataFrame(data[0])
df.head()
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Class
0 1.0 1.0 22.0 22.0 22.0 19.0 18.0 14.0 49.895756 17.775994 5.270920 0.771761 0.018632 0.006864 0.003923 0.003923 0.486903 0.100025 1.0 b'0'
1 1.0 1.0 24.0 24.0 22.0 18.0 16.0 13.0 57.709936 23.799994 3.325423 0.234185 0.003903 0.003903 0.003903 0.003903 0.520908 0.144414 0.0 b'0'
2 1.0 1.0 62.0 60.0 59.0 54.0 47.0 33.0 55.831441 27.993933 12.687485 4.852282 1.393889 0.373252 0.041817 0.007744 0.530904 0.128548 0.0 b'1'
3 1.0 1.0 55.0 53.0 53.0 50.0 43.0 31.0 40.467228 18.445954 9.118901 3.079428 0.840261 0.272434 0.007653 0.001531 0.483284 0.114790 0.0 b'0'
4 1.0 1.0 44.0 44.0 44.0 41.0 39.0 27.0 18.026254 8.570709 0.410381 0.000000 0.000000 0.000000 0.000000 0.000000 0.475935 0.123572 0.0 b'1'

Label Encoder#

Converting the labels from byte strings to integers, e.g. b'0' to 0

Before the conversion

y = df['Class'].values
y[0:5]
array([b'0', b'0', b'1', b'0', b'1'], dtype=object)

After the conversion

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(y)
y = le.transform(y)
y
array([0, 0, 1, ..., 0, 1, 0])

Renaming the Columns#

From numeric indices to descriptive names

col_names = []
for i in range(20):
    if i == 0:
        col_names.append('quality')
    elif i == 1:
        col_names.append('prescreen')
    elif 2 <= i <= 7:
        col_names.append('ma' + str(i))
    elif 8 <= i <= 15:
        col_names.append('exudate' + str(i))
    elif i == 16:
        col_names.append('euDist')
    elif i == 17:
        col_names.append('diameter')
    elif i == 18:
        col_names.append('amfm_class')
    else:  # i == 19
        col_names.append('label')
# Assign the list directly; wrapping it in another list (df.columns = [col_names])
# would create a MultiIndex of tuples and trigger pandas/sklearn warnings later on.
df.columns = col_names
df['label'] = y
df
df
quality prescreen ma2 ma3 ma4 ma5 ma6 ma7 exudate8 exudate9 exudate10 exudate11 exudate12 exudate13 exudate14 exudate15 euDist diameter amfm_class label
0 1.0 1.0 22.0 22.0 22.0 19.0 18.0 14.0 49.895756 17.775994 5.270920 0.771761 0.018632 0.006864 0.003923 0.003923 0.486903 0.100025 1.0 0
1 1.0 1.0 24.0 24.0 22.0 18.0 16.0 13.0 57.709936 23.799994 3.325423 0.234185 0.003903 0.003903 0.003903 0.003903 0.520908 0.144414 0.0 0
2 1.0 1.0 62.0 60.0 59.0 54.0 47.0 33.0 55.831441 27.993933 12.687485 4.852282 1.393889 0.373252 0.041817 0.007744 0.530904 0.128548 0.0 1
3 1.0 1.0 55.0 53.0 53.0 50.0 43.0 31.0 40.467228 18.445954 9.118901 3.079428 0.840261 0.272434 0.007653 0.001531 0.483284 0.114790 0.0 0
4 1.0 1.0 44.0 44.0 44.0 41.0 39.0 27.0 18.026254 8.570709 0.410381 0.000000 0.000000 0.000000 0.000000 0.000000 0.475935 0.123572 0.0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1146 1.0 1.0 34.0 34.0 34.0 33.0 31.0 24.0 6.071765 0.937472 0.031145 0.003115 0.000000 0.000000 0.000000 0.000000 0.537470 0.116795 0.0 0
1147 1.0 1.0 49.0 49.0 49.0 49.0 45.0 37.0 63.197145 27.377668 8.067688 0.979548 0.001552 0.000000 0.000000 0.000000 0.516733 0.124190 0.0 0
1148 1.0 0.0 49.0 48.0 48.0 45.0 43.0 33.0 30.461898 13.966980 1.763305 0.137858 0.011221 0.000000 0.000000 0.000000 0.560632 0.129843 0.0 0
1149 1.0 1.0 39.0 36.0 29.0 23.0 13.0 7.0 40.525739 12.604947 4.740919 1.077570 0.563518 0.326860 0.239568 0.174584 0.485972 0.106690 1.0 1
1150 1.0 1.0 7.0 7.0 7.0 7.0 7.0 5.0 69.423565 7.031843 1.750548 0.046597 0.021180 0.008472 0.000000 0.000000 0.556192 0.088957 0.0 0

1151 rows × 20 columns

X = df.drop(columns=['label'])
X
quality prescreen ma2 ma3 ma4 ma5 ma6 ma7 exudate8 exudate9 exudate10 exudate11 exudate12 exudate13 exudate14 exudate15 euDist diameter amfm_class
0 1.0 1.0 22.0 22.0 22.0 19.0 18.0 14.0 49.895756 17.775994 5.270920 0.771761 0.018632 0.006864 0.003923 0.003923 0.486903 0.100025 1.0
1 1.0 1.0 24.0 24.0 22.0 18.0 16.0 13.0 57.709936 23.799994 3.325423 0.234185 0.003903 0.003903 0.003903 0.003903 0.520908 0.144414 0.0
2 1.0 1.0 62.0 60.0 59.0 54.0 47.0 33.0 55.831441 27.993933 12.687485 4.852282 1.393889 0.373252 0.041817 0.007744 0.530904 0.128548 0.0
3 1.0 1.0 55.0 53.0 53.0 50.0 43.0 31.0 40.467228 18.445954 9.118901 3.079428 0.840261 0.272434 0.007653 0.001531 0.483284 0.114790 0.0
4 1.0 1.0 44.0 44.0 44.0 41.0 39.0 27.0 18.026254 8.570709 0.410381 0.000000 0.000000 0.000000 0.000000 0.000000 0.475935 0.123572 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1146 1.0 1.0 34.0 34.0 34.0 33.0 31.0 24.0 6.071765 0.937472 0.031145 0.003115 0.000000 0.000000 0.000000 0.000000 0.537470 0.116795 0.0
1147 1.0 1.0 49.0 49.0 49.0 49.0 45.0 37.0 63.197145 27.377668 8.067688 0.979548 0.001552 0.000000 0.000000 0.000000 0.516733 0.124190 0.0
1148 1.0 0.0 49.0 48.0 48.0 45.0 43.0 33.0 30.461898 13.966980 1.763305 0.137858 0.011221 0.000000 0.000000 0.000000 0.560632 0.129843 0.0
1149 1.0 1.0 39.0 36.0 29.0 23.0 13.0 7.0 40.525739 12.604947 4.740919 1.077570 0.563518 0.326860 0.239568 0.174584 0.485972 0.106690 1.0
1150 1.0 1.0 7.0 7.0 7.0 7.0 7.0 5.0 69.423565 7.031843 1.750548 0.046597 0.021180 0.008472 0.000000 0.000000 0.556192 0.088957 0.0

1151 rows × 19 columns

Data Normalization#

The data is normalized using Min-Max normalization

The calculation uses the Min-Max formula:

\(X_{new} = \frac{X - X_{min}}{X_{max} - X_{min}}\)

  where:
  X    = the data value
  Xmin = the smallest value in the column
  Xmax = the largest value in the column
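
The formula can be checked by hand against scikit-learn's `MinMaxScaler`; a small sketch with a made-up single column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A made-up single-column example used only to verify the Min-Max formula.
x = np.array([[2.0], [4.0], [10.0]])

# The formula applied directly: (X - Xmin) / (Xmax - Xmin)
manual = (x - x.min()) / (x.max() - x.min())

# The same result via scikit-learn
scaled = MinMaxScaler().fit_transform(x)
print(np.allclose(manual, scaled))  # True
```
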
scaler = MinMaxScaler()
scaled = scaler.fit_transform(X)
features_names = X.columns.copy()
scaled_features = pd.DataFrame(scaled, columns=features_names)
scaled_features.head(10)

# Save the fitted scaler for later reuse
# scaler_filename = "scaled.save"
# joblib.dump(scaler, scaler_filename)
# scaler = joblib.load(scaler_filename)
quality prescreen ma2 ma3 ma4 ma5 ma6 ma7 exudate8 exudate9 exudate10 exudate11 exudate12 exudate13 exudate14 exudate15 euDist diameter amfm_class
0 1.0 1.0 0.140000 0.160305 0.176471 0.173077 0.177083 0.147727 0.122764 0.106359 0.049693 0.012913 0.000362 0.000342 0.000661 0.001271 0.530801 0.261133 1.0
1 1.0 1.0 0.153333 0.175573 0.176471 0.163462 0.156250 0.136364 0.142126 0.142403 0.031351 0.003918 0.000076 0.000194 0.000657 0.001264 0.682302 0.536341 0.0
2 1.0 1.0 0.406667 0.450382 0.487395 0.509615 0.479167 0.363636 0.137472 0.167497 0.119614 0.081188 0.027106 0.018571 0.007043 0.002509 0.726836 0.437973 0.0
3 1.0 1.0 0.360000 0.396947 0.436975 0.471154 0.437500 0.340909 0.099403 0.110368 0.085971 0.051525 0.016340 0.013555 0.001289 0.000496 0.514678 0.352675 0.0
4 1.0 1.0 0.286667 0.328244 0.361345 0.384615 0.395833 0.295455 0.043799 0.051281 0.003869 0.000000 0.000000 0.000000 0.000000 0.000000 0.481936 0.407122 0.0
5 1.0 1.0 0.286667 0.320611 0.336134 0.384615 0.375000 0.318182 0.069395 0.041498 0.021738 0.005417 0.000000 0.000000 0.000000 0.000000 0.601764 0.426770 0.0
6 1.0 0.0 0.186667 0.213740 0.235294 0.250000 0.250000 0.170455 0.037412 0.054531 0.015400 0.000000 0.000000 0.000000 0.000000 0.000000 0.775126 0.506339 0.0
7 1.0 1.0 0.033333 0.038168 0.042017 0.048077 0.010417 0.000000 0.050374 0.056828 0.011536 0.002516 0.000000 0.000000 0.000000 0.000000 0.929166 0.081622 1.0
8 1.0 1.0 0.140000 0.152672 0.142857 0.134615 0.125000 0.102273 0.164381 0.140880 0.057991 0.008305 0.000000 0.000000 0.000000 0.000000 0.589477 0.365093 0.0
9 1.0 1.0 0.520000 0.564885 0.605042 0.673077 0.656250 0.522727 0.053997 0.060159 0.008246 0.001670 0.000455 0.000000 0.000000 0.000000 0.860738 0.317608 0.0

New DataFrame#

The converted labels are then appended to the scaled data, producing a new dataframe

scaled_features['label'] = y
scaled_features
quality prescreen ma2 ma3 ma4 ma5 ma6 ma7 exudate8 exudate9 exudate10 exudate11 exudate12 exudate13 exudate14 exudate15 euDist diameter amfm_class label
0 1.0 1.0 0.140000 0.160305 0.176471 0.173077 0.177083 0.147727 0.122764 0.106359 0.049693 0.012913 0.000362 0.000342 0.000661 0.001271 0.530801 0.261133 1.0 0
1 1.0 1.0 0.153333 0.175573 0.176471 0.163462 0.156250 0.136364 0.142126 0.142403 0.031351 0.003918 0.000076 0.000194 0.000657 0.001264 0.682302 0.536341 0.0 0
2 1.0 1.0 0.406667 0.450382 0.487395 0.509615 0.479167 0.363636 0.137472 0.167497 0.119614 0.081188 0.027106 0.018571 0.007043 0.002509 0.726836 0.437973 0.0 1
3 1.0 1.0 0.360000 0.396947 0.436975 0.471154 0.437500 0.340909 0.099403 0.110368 0.085971 0.051525 0.016340 0.013555 0.001289 0.000496 0.514678 0.352675 0.0 0
4 1.0 1.0 0.286667 0.328244 0.361345 0.384615 0.395833 0.295455 0.043799 0.051281 0.003869 0.000000 0.000000 0.000000 0.000000 0.000000 0.481936 0.407122 0.0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1146 1.0 1.0 0.220000 0.251908 0.277311 0.307692 0.312500 0.261364 0.014179 0.005609 0.000294 0.000052 0.000000 0.000000 0.000000 0.000000 0.756089 0.365106 0.0 0
1147 1.0 1.0 0.320000 0.366412 0.403361 0.461538 0.458333 0.409091 0.155722 0.163809 0.076060 0.016390 0.000030 0.000000 0.000000 0.000000 0.663701 0.410954 0.0 0
1148 1.0 0.0 0.320000 0.358779 0.394958 0.423077 0.437500 0.363636 0.074612 0.083569 0.016624 0.002307 0.000218 0.000000 0.000000 0.000000 0.859281 0.446002 0.0 0
1149 1.0 1.0 0.253333 0.267176 0.235294 0.211538 0.125000 0.068182 0.099548 0.075419 0.044696 0.018030 0.010958 0.016263 0.040346 0.056559 0.526653 0.302456 1.0 1
1150 1.0 1.0 0.040000 0.045802 0.050420 0.057692 0.062500 0.045455 0.171150 0.042074 0.016504 0.000780 0.000412 0.000422 0.000000 0.000000 0.839500 0.192513 0.0 0

1151 rows × 20 columns

GaussianNB#

NAIVE BAYES USING THE NORMALIZED DATA#

Library Preparation#

In the step below the data is split with a 0.2 test fraction, then predictions are made and the class probabilities are computed

from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix


# splitting X and y in a single call keeps the rows aligned
X_train, X_test, y_train, y_test = train_test_split(
    scaled, y, train_size=0.8, test_size=0.2, shuffle=False)

clf2 = GaussianNB()
clf2.fit(X_train, y_train)
y_pred = clf2.predict(X_test)
probas = clf2.predict_proba(X_test)[:,1]
y_test.shape + probas.shape  # concatenates the two shape tuples: both have 231 rows
(231, 231)

Probability Results#

`predict_proba` above applies Bayes' theorem; the underlying classical probability of an event is:

P(A) = \(\frac{n(A)}{n(S)}\)

  where:
  P(A) = the probability of event "A"
  n(A) = the number of favorable outcomes
  n(S) = the total number of outcomes in the sample space
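
A quick numeric sketch of this definition; the class count below is made up for illustration, while 231 is the size of the test split above:

```python
# Classical probability: P(A) = n(A) / n(S).
def probability(n_a, n_s):
    return n_a / n_s

# Hypothetical example: suppose 116 of the 231 test records belong to class 1.
print(round(probability(116, 231), 4))  # 0.5022
```
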
probas
array([0.75620474, 0.11189161, 1.        , 0.10486252, 0.14252695,
       1.        , 0.87186459, 0.08938164, 0.06195771, 0.07016774,
       1.        , 0.07309769, 0.99377827, 0.08398282, 0.99999961,
       0.85347824, 0.91850879, 0.07858852, 0.64476537, 0.21435453,
       0.09334844, 0.9769378 , 0.95011339, 0.68211181, 0.96900188,
       0.99890913, 0.99724613, 0.16434853, 0.86018358, 0.99999363,
       0.53634466, 0.08895374, 0.98313892, 0.29158109, 0.25465845,
       1.        , 0.42762289, 0.88627287, 0.48478246, 0.68057536,
       0.09285228, 0.92485924, 0.12602622, 0.99996256, 0.67208144,
       0.95535994, 0.08212779, 0.92138625, 0.14110054, 0.29346465,
       0.99957587, 0.09783595, 0.07974339, 0.63233074, 0.2814182 ,
       1.        , 0.14248748, 1.        , 1.        , 0.99977481,
       0.94905966, 0.32335402, 0.66732001, 0.98470723, 0.99924866,
       0.91560627, 0.09860317, 0.99956381, 0.13165869, 0.93304628,
       1.        , 0.1097288 , 0.336975  , 0.67880805, 0.07551187,
       0.15046226, 0.06612315, 0.07050754, 0.06605462, 0.88089312,
       1.        , 0.99908009, 0.65508392, 0.85384983, 0.99997971,
       0.85982459, 1.        , 0.22228045, 1.        , 0.51881583,
       1.        , 0.1601833 , 0.56489793, 1.        , 0.95912747,
       0.99390307, 0.06669617, 0.99994093, 0.95697036, 0.66801819,
       0.86976186, 0.06706977, 0.07242654, 0.47046825, 0.84920321,
       1.        , 0.05787343, 0.99982102, 0.08012137, 0.99891305,
       0.99999133, 0.18732054, 0.83260846, 0.16077613, 0.799558  ,
       0.42904874, 0.98720071, 1.        , 0.96964046, 0.08441927,
       0.12885481, 0.94033065, 0.67612727, 0.19183544, 0.07696444,
       0.26696759, 0.17817935, 0.16196547, 1.        , 0.86166374,
       0.44565553, 0.99998881, 0.99228367, 1.        , 0.99985484,
       1.        , 0.99999999, 0.11737421, 0.07561375, 0.8821287 ,
       0.22906703, 0.55149864, 1.        , 0.39346773, 0.12066409,
       0.51639957, 1.        , 1.        , 0.29719598, 0.15344951,
       0.0863146 , 0.29259063, 0.41884135, 0.05694947, 0.08461161,
       1.        , 0.7729152 , 1.        , 0.97079294, 0.54664137,
       1.        , 0.11629832, 0.08089507, 0.07805169, 0.99948837,
       0.18394124, 0.99996964, 0.58947713, 0.08030054, 0.43904287,
       0.66311756, 0.07841724, 0.98371956, 0.55877777, 0.96593647,
       0.55264935, 0.9813472 , 0.9862656 , 0.99975574, 0.36828985,
       0.7337582 , 0.05763816, 0.11919843, 0.48677852, 0.15423286,
       0.56479739, 0.9246502 , 0.07800551, 0.88966201, 0.35381276,
       0.92915329, 0.72146675, 1.        , 0.99847688, 1.        ,
       1.        , 0.94776044, 0.16369382, 0.06581339, 0.95954445,
       0.2764462 , 0.9983774 , 0.49497013, 0.05286252, 0.14214012,
       0.07261336, 0.08716287, 0.9895085 , 0.12310251, 0.78514343,
       0.22433016, 0.65895065, 0.99994182, 0.89992995, 1.        ,
       0.47476407, 0.99978951, 0.21982009, 0.07868117, 0.0742566 ,
       0.18242771, 0.99999934, 0.79139724, 0.61353135, 0.36884766,
       0.99927186, 0.33597509, 0.78636633, 0.99509548, 1.        ,
       0.07830934])

Rounding#

Rounding the probabilities above to 0 and 1, i.e. thresholding at 0.5
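
A quick check on a few of the values from the probability array above (note that NumPy rounds ties to even, so a value of exactly 0.5 would round to 0):

```python
import numpy as np

# Rounding class-1 probabilities acts as a decision threshold at 0.5:
# values above 0.5 become 1, values below become 0.
p = np.array([0.75620474, 0.11189161, 0.14252695, 0.87186459])
print(p.round())  # [1. 0. 0. 1.]
```
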

probas = probas.round()
probas
array([1., 0., 1., 0., 0., 1., 1., 0., 0., 0., 1., 0., 1., 0., 1., 1., 1.,
       0., 1., 0., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1., 0.,
       0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 1., 1., 0., 1., 0., 0., 1.,
       0., 0., 1., 0., 1., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 1.,
       0., 1., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.,
       1., 1., 0., 1., 1., 1., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0.,
       0., 0., 1., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 1., 1., 1.,
       0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 1., 1.,
       1., 0., 0., 1., 0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0.,
       0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0., 1., 0., 1., 1., 0., 0.,
       1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 0., 0., 0., 0., 1., 1.,
       0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 1., 0., 0.,
       0., 0., 0., 1., 0., 1., 0., 1., 1., 1., 1., 0., 1., 0., 0., 0., 0.,
       1., 1., 1., 0., 1., 0., 1., 1., 1., 0.])
y_test
array([0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1,
       1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1,
       0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0,
       1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0,
       1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0,
       1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,
       0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0])

Computing the Final Results#

Computing precision_score, accuracy_score, recall_score, and f1_score

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix

cm = confusion_matrix(y_test, y_pred)
precision = round(precision_score(y_test, y_pred, average="macro")*100, 2)
acc_nb = round(accuracy_score(y_test, y_pred)*100, 2)
recall = round(recall_score(y_test, y_pred, average="macro")*100, 2)
f1score = round(f1_score(y_test, y_pred, average="macro")*100, 2)
print('Confusion Matrix\n', cm)
print('precision: {}'.format(precision))
print('recall: {}'.format(recall))
print('fscore: {}'.format(f1score))
print('accuracy: {}'.format(acc_nb))
Confusion Matrix
 [[65 50]
 [34 82]]
precision: 63.89
recall: 63.61
fscore: 63.44
accuracy: 63.64

NAIVE BAYES USING THE UNNORMALIZED DATA#

Library Preparation#

Computing the probabilities

X_train_2, X_test_2, y_train_2, y_test_2 = train_test_split(
    X, y, train_size=0.8, test_size=0.2, shuffle=False)

nb = GaussianNB()
nb.fit(X_train_2, y_train_2)
y_pred_2 = nb.predict(X_test_2)
probas_2 = nb.predict_proba(X_test_2)[:, 1]
y_test_2.shape + probas_2.shape
(231, 231)

Probability Results#

probas_2
array([2.44675761e-02, 1.01773885e-03, 1.00000000e+00, 9.47815666e-04,
       1.34612803e-03, 1.00000000e+00, 5.22355851e-02, 7.94813794e-04,
       5.35188240e-04, 6.11399680e-04, 9.99999432e-01, 6.38903416e-04,
       5.64178101e-01, 7.40947996e-04, 9.99951663e-01, 4.50590733e-02,
       8.34303868e-02, 6.90239788e-04, 1.44660822e-02, 2.17747148e-03,
       8.32397496e-04, 2.55027023e-01, 1.33668119e-01, 1.70935484e-02,
       2.02105520e-01, 8.81208832e-01, 7.45754095e-01, 1.58269133e-03,
       4.74006385e-02, 9.99098165e-01, 9.29907570e-03, 7.90665894e-04,
       3.20652663e-01, 3.32866709e-03, 2.76083452e-03, 1.00000000e+00,
       5.96968366e-03, 5.94110417e-02, 7.56560778e-03, 1.69729481e-02,
       8.27793664e-04, 9.03347700e-02, 1.16745907e-03, 9.95375504e-01,
       1.63561902e-02, 1.47791344e-01, 7.24905010e-04, 8.67967501e-02,
       1.32947541e-03, 3.35089635e-03, 9.50193633e-01, 8.78572937e-04,
       7.01652166e-04, 1.37358778e-02, 3.14435173e-03, 9.99999999e-01,
       1.34427434e-03, 1.00000000e+00, 1.00000000e+00, 9.72920955e-01,
       1.30918012e-01, 3.85025743e-03, 1.59931116e-02, 3.42788773e-01,
       9.15196947e-01, 8.06778812e-02, 8.85919633e-04, 9.48915433e-01,
       1.22725175e-03, 1.01586080e-01, 1.00000000e+00, 9.97014480e-04,
       4.10523312e-03, 1.68334678e-02, 6.61775129e-04, 1.43184342e-03,
       5.73620325e-04, 6.13857457e-04, 5.73105244e-04, 5.65418440e-02,
       1.00000000e+00, 8.97929611e-01, 1.51712402e-02, 4.52022695e-02,
       9.97489656e-01, 4.73673741e-02, 1.00000000e+00, 2.31495254e-03,
       1.00000000e+00, 8.64325067e-03, 1.00000000e+00, 1.54346853e-03,
       1.03998310e-02, 1.00000000e+00, 1.59856728e-01, 5.68862498e-01,
       5.78870215e-04, 9.92355622e-01, 1.52617570e-01, 1.60731299e-02,
       5.13056173e-02, 5.82336764e-04, 6.31896182e-04, 7.15995799e-03,
       4.35584870e-02, 9.99999999e-01, 4.98332077e-04, 9.78264469e-01,
       7.05055454e-04, 8.81391356e-01, 9.98913097e-01, 1.82598399e-03,
       3.86635697e-02, 1.54749301e-03, 3.13351187e-02, 6.04598181e-03,
       3.84555614e-01, 1.00000000e+00, 2.05544383e-01, 7.46174201e-04,
       1.20223520e-03, 1.12516941e-01, 1.66317431e-02, 1.91938764e-03,
       6.75484142e-04, 2.93869551e-03, 1.75468756e-03, 1.56243702e-03,
       1.00000000e+00, 4.80938887e-02, 6.46394318e-03, 9.98517406e-01,
       5.09533942e-01, 1.00000000e+00, 9.82421991e-01, 1.00000000e+00,
       9.99998419e-01, 1.07598699e-03, 6.62285702e-04, 5.71101358e-02,
       2.40186886e-03, 9.83334604e-03, 1.00000000e+00, 5.23496849e-03,
       1.10951131e-03, 8.57545944e-03, 1.00000000e+00, 1.00000000e+00,
       3.40220861e-03, 1.46764182e-03, 7.64649819e-04, 3.34099911e-03,
       5.79728248e-03, 4.89839530e-04, 7.47998334e-04, 1.00000000e+00,
       2.66942924e-02, 9.99999832e-01, 2.12168956e-01, 9.63262309e-03,
       1.00000000e+00, 1.06619487e-03, 7.12604227e-04, 6.84127339e-04,
       9.40579172e-01, 1.82079480e-03, 9.96262033e-01, 1.11494340e-02,
       7.07306176e-04, 6.29958669e-03, 1.56448325e-02, 6.87227330e-04,
       3.28325232e-01, 1.01569603e-02, 1.86529948e-01, 9.88952443e-03,
       2.98841332e-01, 3.67585329e-01, 9.70379034e-01, 4.71158116e-03,
       2.18164680e-02, 4.96201304e-04, 1.09137635e-03, 7.62767464e-03,
       1.46743413e-03, 1.03924942e-02, 9.03396799e-02, 6.83172305e-04,
       6.12074732e-02, 4.42492741e-03, 9.60173292e-02, 2.04200456e-02,
       1.00000000e+00, 8.41506342e-01, 1.00000000e+00, 1.00000000e+00,
       1.28134600e-01, 1.58320241e-03, 5.71202886e-04, 1.61299147e-01,
       3.07943522e-03, 8.32857193e-01, 7.87722475e-03, 4.52806026e-04,
       1.34049551e-03, 6.33896620e-04, 7.73407297e-04, 4.28987476e-01,
       1.13283472e-03, 2.84555651e-02, 2.33574217e-03, 1.54021134e-02,
       9.92867699e-01, 6.78937673e-02, 1.00000000e+00, 7.24986327e-03,
       9.72849314e-01, 2.27608368e-03, 6.91472414e-04, 6.49263216e-04,
       1.80605877e-03, 9.99906011e-01, 2.97679734e-02, 1.26767112e-02,
       4.70776922e-03, 9.17498607e-01, 4.08160984e-03, 2.89283876e-02,
       6.21468999e-01, 1.00000000e+00, 6.87419130e-04])
probas_2 = probas_2.round()
probas_2
array([0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0., 0., 0.,
       0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1.,
       0., 0., 0., 0., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 0., 0., 1.,
       0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 1.,
       0., 1., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0.,
       0., 0., 0., 1., 0., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 1., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 1., 1., 1., 1.,
       1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0.,
       0., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 1., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 1., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0.,
       1., 0., 0., 0., 1., 0., 0., 1., 1., 0.])
y_test_2
array([0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1,
       1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1,
       0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0,
       1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0,
       1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0,
       1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,
       0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0])

Computing the Final Results#

cm_2 = confusion_matrix(y_test_2, probas_2)
precision_2 = round(precision_score(y_test_2, probas_2, average="macro")*100, 2)
acc_nb_2 = round(accuracy_score(y_test_2, probas_2)*100, 2)
recall_2 = round(recall_score(y_test_2, probas_2, average="macro")*100, 2)
f1score_2 = round(f1_score(y_test_2, probas_2, average="macro")*100, 2)
print('Confusion Matrix\n', cm_2)
print('precision: {}'.format(precision_2))
print('recall: {}'.format(recall_2))
print('fscore: {}'.format(f1score_2))
print('accuracy: {}'.format(acc_nb_2))
Confusion Matrix
 [[102  13]
 [ 69  47]]
precision: 68.99
recall: 64.61
fscore: 62.37
accuracy: 64.5

K-NN#

K-NN USING THE UNNORMALIZED DATA#

How the K-Nearest Neighbor Algorithm Works#

Choosing the Value of K#

The value of k in the KNN algorithm defines how many neighbors are examined to determine the classification of a given query point. For example, if k=1 the instance is assigned the same class as its single nearest neighbor.

Computing the Distances#

Compute the distance between the query point and the training points (any distance metric may be used, e.g. Euclidean distance)

Formula:

\(d(x,y) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}\)
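
Translated into code, the formula looks like this (a standalone helper written for illustration, not taken from the assignment):

```python
import math

# Euclidean distance between two 2-D points, matching the formula above.
def euclidean(p1, p2):
    return math.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)

print(euclidean((1, 2), (4, 6)))  # 5.0 (a 3-4-5 right triangle)
```
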

Sorting the Distances#

Once all distances between the training data and the test point have been computed, sort them in ascending order

Taking the K Nearest Neighbors#

Take the K nearest neighbors according to the computed distances.

Determining the Majority#

Among the K neighbors taken, determine the majority class; the test point is then assigned to that majority class.
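
The steps above can be sketched as a tiny KNN classifier; the toy data below is invented for illustration and is not the Messidor features:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: compute the distance from the query to every training point
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Step 3: sort the distances in ascending order
    dists.sort(key=lambda t: t[0])
    # Step 4: take the k nearest neighbors
    nearest = [y for _, y in dists[:k]]
    # Step 5: the majority class among them wins
    return Counter(nearest).most_common(1)[0][0]

# Toy data: two clusters, around (0,0) for class 0 and (5,5) for class 1.
train_X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (4.5, 5.2), k=3))  # 1
```
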

Splitting the Data#

In this case, the data is split into training and test sets (80:20)

Formula:

Number of test records: \(\frac{\text{test percentage}}{100} \times \text{number of samples}\)
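
Applying the formula to this dataset: scikit-learn rounds the fractional test share up, which is why 20% of 1151 rows gives the 231 test records seen below:

```python
import math

# Test-set size = (test percentage / 100) * number of samples;
# scikit-learn takes the ceiling of the fractional result.
n_samples = 1151
test_fraction = 0.2
print(math.ceil(n_samples * test_fraction))  # 231
```
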

X_train_knn,X_test_knn,y_train_knn,y_test_knn = train_test_split(X,y,test_size=0.2,random_state=4)
#shape of train and test objects
print(X_train_knn.shape)
print(X_test_knn.shape)
(920, 19)
(231, 19)

Finding the Accuracy for K = 1 to 25#

Searching for the highest accuracy using the training and test data split above. Before the accuracy can be computed, this step also computes the distances between points, using the Euclidean formula:

\(d(x,y) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}\)

# import the KNeighborsClassifier class from sklearn
from sklearn.neighbors import KNeighborsClassifier

# import the metrics module to check the accuracy
from sklearn import metrics

# try running from k=1 through 25 and record the testing accuracy
k_range = range(1, 26)
scores = {}
scores_list = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train_knn, y_train_knn)
    y_pred_knn = knn.predict(X_test_knn)
    scores[k] = metrics.accuracy_score(y_test_knn, y_pred_knn)
    scores_list.append(scores[k])
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:1688: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['tuple']. An error will be raised in 1.2.
  warnings.warn(
scores_list
[0.6536796536796536,
 0.6233766233766234,
 0.6666666666666666,
 0.6406926406926406,
 0.6277056277056277,
 0.6536796536796536,
 0.6493506493506493,
 0.6493506493506493,
 0.6493506493506493,
 0.658008658008658,
 0.670995670995671,
 0.6623376623376623,
 0.6536796536796536,
 0.658008658008658,
 0.6536796536796536,
 0.670995670995671,
 0.658008658008658,
 0.6493506493506493,
 0.6493506493506493,
 0.6363636363636364,
 0.6406926406926406,
 0.658008658008658,
 0.6493506493506493,
 0.6536796536796536,
 0.645021645021645]
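Dari daftar akurasi di atas, nilai K terbaik dapat dicari langsung dari list hasil. Sketsa mandiri berikut menyalin keluaran di atas ke dalam list, lalu mengambil posisi akurasi tertinggi (ingat K dimulai dari 1, bukan 0):

```python
# Akurasi pengujian untuk K = 1..25 (disalin dari keluaran di atas)
scores_list = [0.6536796536796536, 0.6233766233766234, 0.6666666666666666,
               0.6406926406926406, 0.6277056277056277, 0.6536796536796536,
               0.6493506493506493, 0.6493506493506493, 0.6493506493506493,
               0.658008658008658, 0.670995670995671, 0.6623376623376623,
               0.6536796536796536, 0.658008658008658, 0.6536796536796536,
               0.670995670995671, 0.658008658008658, 0.6493506493506493,
               0.6493506493506493, 0.6363636363636364, 0.6406926406926406,
               0.658008658008658, 0.6493506493506493, 0.6536796536796536,
               0.645021645021645]

# index() mengembalikan posisi pertama dari nilai maksimum; +1 karena K mulai dari 1
best_k = scores_list.index(max(scores_list)) + 1
print(best_k, round(max(scores_list) * 100, 2))  # 11 67.1
```

Akurasi tertinggi (sekitar 67,1%) dicapai pertama kali pada K = 11 (nilai yang sama muncul lagi pada K = 16 dan K = 20).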

Visualisasi Hasil Akurasi#

Menggunakan matplotlib

%matplotlib inline
import matplotlib.pyplot as plt

#plot the relationship between K and the testing accuracy
plt.plot(k_range,scores_list)
plt.xlabel('Value of K for KNN')
plt.ylabel('Testing Accuracy')
Text(0, 0.5, 'Testing Accuracy')
_images/notebooks7_71_1.png

Membangun Model dengan n_neighbors = 5#

knn = KNeighborsClassifier(n_neighbors = 5)

Fitting the kNN Model

knn.fit(X_train_knn,y_train_knn)
KNeighborsClassifier()

Predict the test set Result

yy_pred = knn.predict(X_test_knn)
yy_pred
array([0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0,
       1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1,
       1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1,
       1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0,
       1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1,
       1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1])

The final result

cm_knn = confusion_matrix(y_test_knn,yy_pred)
precision_knn = round(precision_score(y_test_knn,yy_pred, average="macro")*100,2)
acc_nb_knn = round(accuracy_score(y_test_knn,yy_pred)*100,2)
recall_knn = round(recall_score(y_test_knn,yy_pred, average="macro")*100,2)
f1score_knn = round(f1_score(y_test_knn, yy_pred, average="macro")*100,2)
print('Konfusi Matrix\n',cm_knn)
print('precision: {}'.format(precision_knn))
print('recall: {}'.format(recall_knn))
print('fscore: {}'.format(f1score_knn))
print('accuracy: {}'.format(acc_nb_knn))
Konfusi Matrix
 [[78 31]
 [55 67]]
precision: 63.51
recall: 63.24
fscore: 62.69
accuracy: 62.77
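Angka precision, recall, dan accuracy di atas dapat diverifikasi langsung dari confusion matrix (baris = label sebenarnya, kolom = prediksi). Sketsa mandiri berikut menyalin nilai matriks dari keluaran di atas:

```python
import numpy as np

# Confusion matrix K-NN (K = 5) dari keluaran di atas
cm = np.array([[78, 31],
               [55, 67]])

# precision per kelas = diagonal / jumlah kolom; recall = diagonal / jumlah baris
prec = cm.diagonal() / cm.sum(axis=0)
rec  = cm.diagonal() / cm.sum(axis=1)

precision_macro = round(prec.mean() * 100, 2)            # 63.51
recall_macro    = round(rec.mean() * 100, 2)             # 63.24
accuracy        = round(cm.trace() / cm.sum() * 100, 2)  # 62.77
print(precision_macro, recall_macro, accuracy)
```

Hasil perhitungan manual ini cocok dengan keluaran `precision_score`, `recall_score`, dan `accuracy_score` di atas.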

Proses pada K-NN Menggunakan Data yang Sudah Dinormalisasi#

Split Data#

X_train_knn2,X_test_knn2,y_train_knn2,y_test_knn2 = train_test_split(scaled,y,test_size=0.2,random_state=4)
#shape of train and test objects
print(X_train_knn2.shape)
print(X_test_knn2.shape)
(920, 19)
(231, 19)
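Variabel `scaled` pada split di atas diasumsikan berasal dari normalisasi Min-Max di bagian sebelumnya (`MinMaxScaler` sudah diimpor di awal). Sketsa kecil (dengan data contoh hipotetis) memperlihatkan cara kerjanya:

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Data contoh (hipotetis) dengan skala kolom yang berbeda
X = np.array([[22.0, 49.895756],
              [62.0, 55.831441],
              [44.0, 18.026254]])

scaler = MinMaxScaler()                   # memetakan tiap kolom ke rentang [0, 1]
scaled_contoh = scaler.fit_transform(X)

print(scaled_contoh.min(), scaled_contoh.max())  # 0.0 1.0
```

Normalisasi penting untuk K-NN karena jarak Euclidean sensitif terhadap perbedaan skala antarfitur.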

Mencari Akurasi dengan K = 1 sampai 25#

Mencari akurasi terbaik menggunakan data latih dan data uji yang sudah di-split di atas. Sebelum menghitung akurasi, proses ini terlebih dahulu menghitung jarak antar data untuk menentukan tetangga terdekat.

#import the KNeighborsClassifier class from sklearn
from sklearn.neighbors import KNeighborsClassifier

#import metrics model to check the accuracy 
from sklearn import metrics
#Try running from k=1 through 25 and record testing accuracy
k_range = range(1,26)
scores = {}
scores_list = []
for k in k_range:
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X_train_knn2,y_train_knn2)
        y_pred_knn2=knn.predict(X_test_knn2)
        scores[k] = metrics.accuracy_score(y_test_knn2,y_pred_knn2)
        scores_list.append(metrics.accuracy_score(y_test_knn2,y_pred_knn2))
scores_list
[0.6017316017316018,
 0.5800865800865801,
 0.5930735930735931,
 0.6103896103896104,
 0.6277056277056277,
 0.6623376623376623,
 0.6666666666666666,
 0.6753246753246753,
 0.6277056277056277,
 0.6190476190476191,
 0.6233766233766234,
 0.6320346320346321,
 0.658008658008658,
 0.6277056277056277,
 0.658008658008658,
 0.6493506493506493,
 0.6666666666666666,
 0.670995670995671,
 0.683982683982684,
 0.6753246753246753,
 0.6883116883116883,
 0.6883116883116883,
 0.6623376623376623,
 0.6753246753246753,
 0.670995670995671]
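Seperti pada data tanpa normalisasi, nilai K terbaik dapat dicari dari list hasil. Sketsa mandiri berikut menyalin keluaran di atas:

```python
# Akurasi pengujian K = 1..25 pada data ternormalisasi (disalin dari keluaran di atas)
scores_list2 = [0.6017316017316018, 0.5800865800865801, 0.5930735930735931,
                0.6103896103896104, 0.6277056277056277, 0.6623376623376623,
                0.6666666666666666, 0.6753246753246753, 0.6277056277056277,
                0.6190476190476191, 0.6233766233766234, 0.6320346320346321,
                0.658008658008658, 0.6277056277056277, 0.658008658008658,
                0.6493506493506493, 0.6666666666666666, 0.670995670995671,
                0.683982683982684, 0.6753246753246753, 0.6883116883116883,
                0.6883116883116883, 0.6623376623376623, 0.6753246753246753,
                0.670995670995671]

# Posisi pertama akurasi maksimum; +1 karena K mulai dari 1
best_k2 = scores_list2.index(max(scores_list2)) + 1
print(best_k2, round(max(scores_list2) * 100, 2))  # 21 68.83
```

Pada data ternormalisasi, akurasi tertinggi (sekitar 68,83%) dicapai pada K = 21, lebih tinggi daripada akurasi terbaik data tanpa normalisasi (sekitar 67,1%).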

Visualisasi Hasil Akurasi#

Menggunakan matplotlib

%matplotlib inline
import matplotlib.pyplot as plt

#plot the relationship between K and the testing accuracy
plt.plot(k_range,scores_list)
plt.xlabel('Value of K for KNN')
plt.ylabel('Testing Accuracy')
Text(0, 0.5, 'Testing Accuracy')
_images/notebooks7_89_1.png

Membangun Model dengan n_neighbors = 5#

knn = KNeighborsClassifier(n_neighbors = 5)

Fitting the kNN Model

knn.fit(X_train_knn2,y_train_knn2)
KNeighborsClassifier()

Predict the test set Result

yy_pred2 = knn.predict(X_test_knn2)
yy_pred2
array([0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
       1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1,
       1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0,
       1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0,
       1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1,
       0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1])

The final result

cm_knn2 = confusion_matrix(y_test_knn2,yy_pred2)
precision_knn2 = round(precision_score(y_test_knn2,yy_pred2, average="macro")*100,2)
acc_nb_knn2 = round(accuracy_score(y_test_knn2,yy_pred2)*100,2)
recall_knn2 = round(recall_score(y_test_knn2,yy_pred2, average="macro")*100,2)
f1score_knn2 = round(f1_score(y_test_knn2, yy_pred2, average="macro")*100,2)
print('Konfusi Matrix\n',cm_knn2)
print('precision: {}'.format(precision_knn2))
print('recall: {}'.format(recall_knn2))
print('fscore: {}'.format(f1score_knn2))
print('accuracy: {}'.format(acc_nb_knn2))
Konfusi Matrix
 [[74 35]
 [51 71]]
precision: 63.09
recall: 63.04
fscore: 62.76
accuracy: 62.77
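Sebagai penutup, kedua confusion matrix K-NN (K = 5) dapat dibandingkan langsung. Sketsa mandiri berikut menyalin nilai matriks dari keluaran masing-masing:

```python
import numpy as np

cm_knn  = np.array([[78, 31], [55, 67]])   # K-NN tanpa normalisasi (K = 5)
cm_knn2 = np.array([[74, 35], [51, 71]])   # K-NN dengan normalisasi (K = 5)

def akurasi(cm):
    # akurasi = jumlah diagonal (prediksi benar) / total sampel
    return round(cm.trace() / cm.sum() * 100, 2)

print(akurasi(cm_knn), akurasi(cm_knn2))  # 62.77 62.77
```

Pada K = 5 akurasi keduanya kebetulan sama (62,77%), meskipun sebaran kesalahannya berbeda; keunggulan normalisasi baru terlihat pada K terbaiknya (68,83% pada K = 21 vs 67,1% pada K = 11).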