COMPARISON OF PROPERTY PRICE PREDICTION RESULTS IN THE BROOKLYN AREA USING XGBOOST, RANDOM FOREST, AND LINEAR REGRESSION

Adyatma Ruliff Brahmantyo

Abstract

Under the law of supply and demand, steadily rising demand for housing in densely populated areas such as Brooklyn, New York, pushes property prices up. This study aims to identify the best model for predicting property prices, judged on efficiency and accuracy, among three machine learning algorithms: Linear Regression, Random Forest, and XGBoost. The dataset consists of 20,894 property sale records with attributes such as sale price, building area, and number of units. Experiments used a 70:30 split between training and test data with a randomly chosen random_state. Algorithm performance was assessed with R², MAE, RMSE, and computation time. The results show that XGBoost performed best, with the highest R² (0.014) and the lowest MAE and RMSE. Linear Regression, by contrast, performed worst, with a negative R² and high error. The study concludes that XGBoost models the property data better and gives more accurate predictions than the other methods. These results can help stakeholders such as property developers and buyers understand property price trends and make better data-driven decisions.
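The evaluation protocol summarized in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. This is not the author's code: the helper names (`split_70_30`, `mae`, `rmse`, `r2`, `timed`) and the toy prices are hypothetical and chosen purely for illustration. The sketch shows a seeded 70:30 split, the three error metrics, a simple wall-clock timer, and why R² becomes negative when a model predicts worse than the mean of the test targets.

```python
import math
import random
import time


def split_70_30(rows, seed):
    """Shuffle a copy of rows with a fixed seed, then split it 70:30."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def timed(fn, *args):
    """Return fn(*args) together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Row indices stand in for the 20,894 sale records; the seed is arbitrary.
(train_rows, test_rows), split_seconds = timed(split_70_30, range(20894), 42)
print(len(train_rows), len(test_rows), f"{split_seconds:.4f}s")

# Toy sale prices (invented) and two hypothetical sets of predictions.
y_true = [100.0, 200.0, 300.0, 400.0]
mean_pred = [250.0] * 4                    # always predict the mean
bad_pred = [400.0, 300.0, 200.0, 100.0]    # worse than predicting the mean

print(r2(y_true, mean_pred))   # 0.0
print(r2(y_true, bad_pred))    # -3.0
print(mae(y_true, bad_pred))   # 200.0
print(rmse(y_true, bad_pred))
```

An R² of 0 corresponds to a model no better than a constant mean prediction, so a value such as 0.014 indicates only a small improvement over that baseline, while a negative value indicates a fit worse than it.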
