COMPARISON OF PROPERTY PRICE PREDICTION RESULTS IN THE BROOKLYN AREA USING XGBOOST, RANDOM FOREST, AND LINEAR REGRESSION

Adyatma Ruliff Brahmantyo

Published: Oct 20, 2023

Keywords:

Prediction, Machine Learning, XGBoost, Linear Regression, Random Forest

Abstract

In densely populated places such as Brooklyn, New York, growing demand for housing is pushing real estate prices up, in line with the law of supply and demand. This study forecasts property prices and compares three machine learning algorithms (Random Forest, XGBoost, and Linear Regression) to determine which performs best in terms of accuracy and efficiency. The dataset consists of 20,894 property sales records with attributes such as building area, number of units, sale price, and more. For the experiment, the data was split into training and testing sets (70:30) using randomized seeds. Performance was assessed with R², MAE, RMSE, and computation time. XGBoost outperformed the other methods, with the lowest MAE and RMSE and the highest R² value (0.014). Linear Regression performed worst, with negative R² values and large errors. The study shows that XGBoost models the property data more accurately than the other approaches. The results can help stakeholders, including buyers and developers, understand patterns in real estate prices and make data-driven decisions.
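
The evaluation protocol described in the abstract (a seeded 70:30 split, scored with R², MAE, and RMSE) can be sketched as follows. This is a minimal illustration and not the author's code: the function names, the seed value, and the sample prices are assumptions made for the example. It also shows how R² can become negative, as reported for Linear Regression: this happens whenever a model's squared error exceeds that of simply predicting the mean sale price.

```python
import math
import random


def train_test_split(rows, test_fraction=0.30, seed=0):
    """Shuffle a copy of rows with a fixed seed, then split it (70:30 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical sale prices in USD (illustrative only, not taken from the study's dataset).
actual = [400_000, 550_000, 700_000, 850_000]
good = [420_000, 540_000, 690_000, 880_000]   # predictions close to the actual prices
poor = [900_000, 300_000, 950_000, 200_000]   # predictions worse than guessing the mean

train, test = train_test_split(range(10), seed=42)  # seed chosen arbitrarily
print("train/test sizes:", len(train), len(test))
print("good model:", mae(actual, good), rmse(actual, good), r2(actual, good))
print("poor model R2:", r2(actual, poor))
```

Repeating the split with several different seeds and averaging the metrics, as the mention of randomized seeds suggests, reduces the chance that a single lucky or unlucky partition decides the ranking of the models.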

Issue

Vol. 18 No. 2 (2023): Jurnal Komputer dan Informatika

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Offers open access
