• Multivariate Datasets: Data Cleaning and Preparation, and Model Development with Python and Machine Learning

    Data cleaning, data preparation, and model development are crucial steps in data analytics. The first two improve data quality, which in turn supports higher accuracy, productivity, and efficiency in modelling and obtaining results. The last step, model development, seeks to improve prediction accuracy, especially in predictive modelling. In this technical note, we use a sample dataset to illustrate how to work with a multivariate dataset in Python. The dataset's large number of variables calls for additional techniques in data cleaning, preparation, and model development, such as data normalization and dimension reduction.
  • A Technical Note on Data Preparation and Model Building with a Real Estate Dataset

    Data preparation is a necessary pre-processing step in analytics. It aims to clean data drawn from various sources and improve its quality for better productivity. The process includes tasks such as data fusion, cleaning, and augmentation. This teaching note focuses on illustrating data cleaning in Python, with all code written in Google Colaboratory. Alternative solutions in R and Microsoft Excel are also provided. To illustrate the data preparation process effectively, the Bengaluru House Prices dataset is used. It is relatively simple in structure, with few variables and many records, yet relatively messy, making it ideal for explaining data preparation steps.