Latest Cases
- Leadership Imperatives in an AI World
- Vodafone Idea Merger - Unpacking IS Integration Strategies
- Snapchat’s Dilemma: Growth or Financial Sustainability
- V21 Landmarks Pvt. Ltd: Scaling Newer Heights in Real Estate Entrepreneurship
- Predicting the Future Impacts of AI: McLuhan’s Tetrad Framework
- Did I Just Cross the Line and Harass a Colleague?
- TNT Assignment: Financial Ratio Code Cracker
- Porsche Drive (A): Vehicle Subscription Strategy
- Porsche Drive (A) and (B): Student Spreadsheet
- Porsche Drive (B): Vehicle Subscription Strategy
Multivariate Datasets: Data Cleaning and Preparation, and Model Development with Python and Machine Learning
Data cleaning, data preparation, and model development are crucial steps in data analytics. The first two improve data quality, which in turn supports higher accuracy, productivity, and efficiency in modelling and obtaining results. The last step, model development, seeks to improve prediction accuracy, especially in predictive modelling. This technical note uses a sample dataset to illustrate how to work with a multivariate dataset in Python. The dataset's large number of variables calls for additional approaches to data cleaning, preparation, and model development, such as data normalization and dimension reduction.
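As a rough illustration of the two techniques named above, the following sketch standardizes each variable and then reduces dimensionality with principal component analysis computed via SVD. It is not taken from the technical note: the synthetic data, the helper names `standardize` and `pca_reduce`, and the choice of PCA as the dimension-reduction method are all assumptions made for this example.

```python
import numpy as np


def standardize(X):
    """Z-score each column so it has zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of dividing by zero
    return (X - mean) / std


def pca_reduce(X, n_components):
    """Project standardized data onto its top principal components (hypothetical helper)."""
    Z = standardize(X)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    return Z @ Vt[:n_components].T, explained[:n_components]


# Synthetic stand-in for a wide dataset: 10 variables driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(200, 10))

reduced, explained = pca_reduce(X, n_components=2)
print(reduced.shape)     # two columns remain per row
print(explained.sum())   # share of total variance kept by the two components
```

Standardizing first keeps variables measured on large scales from dominating the components. In practice a library implementation such as scikit-learn's `StandardScaler` and `PCA` would likely be used instead of hand-written helpers.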
A Technical Note on Data Preparation and Model Building with a Real Estate Dataset
Data preparation is a necessary pre-processing step in analytics. It aims to clean data drawn from various sources and improve its quality for better productivity. The process includes many tasks, such as data fusion, cleaning, and augmentation. This note focuses on illustrating data cleaning in Python, with all code completed in Google Colaboratory. Alternative solutions in R and Microsoft Excel are also provided. To illustrate the data preparation process effectively, the note uses the Bengaluru House Prices dataset, which has few variables but many records and is relatively messy, making it ideal for explaining data preparation steps.
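A minimal sketch of what such cleaning steps might look like in pandas follows. The rows are hypothetical and only imitate the kinds of mess a housing dataset can contain. The column names (`location`, `size`, `total_sqft`, `price`), the helper names, and the specific cleaning rules are assumptions for illustration, not the note's actual procedure.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, not real records: stray whitespace, a missing location,
# an area given as a range, an area with a unit suffix, and an exact duplicate
raw = pd.DataFrame({
    "location": [" Whitefield", "Whitefield ", None, "Hebbal", "Hebbal"],
    "size": ["2 BHK", "3 Bedroom", "2 BHK", "4 BHK", "4 BHK"],
    "total_sqft": ["1056", "1200 - 1400", "34.46Sq. Meter", "1500", "1500"],
    "price": [39.07, 62.0, 18.5, 95.0, 95.0],
})


def parse_sqft(value):
    """Convert an area string to a float; average 'low - high' ranges; NaN if unparseable."""
    parts = str(value).split(" - ")
    try:
        nums = [float(p) for p in parts]
    except ValueError:
        return np.nan
    return sum(nums) / len(nums)


def clean(df):
    """Apply a few illustrative cleaning rules (assumed, not the note's own)."""
    out = df.copy()
    out["location"] = out["location"].str.strip()
    # Pull the leading number out of strings such as '2 BHK' or '3 Bedroom'
    out["bedrooms"] = out["size"].str.extract(r"(\d+)", expand=False).astype(float)
    out["total_sqft"] = out["total_sqft"].map(parse_sqft)
    out = out.dropna(subset=["location", "bedrooms", "total_sqft"])
    out = out.drop_duplicates()
    return out.drop(columns=["size"]).reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Dropping rows with unparseable values is only one option. Depending on the analysis, converting units or imputing missing values may be preferable to discarding records.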