Noah Joss had received a full-time job offer from Genesis Consulting, following his summer internship there. While a full-time offer had been his goal, Joss started to think critically about the compensation package and the working hours expected of him. Worried about making the wrong decision, Joss worked with his study group to design a compensation survey of people working in consulting that could serve as a reference point when deciding whether to accept the offers they had secured. After receiving a data set of survey responses from a consulting-focused media organization, Joss needed to analyze the data to inform his decision on whether to accept the offer, and to better understand compensation packages in the consulting sector.
Data cleaning, data preparation, and model development are crucial steps in data analytics. The first two steps aim to improve data quality, which leads to higher accuracy, greater productivity, and more efficient modelling. The last step, model development, seeks to improve the accuracy of predictions, especially in predictive modelling. In this technical note, we use a sample dataset to illustrate how to work with multivariate data in Python. Because this dataset has a large number of variables, it calls for specific approaches to data cleaning, preparation, and model development, such as data normalization and dimension reduction.
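A minimal sketch of the normalization step in plain Python follows. It rests on a few assumptions. Each variable is a list of numeric survey responses stored under a hypothetical column name such as `base_salary`. Each variable is converted to a z-score (mean 0, standard deviation 1), which puts variables measured on different scales, such as salaries and weekly hours, on a comparable footing. As a very simple form of dimension reduction, the sketch also drops constant columns, because they carry no information. This variance filter stands in for, and is not an implementation of, techniques such as principal component analysis.

```python
from statistics import fmean, pstdev


def standardize(columns):
    """Return z-scored copies of the numeric columns in `columns`.

    `columns` maps a (hypothetical) variable name to a list of numbers.
    Columns whose population standard deviation is zero are dropped,
    which is a crude variance-based dimension reduction, not PCA.
    """
    result = {}
    for name, values in columns.items():
        mu = fmean(values)
        sigma = pstdev(values)
        if sigma == 0:
            continue  # constant column: no information, drop it
        result[name] = [(v - mu) / sigma for v in values]
    return result
```

For example, `standardize({"base_salary": [50.0, 60.0, 70.0], "region_code": [1.0, 1.0, 1.0]})` keeps only `base_salary`, whose standardized values are approximately -1.22, 0, and 1.22.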
Data preparation is a necessary pre-processing step in analytics. It aims to clean data drawn from various sources and to improve its quality for better productivity. The process includes many tasks, such as data fusion, cleaning, and augmentation. This teaching note focuses on illustrating data cleaning in the Python programming language, with all code completed in Google Colaboratory. Alternative solutions in R and Microsoft Excel are also provided. To illustrate the data preparation process effectively, the relatively simple Bengaluru House Prices dataset is used. It is a messy dataset with few variables and many records, which makes it ideal for explaining data preparation steps.
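The sketch below is a minimal illustration of the kind of cleaning involved. It assumes the raw records are read as dictionaries of strings with fields named `location`, `size` (for example "2 BHK"), `total_sqft` (sometimes a range such as "2100 - 2850"), and `price`. Treat these field names and formats as assumptions to check against the actual file. The sketch uses only the standard library rather than pandas. It converts the text fields to numbers, drops records that cannot be parsed, and removes exact duplicates.

```python
def parse_sqft(raw):
    """Convert a square-footage string to a float, or None if unusable.

    Assumed formats: a plain number ("1200") or a range ("2100 - 2850"),
    which is replaced by its midpoint. Anything else, such as a value with
    a unit suffix, is treated as missing.
    """
    if raw is None:
        return None
    parts = [part.strip() for part in raw.split("-")]
    try:
        numbers = [float(part) for part in parts]
    except ValueError:
        return None
    if len(numbers) == 1:
        return numbers[0]
    if len(numbers) == 2:
        return (numbers[0] + numbers[1]) / 2
    return None


def parse_bedrooms(raw):
    """Read the leading integer of a size label such as '2 BHK' (assumed format)."""
    if not raw:
        return None
    first = raw.split()[0]
    return int(first) if first.isdigit() else None


def clean_records(rows):
    """Return parsed, de-duplicated records; unparseable rows are dropped."""
    cleaned, seen = [], set()
    for row in rows:
        sqft = parse_sqft(row.get("total_sqft"))
        bedrooms = parse_bedrooms(row.get("size"))
        try:
            price = float(row.get("price"))
        except (TypeError, ValueError):
            price = None
        if sqft is None or bedrooms is None or price is None:
            continue  # missing or malformed field
        location = (row.get("location") or "").strip()
        key = (location, sqft, bedrooms, price)
        if key in seen:
            continue  # exact duplicate of a record already kept
        seen.add(key)
        cleaned.append({"location": location, "total_sqft": sqft,
                        "bedrooms": bedrooms, "price": price})
    return cleaned
```

Dropping unparseable values is only one design choice. Converting other units to square feet would keep more records, at the cost of additional parsing rules.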
In 2014, the owner of a food truck based in Hamilton, Ontario, was looking over the first year of her operations. In addition to working in Hamilton, she had tried to maximize her revenues by driving to several other cities and charging various prices for each burger, depending partly on the fresh ingredients available in each city. Besides location, the owner had collected data on a few other factors—the weather, the day of the week, the city’s population, and whether a festival was going on—that had had an impact on the demand for her product. She wondered whether analytics could help her decide where to sell and how much to charge on a daily basis. The owner also wondered whether this decision-making and data-collection process could be automated since she would be using it every day.
In July 2014, the managing director of Lakshmi Projects in Delhi, India, found himself struggling with the marketing and sales strategy for the year ahead. Founded in 1997, the company specialized in turnkey solutions for bulk material handling systems for industries in the fast-growing infrastructure segment of the Indian economy; its two main product categories were elevator and conveyor systems. Yet the company was failing to meet its sales targets, largely because of an overextended and underachieving salesforce. What was the right structure for the sales, after-sales, and quality teams in the organization? An additional concern was that a sales strategy for the company’s new product, set to launch in October 2014, had not yet been decided. Fluctuating industry dynamics, financial strain, and field sales and service requirements made this a complex decision with larger consequences for the company’s future.
Hyrule Cinemas is losing money quickly, and its owner must take steps to rectify the problem. Using survey data and general information about the business, three types of analysis can be completed: Van Westendorp price sensitivity analysis, conjoint analysis, and decision tree analysis. These analyses will help Hyrule Cinemas make the best possible decisions about price points and location, and thereby become profitable. A student spreadsheet is available (see 7B14E012).
Mistral Energy is looking to build a $40 million power plant near both the Alberta and Saskatchewan power markets. The Alberta market is deregulated, and its price fluctuates hourly with supply and demand. The Saskatchewan market, on the other hand, is a regulated monopoly. Mistral Energy needs to determine which market it should sell its power into. Because the prices available in Saskatchewan are unknown, Mistral is particularly interested in the power price that would make the company indifferent between the two markets. Additionally, because the power plant is roughly equidistant from the Alberta and Saskatchewan transmission lines, it might be possible to choose between markets on an hourly basis. Mistral is interested in investigating the value of this inter-market connection. Unfortunately, for technical reasons, the switch is not instantaneous: the plant must be shut down for 30 minutes before supplying power to the other market. Another challenge is predicting when the power price in Alberta will exceed the contract price available in Saskatchewan. Because the future Alberta price is unknown and highly variable, there is a risk that high prices might not be sustained long enough for Mistral to realize any value.
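Both of Mistral's questions can be framed with simple arithmetic. The sketch below is illustrative only and rests on several assumptions: output is constant, operating costs are ignored, and the 30-minute shutdown is modelled as losing half of the output in any hour in which the plant changes markets. Under these assumptions, the indifference contract price is simply the average Alberta price. A two-state dynamic program then gives the best revenue achievable with hourly switching in hindsight. That figure is an upper bound on the connection's value, because real switching decisions must be made without knowing future prices. The function names and sample prices are hypothetical.

```python
from statistics import fmean


def indifference_price(alberta_prices):
    """Contract price at which selling constant output to Saskatchewan earns
    the same revenue as selling every hour into Alberta (costs ignored)."""
    return fmean(alberta_prices)


def best_switching_revenue(alberta_prices, contract_price, mw=1.0):
    """Hindsight-optimal revenue when the plant may change markets each hour.

    Assumption: switching costs half an hour of output in the hour of the
    switch (the 30-minute shutdown). In the first hour the plant may start
    in either market without a switch.
    """
    in_ab = alberta_prices[0] * mw   # best revenue so far, ending in Alberta
    in_sk = contract_price * mw      # best revenue so far, ending in Saskatchewan
    for price in alberta_prices[1:]:
        ab_full, ab_half = price * mw, 0.5 * price * mw
        sk_full, sk_half = contract_price * mw, 0.5 * contract_price * mw
        in_ab, in_sk = (max(in_ab + ab_full, in_sk + ab_half),
                        max(in_sk + sk_full, in_ab + sk_half))
    return max(in_ab, in_sk)


def switching_value(alberta_prices, contract_price, mw=1.0):
    """Extra hindsight revenue from switching over the better single market."""
    single_market = max(sum(alberta_prices),
                        contract_price * len(alberta_prices)) * mw
    return best_switching_revenue(alberta_prices, contract_price, mw) - single_market
```

For example, with hypothetical hourly Alberta prices of [50, 50, 120, 130, 50] $/MWh, `indifference_price` returns 80.0. At a contract price of 80, `switching_value` returns 5.0. The optimal path stays in Saskatchewan for the first hour and then switches to Alberta, and the half-hour loss consumes most of the gain from the price spike.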
Every investment involves both costs and benefits. This case, set in 2013, uses quantitative and qualitative factors to explore a decision that a typical university student might face: whether to rent a house, or to purchase one and rent out the additional rooms. The decision is a difficult one, and all of the pros and cons of each option need to be carefully considered. An Excel spreadsheet is available for students (see 7B14E013).