In early 2023, Maggioncalda, CEO of US EdTech firm Coursera, launched Project Genesis to develop a strategy for incorporating GenAI capabilities into the firm's offerings, asking his teams to focus on value to the firm and cost of implementation. The team identified several projects: powering translations, modifying content format and delivery, offering personalized coaching, building an automatic course-building tool, and developing new GenAI-related academic content. By early 2024, the firm had made significant progress in bringing these ideas to market, but there was still much to do. Technology was fast evolving, and Coursera needed to continuously improve its offerings. Maggioncalda wanted better branding to make personalized coaching a distinctive advantage of the firm's platform. For its course builder tool, Coursera and its university partners had to work through difficult intellectual property and branding questions. But more broadly, Maggioncalda remained alert to the risks presented by the advent of GenAI. While the firm had been an early mover, competitors were fast adapting. Was Coursera taking full advantage of the opportunities presented by the technology? What more could it do?
In this article, we will discuss the concept of AI Products, how they are changing our daily lives, how the field of AI & Product Management is evolving, and the AI Product Development Lifecycle.
The case describes Arla's history, in particular its climate change mitigation efforts, and how it implemented a price incentive system to motivate individual farms to implement Scope 1 greenhouse gas emissions mitigation measures and receive a higher milk price. The case, and its data supplement, highlight Arla's use of a data scorecard and regression analysis model to track CO2 emissions across dairy farms in multiple European countries.
Arla implemented a data-based price incentive system to measure, track, and influence climate-friendly changes to reduce CO2 emissions across the world's fourth-largest dairy cooperative.
This case presents an inventory challenge. The goal of the case is to encourage students to redefine the problem statements Fizzy Fusion faces and to use innovative thinking to devise solutions that address the company's core underlying problem. The case encourages students to think from the perspective of the customer to solve data problems.
We hear it all the time as managers: "what is the data that backs up your decisions?" Even local mom-and-pop shops now have access to complex point-of-sale systems that can closely track sales and customer data. Social media influencers have turned into seven-figure solopreneurs from digital advertising analytics. In 2023, globally, we will create three times the data we did in 2019, and by 2025, it is estimated that 181 zettabytes of data will be generated (that is 181 followed by 21 zeros). Data is becoming a critical, arguably inextricable, part of business operations in our modern context. Data-driven decision-making (DDDM) uses data to inform decisions rather than relying on intuition. The digital era has given rise to the importance of data science for business applications. This technical note explores how different design thinking principles can assist the data-driven processes in a project.
This note provides an overview of causal inference for an introductory data science course. First, the note discusses observational studies and confounding variables. Next, the note describes how randomized experiments can be used to account for the effect of confounding variables. Then it walks through the steps in designing an experiment, including a discussion of how to calculate the power of a test.
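The power calculation the note discusses can be sketched briefly. The function name, the normal-approximation approach, and the example proportions and sample sizes below are all illustrative assumptions, not taken from the note itself:

```python
# A minimal sketch of a power calculation for a two-sided z-test
# comparing two proportions, via the normal approximation.
from statistics import NormalDist

def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power to detect a true gap between proportions
    p1 and p2 with n observations per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)           # rejection threshold
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n) ** 0.5
    z_effect = abs(p2 - p1) / se                # standardized effect
    # Ignore the negligible rejection probability in the far tail.
    return z.cdf(z_effect - z_crit)

# A larger sample makes the same true gap easier to detect.
low_n = power_two_proportions(0.10, 0.12, 1000)
high_n = power_two_proportions(0.10, 0.12, 5000)
```

The comparison at the end illustrates the note's core point about experiment design: power is a function of sample size, so the required n is typically chosen to reach a target power (often 80%) for the smallest effect worth detecting.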
This note provides an introduction to machine learning for an introductory data science course. The note begins with a description of supervised, unsupervised, and reinforcement learning. Then, the note provides a brief explanation of the difference between traditional statistical modeling and machine learning. Next, the note covers two models used for classification, logistic regression and decision trees. After introducing these two models, the note explains how training, validation, and holdout sets (and k-fold cross validation) are used to tune and evaluate different models. Finally, the note concludes with a discussion of different performance metrics (ROC curves, confusion matrices, log loss) that are used to evaluate classification models.
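The k-fold cross-validation idea mentioned above can be sketched independently of any particular model. This is a generic illustration in plain Python, with a hypothetical helper name; the note itself pairs the idea with specific classifiers:

```python
# A minimal sketch of k-fold cross-validation splitting: every
# observation serves as validation data exactly once.

def k_fold_splits(n, k):
    """Split indices 0..n-1 into k (train, validation) pairs."""
    indices = list(range(n))
    # Spread any remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, val))
        start += size
    return splits

splits = k_fold_splits(10, 3)
```

In practice each candidate model is fit on every `train` slice and scored on the matching `val` slice, and the scores are averaged to compare models before a final check on the untouched holdout set.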
This note provides an overview of linear regression for an introductory data science course. It begins with a discussion of correlation, and explains why correlation does not necessarily imply causation. The note then describes the method of least squares, and how to interpret the r-squared and model coefficient values of a simple linear regression model. Next, the note describes how the interpretation of a model coefficient changes when there are multiple independent variables in the model. Finally, the note explains how to interpret the coefficients on dummy variables in a regression model. The appendix includes R code for implementing all of these topics.
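The least-squares mechanics summarized above fit in a few lines. The note's appendix uses R; this Python sketch is only an illustrative equivalent, and `least_squares` is a hypothetical helper name:

```python
# A small sketch of simple least-squares regression, showing the
# closed-form slope/intercept and the r-squared calculation.

def least_squares(x, y):
    """Fit y = a + b*x by least squares; return (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx                     # slope
    a = my - b * mx                   # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot   # share of variance explained
    return a, b, r_squared
```

On perfectly linear data such as `least_squares([1, 2, 3, 4], [3, 5, 7, 9])`, the fit recovers intercept 1, slope 2, and an r-squared of 1, which makes the interpretation the note develops concrete.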
This note provides an overview of statistical inference for an introductory data science course. First, the note discusses samples and populations. Next, the note describes how to calculate confidence intervals for means and proportions. Then it walks through the logic of hypothesis testing and the interpretation of p-values (in the context of two-sample hypothesis testing for means and proportions). The appendix of the note contains R code for all of these topics.
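The confidence-interval calculation for a proportion can be sketched directly from the standard formula. The note's appendix is in R; this Python version, with invented sample numbers, is only an illustrative stand-in:

```python
# A quick sketch of a normal-approximation confidence interval for a
# proportion: p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n).
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Return (lower, upper) bounds of the confidence interval."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. 1.96 at 95%
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 120 successes out of 400 observations.
lo, hi = proportion_ci(120, 400)
```

The interval is centered on the sample proportion (here 0.30) and narrows as n grows, which is the sample-versus-population logic the note builds on.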
This module note provides an overview of exploratory data analysis for an introductory data science course. It begins by defining the term "data", and then describes the different types of data that companies work with (structured vs. unstructured, categorical vs. numeric, etc.). Next, the note describes the basic summary statistics that firms use to track key business outcomes. Finally, the note provides an overview of different visualizations. An appendix is provided, which includes the R code for creating all of the figures and visualizations shown in the note.
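The kinds of summary statistics described above can be sketched with Python's standard library (the note's own appendix uses R). The revenue figures below are invented purely for illustration:

```python
# A brief sketch of basic summary statistics for a numeric business
# metric, of the kind used to track key outcomes.
from statistics import mean, median, stdev, quantiles

revenue = [12.5, 9.8, 14.1, 11.0, 13.3, 10.7, 15.2, 12.9]  # made up

summary = {
    "mean": mean(revenue),               # average value
    "median": median(revenue),           # middle value, robust to outliers
    "std_dev": stdev(revenue),           # spread around the mean
    "quartiles": quantiles(revenue, n=4),  # Q1, Q2, Q3 cut points
}
```

Comparing the mean and median, and inspecting the quartiles, is often the first step before any of the visualizations the note goes on to cover.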
The case explores the development and early growth of a data science team at the Golden State Warriors, an NBA team based in San Francisco. The case begins by explaining the initial rationale for investing in data science, then covers a debate on the appropriate team structure, navigating the initial hires, and which projects the team should prioritize. The rest of the case describes the first major project the team worked on: a model to predict when customers would purchase tickets. Along the way, the team faced a number of important decisions about what outcome to model and which model to use. Unfortunately, just as the model was completed, the team faced a setback when the NBA season was suspended due to COVID-19, pausing the project for a year. The case then takes place before the start of the 2021-22 season, with the team needing to determine whether they should use the season to run an experiment and evaluate their model on real customers or fully launch the model in the hope that it increases ticket sales. The case comes with supplementary data and code (HBS Supplement no. 624-712) that allows students to perform the analysis described in the case.
Describes a marketing director about to launch a new process for demand forecasting. Provides data that allow students to do a multivariable regression analysis. A rewritten version of an earlier case.
This technical note introduces students to the concept of random variables, and from there the normal and binomial distributions. After a brief introduction to random variables, the note describes the standard properties of the normal distribution: a single peak, and a symmetric, bell-shaped curve. Students observe the 68-95-99.7 rule, and see how the distribution changes with different values of the mean and standard deviation parameters. Finally, the note demonstrates how probability calculations based on the normal distribution can be done in the R programming language, and how random data can be simulated from a normal curve in R. The note then describes the standard properties of the binomial distribution, and similarly shows how binomial calculations can be performed in R.
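The normal and binomial calculations the note demonstrates in R can be sketched equivalently with Python's standard library; this is only an illustrative translation, and `binom_pmf` is a hypothetical helper name:

```python
# A sketch checking the 68-95-99.7 rule with the standard normal CDF,
# plus a binomial probability calculation.
from math import comb
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(within k standard deviations of the mean), for k = 1, 2, 3.
within = [z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)]

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with
    success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)
```

Here `within` comes out to roughly 0.683, 0.954, and 0.997, matching the 68-95-99.7 rule, and `binom_pmf(2, 4, 0.5)` gives 0.375, the chance of exactly two heads in four fair coin flips.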