Data Science Ethics
The Task
Today, I will be exploring an example of ethics and power in the data science context through a specific case study. The ethical dilemma revolves around a story that came to light in 2012, which Duhigg (2012) narrates and summarizes well. I will also include some information from the Fung Institute for Engineering Leadership (2013).
So, what happened?
Duhigg (2012) reported that Target’s analytics team (namely, statistician Andrew Pole) built an algorithm that tried to infer which shoppers were likely in their second trimester of pregnancy. Their goal was to start advertising to expectant parents before the child was even born, since they saw this population as an opening in the market that could generate more profit. Andrew Pole was hired by Target “to identify those unique moments in consumers’ lives [like late-pregnancy] when their shopping habits become particularly flexible and the right advertisement or coupon would cause them to begin spending in new ways” (Duhigg, 2012). In moments of life change, consumers are more susceptible to influence by advertisements! Pole looked at the data in Target’s data tables and found that pregnant people buy more lotion, supplements, soap, and cotton balls. So, Pole and the analytics team developed an algorithm to predict a person’s due date, so that Target could send timely, pregnancy-related coupons exactly when that person was most likely to buy these items. Pole applied this algorithm to every regular customer at Target and generated a list of tens of thousands of customers likely to be pregnant, so that they could be sent pregnancy-specific coupons and advertisements. The ethical dilemma here is the inference of personal health and family information from customer data without explicit consent, and the use of that inferred information for corporate gain. When does an analytics team or algorithm go too far?
What is the permission structure for using the data? Was it followed?
Target’s privacy policy essentially says that you automatically consent to have data collected about you unless you explicitly opt out. This means that as long as you shop or interact with Target, you are considered to have consented to your data being used for advertising and analytics. In this case, Target used transactional and shopping-habits data to build a pregnancy prediction model, which then influenced the ads and coupons a customer received. Target technically did not breach any legal terms as outlined in its privacy policy. But ethically speaking, there was no informed consent on the part of customers, since they had no real way of knowing that their data would be used to infer medical or reproductive information and to target advertising and coupons at them.
How were the variables collected? Were they accurately recorded? Is there any missing data?
Target tracked each shopper using a unique Guest ID, which linked together the various variables recorded about that shopper. From this data, the analytics team identified 25 products that were more likely to be purchased by people in their second trimester of pregnancy. The purchasing data for these 25 specific products were analyzed together to produce a “Pregnancy Prediction Score” for each consumer. When a customer had a high pregnancy prediction score, they were marked to receive advertisements and coupons mailed to their home, because of course Target has their address as well. The transactional data itself was likely accurately recorded, as Target has tables that hold all of its customers’ transactions and what they purchased so that it can do analyses like this one. However, using those 25 specific products to generate a pregnancy prediction score is definitely not perfectly accurate. For example, a person buying unscented lotion is not necessarily pregnant. So, the algorithm used by Target, while reportedly good at predicting pregnancy, will still get some people wrong. It also makes me ask which is weirder: Target inferring that a person is pregnant when they actually are, or Target thinking someone is pregnant when they are not. Why should Target get to speculate about this?
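To make the scoring idea concrete, here is a minimal sketch of what a purchase-based score could look like. This is not Target's actual model, which is not public. The product list, the point values, the cutoff, and the guest IDs are all hypothetical and made up for illustration.

```python
# Hypothetical point values: the real products, weights, and model are not public.
HYPOTHETICAL_POINTS = {
    "unscented lotion": 30,
    "supplements": 25,
    "cotton balls": 15,
    "soap": 10,
}
THRESHOLD = 50  # assumed cutoff for flagging a shopper


def prediction_score(purchases):
    """Add up the points of the signal products found in a shopper's purchases."""
    # set() so that buying the same product twice is only counted once
    return sum(HYPOTHETICAL_POINTS.get(item, 0) for item in set(purchases))


def flagged_guest_ids(purchases_by_guest):
    """Return the guest IDs whose score meets or exceeds the cutoff."""
    return [
        guest_id
        for guest_id, purchases in purchases_by_guest.items()
        if prediction_score(purchases) >= THRESHOLD
    ]


# Made-up purchase histories keyed by a made-up guest ID.
purchases_by_guest = {
    "guest-001": ["unscented lotion", "supplements", "cotton balls"],
    "guest-002": ["unscented lotion", "lawn mower"],
    "guest-003": ["unscented lotion", "supplements", "soap"],
}
flagged = flagged_guest_ids(purchases_by_guest)
print(flagged)
```

Notice that the score only sees the shopping basket. In this toy example, guest-003 gets flagged whether they are pregnant or just have dry skin and take vitamins, which is exactly the accuracy problem described above.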
Publicity of data
In the Target pregnancy-prediction case, the data were not made publicly available. Instead, the company treated them as proprietary, confidential commercial assets to be used specifically for the company’s profit. In this case, it’s likely a good thing that the data were not made public, since the information being inferred, pregnancy, is highly personal. It would have been even worse if this inferred information had been made public.
Consent structure
Customers’ shopping activity served as the data source for Target’s data collection, and their continued use of Target’s services was treated as implicit consent under the company’s general privacy policy. There were no explicit consent forms, no sign-up for sensitive analytics, and no notifications that predictive models were being trained on their purchasing behavior. Customers were unaware that their data was being used for pregnancy prediction, and Target made additional efforts to keep that hidden by making the pregnancy-related ads appear random. For example, advertisements for diapers were placed right next to advertisements for lawn mowers, so that consumers would not realize they were being targeted for pregnancy-related coupons (Duhigg, 2012). The fact that Target was intentionally trying to make the pregnancy-related ads appear random makes me think that they knew something was off with what they were doing.
Why does this matter?
Who benefits? Target. In this case, Target aimed to exploit key turning points in people’s lives, like having a child, as an opportunity for profit. Analytics teams and people like Pole also benefit from the prestige they gain for building “sophisticated” models and algorithms. Who is neglected or harmed? Consumers whose data was used to build the algorithm, and those scored by the algorithm, had their privacy invaded and personal health information inferred without their informed consent. Target even attempted to disguise the pregnancy-related advertisements by mixing them with unrelated promotions, which shows an awareness that their actions might be perceived as intrusive. The Fung Institute warns of customer backlash when people “feel as if they are being observed a little too closely”. The Fung Institute also offers further points specifically for corporations that use data analytics in their business models, namely that “big data leadership requires judgment for ethical considerations and privacy”, something that Target may have overlooked.
The ethical violations in the Target case were absolutely committed in the pursuit of profit. The entire program was driven by the desire to take advantage of consumers at the times in their lives when they are most susceptible to advertisements. Target’s data analytics also became more of a surveillance tool, since the data was now being used not just to analyze the business, but to make health-related predictions about consumers’ lives. The resulting power imbalance, where companies infer personal and private details about individuals who remain unaware of the data being collected on them, shows why this case remains so relevant today as algorithms grow increasingly powerful.
That’s it!
See you next time!
References
“Avoiding the Traps of Big Data.” Fung Institute for Engineering Leadership, 7 Apr. 2015, funginstitute.berkeley.edu/news/avoiding-the-traps-of-big-data/.
Duhigg, Charles. “How Companies Learn Your Secrets.” NY Times Magazine, 16 Feb. 2012, www.nytimes.com/2012/02/19/magazine/shopping-habits.html.
“Target Privacy Policy.” Target, www.target.com/c/target-privacy-policy/-/N-4sr7p.