
The Morton’s fork of Dangerous Data: Can companies escape this new witch-hunt?



Do they know we know? What a Sophie’s choice! Holding a certain kind of data is like being an eyewitness to a murder. Should you hold your breath till the killer walks away? Or risk losing your own neck because you saw something you were not supposed to?


At the recent Gartner Data & Analytics Summit India, there was an interesting session on ‘Dangerous Data’. Gareth Herschel, VP Analyst, Research Engagement Services, Gartner, helped us double-click on this subject on the sidelines of the summit. It was both fun and thought-provoking to juggle issues like possession of tricky data, plausible deniability, third-party data, and data localization while also fiddling with possible answers like privacy cloaks, Blockchain, zero-knowledge-proof models, and data ethics.


How would you explain ‘dangerous data’ to a CEO who has never heard the term, if you had to do it in two or three lines?


Your organization collects data as a necessary part of its normal operations, and some of that data has potentially negative consequences. Even worse, there is often no automatically “correct” way of responding to those consequences: both taking an action and failing to take it could be considered inappropriate, depending on the priorities you attach to different outcomes.

Any recent examples or possibilities that you can cite to explain ‘dangerous data’?

If you provide a photo-storage service to customers, you are probably aware that ‘some’ of those photos ‘could’ indicate evidence of criminal activity. You have no choice about storing customer photos (that is your business). But the question is whether, by storing those photos, you are exposing your organization to negative consequences. For example, if you scan the photos for evidence of illegal activity, you are violating customer privacy. If you do not, you are failing to uphold the law and potentially supporting future criminal behavior. If you discover evidence of revolutionary activity, are you duty-bound to report that evidence in an evil dictatorship as you would be in a benevolent democracy? The key point of dangerous data is that you really don’t have a choice about collecting it, but there is also no automatic answer about what to do with it now that you have it.
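To make the scanning half of that dilemma concrete, here is a minimal Python sketch, assuming a hypothetical list of known-bad file hashes. Real services typically rely on perceptual hashing that survives resizing and re-encoding; exact SHA-256 matching here is a deliberate simplification.

```python
import hashlib
from pathlib import Path

# Hypothetical set of SHA-256 hashes of files already known to be illegal.
# The placeholder entry stands in for a list a provider would obtain elsewhere.
KNOWN_BAD_HASHES = {"0" * 64}

def scan_upload(photo_path: Path) -> bool:
    """Return True if the uploaded photo matches a known-bad hash."""
    digest = hashlib.sha256(photo_path.read_bytes()).hexdigest()
    return digest in KNOWN_BAD_HASHES

def handle_upload(photo_path: Path) -> str:
    # The technical check is the easy part; the policy questions
    # (scan at all? delete? report? notify the customer?) are where
    # the 'dangerous data' dilemma actually lives.
    return "flagged-for-review" if scan_upload(photo_path) else "stored"
```

The code settles nothing by itself: whether to run it, and what to do with a match, is exactly the choice Herschel describes.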


How should companies navigate this Scylla-and-Charybdis situation of ‘should I tell or not tell’? Especially when offering ‘privacy’ to users is a product feature in itself for many companies?

It is a strategic choice based on their values (and those of their employees, customers, or other stakeholders). They will be judged on their behavior: people with similar values (or those who do not read the fine print of privacy statements) will support them, while others will seek alternatives (similar to the situation Twitter is getting into).

Does this paradox of ‘knowing something dangerous’ become easier or tougher to handle when the world is swinging towards Blockchains and decentralization—with anonymity/privacy of data taking new contours?


That is one way to avoid ‘knowing’ the data: you can technologically put it out of your own reach as well (this is still a choice, not a random happenstance resulting from technological developments). Organizations implement blockchain, for example, because they do not want to take the responsibility of ‘owning’ the data. At a certain point, though, you do need to ‘know’ the data to draw broader conclusions. I may not need to know how ‘your’ self-driving car is performing, but I do need to know how ‘some’ self-driving cars are performing in order to improve the self-driving algorithm.
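The self-driving example can be read as an aggregation question: keep the fleet-level signal, drop the per-customer detail. A minimal sketch, with hypothetical field names, of retaining only the aggregate:

```python
from statistics import mean
from typing import TypedDict

class TelemetryRecord(TypedDict):
    vehicle_id: str                    # identifies a specific customer's car
    disengagements_per_100km: float    # hypothetical performance metric

def fleet_summary(records: list[TelemetryRecord]) -> dict[str, float]:
    """Aggregate fleet-level performance without keeping who drove what.

    The algorithm team needs 'how are some cars doing',
    not 'how is your car doing'.
    """
    rates = [r["disengagements_per_100km"] for r in records]
    return {
        "vehicles": float(len(rates)),
        "mean_disengagements_per_100km": mean(rates),
        "worst_disengagements_per_100km": max(rates),
    }

# Usage: pass raw records in, keep only the summary, discard the rest.
summary = fleet_summary([
    {"vehicle_id": "a", "disengagements_per_100km": 0.4},
    {"vehicle_id": "b", "disengagements_per_100km": 1.1},
])
print(summary)
```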

The ‘what if’ on the customer or social-media side would take more effort because the ‘right’ course of action is more ambiguous. We can (more or less) agree on what is legal or illegal, but what is acceptable to some groups is unacceptable to others, requiring more variability in how the data is handled.


Can you elaborate on the aspects of ‘plausible deniability’ and ‘treating the symptoms’ that were shared in the session?

Plausible deniability is the excuse that although you had the data, you either did not realize the implications of having it (‘Sorry, we deleted the data already and didn’t know we could have used it for that purpose’) or that someone else had the data and didn’t tell you (‘that data is actually stored on a third-party server; we just sell the rights to store the data there’).

Treating the symptoms is tactically responding to specific situations while avoiding the broader implications (‘I didn’t realize people were storing those sorts of photos on my service! Let me act straight away to delete those images and report that individual to the authorities, thank you so much for bringing it to my attention!’), or falling back on ‘we (and everyone else in our industry) always do it that way, and nobody (including the government/regulators) has ever had a problem with it’.



Will the implications of dangerous data change between zero-party and third-party data?

The principle applies to both, although there is another dimension for third-party data: a potential lack of awareness about what data the organization possesses (which may reduce exposure to some aspects of dangerous data), but also a lack of explicit or implicit permission for the data the organization possesses (which increases exposure to some implications of dangerous data).


Can data localization demands add more dilemmas for companies?

I think the biggest danger is that data localization reduces some of the legal exposure but does nothing to reduce the social implications, so organizations can deceive themselves into thinking they are solving the problem.
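A sense of how thin that legal fix can be: region-pinning is often little more than a storage setting. A sketch assuming AWS S3 and the Mumbai region (the bucket name is hypothetical); the data now sits in-country, but nothing about how it is used has changed.

```python
import boto3

# Pinning storage to an in-country region satisfies a localization
# requirement about WHERE data sits, but says nothing about HOW it is used.
s3 = boto3.client("s3", region_name="ap-south-1")
s3.create_bucket(
    Bucket="example-localized-customer-data",  # hypothetical bucket name
    CreateBucketConfiguration={"LocationConstraint": "ap-south-1"},
)
```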

AI built on dangerous data—how slippery and scary can that be? Especially when companies let it speak to the outside world as conversational AI.

You are correct. I think Gen-AI will make the data that organizations possess even more obvious, and the ease of putting it to new use cases will open up a variety of debates about good and bad uses of data.

What should we be really worried about—as users, as companies, as innovators, as regulators—but we are not?

That as more and more of our daily activity is captured in data, the potential for both beneficial use and misuse of that data increases, yet there are few incentives for organizations to use that data for good rather than to conceal the implications of the data they have access to.

Would zero-knowledge proof models help to solve some of these dilemmas?

I love that you asked this question, but I think the answer is ‘unfortunately not’. The issue is not really the possession of the data; it is the use to which that data is put (or not put). The solution (possibly naive) is for data management (and business schools) to include training classes around business ethics, and for regulation along the lines of Good Samaritan indemnification for taking action intended to provide assistance.
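For readers unfamiliar with the idea, zero-knowledge-style models let one party verify a claim about data without the data itself being handed over. A much-simplified commitment sketch (a salted hash, not a real zero-knowledge proof) shows why the technique addresses possession and disclosure rather than use, which is exactly the gap Herschel points to.

```python
import hashlib
import secrets

def commit(value: bytes) -> tuple[str, bytes]:
    """Return (commitment, salt); publishing the commitment reveals nothing usable."""
    salt = secrets.token_bytes(16)
    return hashlib.sha256(salt + value).hexdigest(), salt

def verify(value: bytes, salt: bytes, commitment: str) -> bool:
    """Later, the holder can prove what was committed without having shared it earlier."""
    return hashlib.sha256(salt + value).hexdigest() == commitment

# The cryptography answers "can you prove it without disclosing it?".
# It does not answer the dangerous-data question: having proven (or learned)
# something, what are you obliged to do about it?
c, s = commit(b"customer record")
assert verify(b"customer record", s, c)
```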

Written by Pratima H
