GDPR panic may spur data and AI innovation

If AI innovation runs on data, the new European Union General Data Protection Regulation (GDPR) seems poised to freeze AI advancement.
The regulation prescribes a utopian data future where consumers can refuse companies access to their personally identifiable information
(PII). Although the enforcement deadline has passed, the technical infrastructure and manpower needed to meet these requirements still do
not exist in most companies today.

Coincidentally, the barriers to GDPR compliance are also bottlenecks to widespread AI adoption.
Despite the hype, enterprise AI is still nascent: Companies may own petabytes of data that can be used for AI, but fully digitizing that
data, knowing what the data tables actually contain and understanding who can access that data, and where and how, remains a herculean
coordination effort for even the most empowered internal champion.
It's no wonder that many scrappy AI startups find themselves bogged down by customer data cleanup and custom integrations.

As multinationals and Big Tech overhaul their data management processes and tech stacks to comply with GDPR, here's how AI and data
innovation, counterintuitively, also stand to benefit.

How GDPR impacts AI

GDPR covers the collection, processing and movement of data that can be used to identify a person, such as a name, email address, bank
account information, social media posts, health information and more, all of which are currently used to power AI algorithms ranging from
ad targeting to identifying terrorist cells.

The penalty for noncompliance can reach 4 percent of global annual revenue or €20 million, whichever is higher.
To put that in perspective: 4 percent of Amazon's 2017 revenue is $7.2 billion, Google's is $4.4 billion and Facebook's is $1.6 billion.
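The "whichever is higher" clause in the penalty ceiling described above is simple arithmetic, and a short sketch makes it concrete. The function name and the sample revenues are illustrative assumptions, not figures from any company's filings, and the sketch assumes revenue is already expressed in euros.

```python
# Hypothetical helper for the penalty ceiling: the greater of
# 4 percent of global annual revenue or EUR 20 million.
GDPR_FLOOR_EUR = 20_000_000
GDPR_RATE_PERCENT = 4


def max_gdpr_penalty(global_revenue_eur: float) -> float:
    """Return the maximum penalty for a given annual global revenue in euros."""
    revenue_based = global_revenue_eur * GDPR_RATE_PERCENT / 100
    return max(revenue_based, GDPR_FLOOR_EUR)


# Assumed revenues, chosen only to show both branches of the rule.
small_company = max_gdpr_penalty(100_000_000)      # 4% is 4M, so the 20M floor applies
large_company = max_gdpr_penalty(50_000_000_000)   # 4% is 2B, which exceeds the floor
```

For smaller companies the fixed floor dominates; for the largest ones the percentage does, which is why the exposure for Big Tech runs into the billions.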
These regulations protect anyone located in the EU, regardless of citizenship, and extend to vendors upstream and downstream of the
companies that collect PII.

Article 22 of the GDPR, titled "Automated Individual Decision-making, including Profiling," prescribes that AI
cannot be used as the sole decision-maker in choices that have legal or similarly significant effects on users.
In practice, this means an AI model cannot be the only step for deciding whether a borrower can receive a loan; the customer must be able to
request that a human review the application.

One way to avoid the cost of compliance, which includes hiring a data protection officer and building access controls, is to stop
collecting data on EU residents altogether.
This would bring PII-dependent AI innovation in the EU to a grinding halt.
With the EU representing about 16 percent of global GDP, 11 percent of global online advertising spend and 9 percent of the global
population in 2017, however, Big Tech will more likely invest heavily in solutions that will allow them to continue operating in this
market.

Transparency mandates force better data accessibility

GDPR mandates that companies collecting consumer data must enable individuals to know what data is being collected about them, understand
how it is being used, revoke permission to use specific data, correct or update data and obtain proof that the data has been erased if the
customer requests it.
To meet these potential requests, companies must shift from indiscriminately collecting data in a piecemeal and decentralized manner to
establishing an organized process with a clear chain of control.

Any data that companies collect must be immediately classified as either PII or de-identified and assigned the correct level of protection.
Its location in the company's databases must be traceable with an auditable trail: GDPR mandates that organizations handling PII must be
able to find all copies of regulated data, regardless of how and where it is stored.
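The classification and traceability requirements above amount to keeping a catalog of what was collected, how it is classified and where each copy lives. The following is a minimal in-memory sketch of that idea; the class names, the two classification labels as strings and the storage locations are all hypothetical, and a real deployment would sit on top of actual databases rather than a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    subject_id: str       # whose data this is
    field_name: str       # e.g. "email"
    classification: str   # "PII" or "de-identified"
    location: str         # hypothetical store name, e.g. "crm_db.contacts"


@dataclass
class DataCatalog:
    entries: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def register(self, entry: CatalogEntry) -> None:
        """Classify data at collection time and record where it lives."""
        if entry.classification not in ("PII", "de-identified"):
            raise ValueError("classification must be 'PII' or 'de-identified'")
        self.entries.append(entry)
        self._audit("register", entry.subject_id, entry.location)

    def find_copies(self, subject_id: str) -> list:
        """Return every recorded location holding PII for one person."""
        self._audit("lookup", subject_id, "*")
        return sorted(
            e.location for e in self.entries
            if e.subject_id == subject_id and e.classification == "PII"
        )

    def _audit(self, action: str, subject_id: str, location: str) -> None:
        # Every catalog operation leaves a timestamped trail entry.
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, action, subject_id, location))


catalog = DataCatalog()
catalog.register(CatalogEntry("user-1", "email", "PII", "crm_db.contacts"))
catalog.register(CatalogEntry("user-1", "email", "PII", "marketing_export.csv"))
catalog.register(CatalogEntry("user-1", "age_bucket", "de-identified", "analytics.events"))
copies = catalog.find_copies("user-1")
```

Here `copies` lists the two locations that hold this person's PII, and the lookup itself is written to the audit log alongside the three registrations.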
These organizations will need to assign someone to manage their data infrastructure and fulfill these user privacy requests.

Having this data infrastructure and these management processes in place will greatly lower a company's barriers to deploying AI.
By fully understanding its data assets, a company can plan strategically where to deploy AI in the near term using the data it already
has.
Moreover, once it builds an AI road map, the company can determine where it needs to obtain additional data to build more complex and
valuable AI algorithms.
With the data streams simplified, storage mapped out and a chain of ownership established, the company can more effectively engage with AI
vendors to deploy their solutions enterprise-wide.

More importantly, GDPR will force many companies dragging their feet on digitization to finally bite the bullet.
The mandates require that data be portable: Companies must provide a way for users to download all of the data collected about them in a
standard format.
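A portability request of this kind can be pictured as gathering every record tied to one person and serializing it in a common machine-readable format. The sketch below uses JSON as one plausible choice; the function name, record layout and sample rows are assumptions made for illustration.

```python
import json


def export_user_data(records: list, subject_id: str) -> str:
    """Collect every record about one person and serialize it as JSON."""
    mine = [r for r in records if r["subject_id"] == subject_id]
    payload = {"subject_id": subject_id, "record_count": len(mine), "records": mine}
    # sort_keys and indent keep the download stable and human-readable.
    return json.dumps(payload, indent=2, sort_keys=True)


# Made-up records standing in for rows pulled from several internal systems.
records = [
    {"subject_id": "user-1", "source": "orders", "item": "book"},
    {"subject_id": "user-2", "source": "orders", "item": "lamp"},
    {"subject_id": "user-1", "source": "support", "ticket": "T-100"},
]
download = export_user_data(records, "user-1")
```

The hard part in practice is not the serialization but the first line of the function: knowing where all of a person's records are in the first place.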
Currently, only 10 percent of all data is collected in a format that eases analysis and sharing, and more than 80 percent of enterprise
data today is unstructured, according to Gartner estimates.

Much of this structuring and information extraction will initially have to be done manually, but Big Tech companies and many startups are
developing tools to accelerate this process.
According to PwC, the sectors most behind on digitization are healthcare, government and hospitality, all of which handle large amounts of
unstructured data containing PII. We can expect to see a flood of AI innovation in these categories as the data becomes easier to access
and use.

Consumer opt-outs require more granular AI model management

Under GDPR guidelines, companies must let users prevent the company from storing certain information about them.
If the user requests that the company permanently and completely delete all the data about them, the company must comply and show proof of
deletion.
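What "proof of deletion" should look like is not spelled out here, so the following is only one possible shape: remove the person's rows from every table and return a receipt recording when the erasure ran and how many rows were removed. The store layout, the function name and the idea of hashing the identifier in the receipt are all assumptions, and whether such a receipt would satisfy a regulator is an open question.

```python
import hashlib
from datetime import datetime, timezone


def erase_subject(store: dict, subject_id: str) -> dict:
    """Delete one person's records from every table and return a receipt."""
    removed = {}
    for table, rows in list(store.items()):
        kept = [r for r in rows if r["subject_id"] != subject_id]
        removed[table] = len(rows) - len(kept)
        store[table] = kept
    return {
        # Hash the ID so the receipt does not carry the raw identifier.
        "subject_hash": hashlib.sha256(subject_id.encode("utf-8")).hexdigest(),
        "erased_at": datetime.now(timezone.utc).isoformat(),
        "rows_removed": removed,
    }


# Made-up tables standing in for a company's data stores.
store = {
    "orders": [
        {"subject_id": "user-1", "item": "book"},
        {"subject_id": "user-2", "item": "lamp"},
    ],
    "support": [{"subject_id": "user-1", "ticket": "T-100"}],
}
receipt = erase_subject(store, "user-1")
```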
How this mandate might apply to an AI algorithm trained on data that a user wants to delete is not specifically prescribed and awaits its
first test case.

Today, data is pooled together to train an AI algorithm.
It is unclear how an AI engineer would attribute the impact of a particular data point to the overall performance of the algorithm.
If the enforcers of GDPR decide that the company must erase the effect of a unit of data on the AI model in addition to deleting the data,
companies using AI must find ways to granularly explain how a model works and fine-tune the model to "forget" the data in question.
Many AI models are black boxes today, and leading AI researchers are working to enable model explainability and tunability.
The GDPR deletion mandate could accelerate progress in these areas.

In the nearer term, these GDPR mandates could shape best practices for UX and AI model design.
Today, GDPR-compliant companies offer users the binary choice of allowing full, effectively unrestricted use of their data or no access at
all.
In the future, product designers may want to build more granular data access permissions. For example, before choosing to delete Facebook
altogether, a user could refuse companies access to specific sets of information, such as their network of friends or their location data.
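Granular permissions of this kind can be modeled as a per-user set of allowed data categories that every downstream consumer must filter through. The sketch below is a minimal illustration; the category names, the `ConsentProfile` class and the sample profile are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentProfile:
    # Hypothetical categories; a real product would define its own.
    allowed: set = field(default_factory=set)

    def revoke(self, category: str) -> None:
        """Withdraw permission for one category without touching the rest."""
        self.allowed.discard(category)


def permitted_view(profile_data: dict, consent: ConsentProfile) -> dict:
    """Return only the data categories the user currently allows."""
    return {k: v for k, v in profile_data.items() if k in consent.allowed}


consent = ConsentProfile(allowed={"friends", "location", "posts"})
consent.revoke("location")
profile_data = {"friends": ["ana", "raj"], "location": "Lisbon", "posts": 42}
visible = permitted_view(profile_data, consent)
```

After the revocation, `visible` still contains the friends list and post count but no longer the location, so the user keeps the product while withholding one category.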
AI engineers anticipating the need to trace the effect of specific data on a model may choose to build a series of simple models optimizing
on single dimensions, instead of one monolithic and very complex model.
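One way to read "a series of simple models optimizing on single dimensions" is to fit an independent one-feature model per data category and average their predictions. This is an interpretation offered as a sketch, not a method the regulation prescribes; the column names, the toy data and the choice of least-squares lines are all assumptions.

```python
from statistics import mean


def fit_line(xs: list, ys: list) -> tuple:
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = cov / var if var else 0.0
    return slope, y_bar - slope * x_bar


def fit_per_dimension(rows: list, target: str, dimensions: list) -> dict:
    """Train one independent single-feature model per dimension."""
    ys = [r[target] for r in rows]
    return {d: fit_line([r[d] for r in rows], ys) for d in dimensions}


def predict(models: dict, row: dict) -> float:
    """Average the single-dimension predictions."""
    return mean(slope * row[d] + b for d, (slope, b) in models.items())


# Toy data: predict spend from two separate data categories.
rows = [
    {"user": "a", "visits": 1, "friends": 10, "spend": 12},
    {"user": "b", "visits": 2, "friends": 20, "spend": 14},
    {"user": "c", "visits": 3, "friends": 30, "spend": 16},
]
models = fit_per_dimension(rows, "spend", ["visits", "friends"])

# User "c" withdraws their friends data: only the "friends" model is refit,
# and the "visits" model is left exactly as it was.
remaining = [r for r in rows if r["user"] != "c"]
models["friends"] = fit_line(
    [r["friends"] for r in remaining], [r["spend"] for r in remaining]
)
estimate = predict(models, {"visits": 4, "friends": 40})
```

Because each model sees one dimension, a deletion request touches only the models that used that data, and each model's contribution to the final prediction can be inspected on its own.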
This approach may have performance trade-offs, but would make model management more tractable.

Building trust for more data tomorrow

The new regulations require companies to protect PII with a level of security previously limited to patient health and consumer finance
data.
Nearly half of all companies recently surveyed by Experian about GDPR are adopting technology to detect and report data breaches as soon as
they occur.
As companies adopt more sophisticated data infrastructure, they will be able to determine who has and should have access to each data stream
and manage permissions accordingly.
Moreover, the company may also choose to build tools that immediately notify users if their information was accessed by an unauthorized
party; Facebook offers a similar service to its employees, called a "Sauron alert."

Although the restrictions may appear to reduce tech companies' ability to access data in the short term, 61 percent of companies see
additional benefits of GDPR-readiness beyond penalty avoidance, according to a recent Deloitte report.
Taking these precautions to earn customer trust may eventually lower the cost of acquiring high-quality, highly dimensional data.

In this post-GDPR future, companies no longer have to infer intent from expensive schemes to sneakily capture customer information.
Improved data infrastructure will have enabled early AI applications to demonstrate their value, encouraging more customers to voluntarily
share even more information about themselves with trustworthy companies.

Unproven upside alone has always been insufficient to motivate cross-functional modernization, but the threat of a multi-billion-dollar
penalty may finally spur these companies to action.
More importantly, GDPR is but the first of much more data privacy regulation to come, and many countries across the world look to it as a
model for their own upcoming policies.
As companies worldwide lay the groundwork for compliance and transparency, they're also paving the way to an even more vibrant AI future.