AI chatbot maker Babylon Health attacks clinician in PR stunt after he goes public with safety concerns

UK startup Babylon Health pulled app data on a critical user in order to create a press release in which it publicly attacks the UK doctor who has spent years raising patient safety concerns about its symptom triage chatbot service.
In the press release, issued late Monday, Babylon refers to Dr David Watkins, via his Twitter handle, as a "troll" and claims he "targeted members of our staff, partners, clients, regulators and journalists and tweeted defamatory content about us". In a bid to discredit his safety concerns, it also writes that Watkins has clocked up "hundreds of hours" and 2,400 tests of its service, saying he raised "fewer than 100 test results which he considered concerning".
Babylon's PR also claims that in only 20 instances did Watkins find "genuine errors in our AI", whereas other instances are couched as "misrepresentations" or "mistakes", per an unnamed "panel of senior clinicians" which the startup's PR says "investigated and re-validated every single one". This suggests the error rate Watkins identified was just 0.8% (i.e. 20 out of 2,400 tests).
Screengrab from Babylon's press release, which refers to Dr Watkins' "Twitter troll tests"
Responding to the attack in a telephone interview with TechCrunch, Watkins described Babylon's claims as "absolute nonsense", saying, for example, that he has not carried out anywhere near 2,400 tests of its service.
"There are certainly not 2,400 completed triage assessments," he told us. "Absolutely not."
Asked how many tests he thinks he did complete, Watkins suggested the figure is likely between 800 and 900 full runs through "complete triages" (some of which, he points out, would have been repeat tests to see if the company had fixed issues he'd previously noticed). He said he identified issues in about one in two or one in three instances of testing the bot. He added that in 2018 he was finding far more problems, claiming it was "one in one" at that stage for an earlier version of the app.
Watkins suggests that, to get to the 2,400 figure, Babylon is likely counting instances where he was unable to complete a full triage because the service was lagging or glitchy.
"They've manipulated data to try and discredit someone raising patient safety concerns," he said. "I obviously test in a fashion which is [that] I know what I'm looking for, because I've done this for the past three years and I'm looking for the same issues which I've flagged before to see have they fixed them. So trying to suggest that my testing is actually any indication of the chatbot is absurd in itself," he added.
In another pointed attack, Babylon writes that Watkins has "posted over 6,000 misleading attacks", without specifying exactly what kind of attacks it's referring to (or where they've been posted). Watkins told us he hasn't even tweeted 6,000 times in total since joining Twitter four years ago. He has, however, spent three years using the platform to raise concerns about diagnosis issues with Babylon's chatbot. In one such series of tweets, he shows a triage for a female patient that fails to pick up a potential heart attack:
The @babylonhealth Chatbot has descended to a whole new level of incompetence, with #DeathByChatbot #GenderBias. Classic #HeartAttack symptoms in a FEMALE, results in a diagnosis of #PanicAttack or #Depression. The Chatbot ONLY suggests the possibility of a #HeartAttack in MEN! pic.twitter.com/M8ohPDx0LX
- Dr Murphy (aka David Watkins) (@DrMurphy11) September 8, 2019
Watkins told us he has no idea what the 6,000 figure refers to. He accuses Babylon of having a culture of "trying to silence criticism" rather than engaging with genuine clinician concerns. "Not once have Babylon actually approached me and said 'hey Dr Murphy, or Dr Watkins, what you've tweeted there is misleading'," he added. "Not once."
Instead, he said, the startup has consistently taken a "dismissive approach" to the safety concerns he raised.
"My overall concern with the way that they've approached this is that yet again they have taken a dismissive approach to criticism and again tried to smear and discredit the person raising concerns," he said.
Watkins is a consultant oncologist at The Royal Marsden NHS Foundation Trust. For several years he has gone by the online (Twitter) moniker of @DrMurphy11, tweeting videos of Babylon's chatbot triage that he says illustrate the bot failing to correctly identify patient presentations. He made his identity public on Monday when he attended a debate at the Royal Society of Medicine.
Dr Murphy unmasked
Now for his positional statement
His driving force: patient safety
Can't argue with that!! @DrMurphy11 #RSMDigiHealth @RoySocMed pic.twitter.com/hOC7kzlNz3
- clive flashman (@cflashman) February 24, 2020
There he gave a presentation calling for less hype and more independent verification of the claims being made by Babylon, as such digital systems continue elbowing their way into the healthcare space.
In the case of Babylon, the app has a major cheerleader in the current UK Secretary of State for Health, Matt Hancock, who has revealed he's a personal user of the app. At the same time, Hancock is pushing the National Health Service to overhaul its infrastructure so that "healthtech" apps and services can be plugged in.
So you can spot the political synergies.
Watkins argues the sector needs more of a focus on robust evidence gathering and independent testing, rather than mindless ministerial support and partnership "endorsements" standing in for due diligence. He points to the example of Theranos, the disgraced blood testing startup whose co-founder is now facing charges of fraud. He says this should serve as a major red flag about the need for independent testing of "novel" health product claims.
"[Over hyping of products] is a tech industry issue which unfortunately seems to have infected healthcare in a couple of situations," he told us. He was referring to the startup "fake it till you make it" playbook of hype marketing and scaling without waiting for external verification of heavily marketed claims.
In the case of Babylon, he argues the company has failed to back up puffy marketing with evidence of the sort of extensive clinical testing and validation which he says should be necessary for a health app that's out in the wild being used by patients.
(Nor, he also says, has Babylon stood up its references to academic studies by giving outsiders access to the data so they can verify its claims.)
"They've got backing from all these people: the founders of Google DeepMind, Bupa, Samsung, Tencent, the Saudis have given them hundreds of millions and they're a billion dollar company. They've got the backing of Matt Hancock. Got a deal with Wolverhampton. It all looks trustworthy," Watkins went on. "But there is no basis for that trustworthiness. You're basing the trustworthiness on the ability of a company to partner. And you're making the assumption that those partners have undertaken due diligence."
For its part, Babylon claims the opposite. It says its app meets existing regulatory standards and points to high "patient satisfaction ratings" and a lack of reported harm by users as evidence of safety. In the same PR in which it lays into Watkins, it writes:
Our track record speaks for itself: our AI has been used millions of times, and not one single patient has reported any harm (a far better safety record than any other health consultation in the world). Our technology meets robust regulatory standards across five different countries, and has been validated as a safe service by the NHS on ten different occasions. In fact, when the NHS reviewed our symptom checker, Healthcheck and clinical portal, they said our method for validating them "has been completed using a robust assessment methodology to a high standard." Patient satisfaction ratings see over 85% of our patients giving us 5 stars (and 94% giving five and four stars), and the Care Quality Commission recently rated us "Outstanding" for our leadership.
But proposing to judge the efficacy of a health-related service by a patient's ability to complain if something goes wrong seems, at the very least, an unorthodox approach. It flips the Hippocratic oath principle of "first do no harm" on its head.
(Plus, speaking theoretically, someone who's dead would literally be unable to complain, which would leave a rather large loophole in any "safety bar" being claimed via such an assessment methodology.)
On the regulatory point, Watkins argues that the current UK regime is not set up to respond intelligently to a development like AI chatbots and lacks strong enforcement in this new category. He says complaints he filed with the MHRA (Medicines and Healthcare products Regulatory Agency) have resulted in it asking Babylon to work on issues, with little or no follow-up. He also notes that confidentiality clauses limit what the regulator can disclose.
All of that might look like a plum opportunity for a certain kind of startup "disruptor", of course. And Babylon's app is one of several now applying AI-type technologies as a diagnostic aid in chatbot form, across a number of global markets.
Users are typically asked to respond to questions about their symptoms and, at the end of the triage process, get information on what might be a possible cause.
Babylon's PR materials are, however, careful to include a footnote caveating that its AI tools "do not provide a medical diagnosis, nor are they a substitute for a doctor". Yet, says Watkins, if you read certain headlines and claims made for the company's product in the media, you might be forgiven for coming away with a very different impression, and it's this level of hype that has him worried.
Other, less hype-dispensing chatbots are available, he suggests, name-checking Berlin-based Ada Health as taking a more thoughtful approach on that front.
Asked whether there are specific tests he would like to see Babylon do to back up its hype, Watkins told us: "The starting point is getting a technology which you feel is safe to actually be in the public domain."
Notably, the European Commission is working on a risk-based regulatory framework for AI applications, including for use cases in sectors such as healthcare. It would require such systems to be "transparent, traceable and guarantee human oversight", as well as to use unbiased data for training their AI models.
"Because of the hyperbolic claims that have been put out there previously about Babylon, that's where there's a big issue. How do they now roll back and make this safe? You can do that by putting in certain warnings with regards to what this should be used for," said Watkins, raising concerns about the wording used in the app. "Because it presents itself as giving patients diagnosis and it suggests what they should do, for them to come out with this disclaimer saying this isn't giving you any healthcare information, it's just information, it doesn't make sense. I don't know what a patient's meant to think of that."
"Babylon always present themselves as very patient-facing, very patient-focused, we listen to patients, we hear their feedback. If I was a patient and I've got a chatbot telling me what to do and giving me a suggested diagnosis, at the same time it's telling me 'ignore this, don't use it', what is it?" he added. "What's its purpose? There are other chatbots which I think have defined that far more clearly, where they are very clear in their intent saying we're not here to provide you with healthcare advice; we will provide you with information which you can take to your healthcare provider to allow you to have a more informed decision discussion with them. And when you put it in that context, as a patient I think that makes perfect sense. This machine is going to give me information so I can have a more informed discussion with my doctor. Fantastic. So there are simple things which they just haven't done. And it drives me nuts. I'm an oncologist; it shouldn't be me doing this."
Watkins suggested Babylon's response to his raising "good faith" patient safety concerns is symptomatic of a deeper malaise within the culture of the company.
It has also had a negative impact on him, making him into a target for parts of the right-wing media.
"What they have done, although it may not be users' health data, they have attempted to utilize data to intimidate an identifiable individual," he said of the company's attack on him. "As a consequence of them having this threatening approach and attempting to intimidate, other parties have thought, let's bundle in and attack this guy. So it's that which is the harm which comes from it. They've singled out an individual as someone to attack."
"I'm concerned that there are clinicians in that company who, if they see this happening, they're not going to raise concerns, because you'll just get discredited in the organization. And that's really dangerous in healthcare," Watkins added. "You have to be able to speak up when you see concerns because otherwise patients are at risk of harm and things don't change. You have to learn from error when you see it. You can't just carry on doing the same thing again and again and again."
Others in the medical community have been quick to criticize Babylon for targeting Watkins in such a personal manner and for revealing details about his use of its (medical) service. As one Twitter user, Sam Gallivan, also a doctor, put it:
Can other high frequency @babylonhealth users look forward to having their private medical queries broadcast in a press release?
- Sam Gallivan (@samgal) February 25, 2020
The act certainly raises questions about Babylon's approach to sensitive health data, if it's accessing patient information for the purpose of trying to steamroller informed criticism. We've seen similarly ugly stuff in tech before, of course, such as when Uber kept a "god view" of its ride-hailing service and used it to keep tabs on critical journalists.
In that case the misuse of platform data pointed to a toxic culture problem that Uber has had to spend subsequent years sweating to turn around (including changing its CEO).
Babylon's selective data dump on Watkins is also an illustrative example of a digital service's ability to access and shape individual data at will. It points to the underlying power asymmetries between these data-capturing technology platforms (which are gaining increasing agency over our decisions) and their users, who only get highly mediated, hyper-controlled access to the databases they help to feed.
Watkins, for example, told us he is no longer able to access his query history in the Babylon app. He provided a screenshot of an error screen that he says he now sees when he tries to access chat history in the app. He said he does not know why he is no longer able to access his historical usage information. He had been using it as a reference to help with further testing, he says, and no longer can. If it's a bug, it's a convenient one for Babylon PR…
We contacted Babylon to ask it to respond to criticism of its attack on Watkins.
The company defended its use of his app data to generate the press release. It argued that the "volume" of queries he had run means the usual data protection rules don't apply, and further claimed it had only shared "non-personal statistical data", even though this was attached in the PR to his Twitter identity (and therefore, since Monday, to his real name). In a statement, the Babylon spokesperson told us:
If safety related claims are made about our technology, our medical professionals are required to look into these matters to ensure the accuracy and safety of our products. In the case of the recent use data that was shared publicly, it is clear given the volume of use that this was theoretical data (forming part of an accuracy test and experiment) rather than a genuine health concern from a patient. Given the use volume and the way data was presented publicly, we felt that we needed to address accuracy and use information to reassure our users. The data shared by us was non-personal statistical data, and Babylon has complied with its data protection obligations throughout. Babylon does not publish genuine individualised user health data.
We also asked the UK data protection watchdog about the episode and Babylon's decision to make Watkins' app usage public.
The ICO told us: "People have the right to expect that organisations will handle their personal information responsibly and securely. If anyone is concerned about how their data has been handled, they can contact the ICO and we will look into the details."
Babylon's clinical innovation director, Dr Keith Grimes, attended the same Royal Society debate as Watkins this week. The event was entitled Recent developments in AI and digital health 2020 and billed as a conference that will "cut through the hype around AI". So it looks to be no accident that the attack press release was timed to follow hard on the heels of Watkins' presentation. Babylon would have known (since at least last December) that it was coming that day. In it, Watkins argued that, where AI chatbots are concerned, "validation is more important than valuation".
A little challenge to one of our critics… #RSMDigiHealth https://t.co/XqvQpRYMLX
- Babylon (@babylonhealth) February 24, 2020
Last summer Babylon announced a $550M Series C raise, at a $2BN+ valuation. Investors in the company include Saudi Arabia's Public Investment Fund, an unnamed U.S.-based health insurance company, Munich Re's ERGO Fund, Kinnevik, Vostok New Ventures and DeepMind co-founder Demis Hassabis, to name a few of those helping to fund its marketing.
"They came with a narrative," said Watkins of Babylon's message to the Royal Society.
"The debate wasn't particularly instructive or constructive. And I say that purely because Babylon came with a narrative and they were going to stick to that. The narrative was to avoid any discussion about any safety concerns or the fact that there were problems and just describe it as safe."
The clinician's counter-message to the event was to pose a question EU policymakers are just starting to consider, calling for the AI maker to show data sets that stand up its safety claims.