Artificial intelligence (AI) and dermatology seem like a match made in heaven – or at least a match made in a utopian, high-tech lab. The end goal is clear: the widespread use of transformative AI applications capable of diagnosing myriad skin conditions with pinpoint accuracy. But, for AI systems to have a wholly positive impact on the field of dermatology, it is essential that standards for development and evaluation be adopted to protect patients from algorithmic harms.
Enter the International Skin Imaging Collaboration (ISIC) AI working group. Its 19 expert members undertook a multi-stage process to identify the most relevant criteria for ensuring the implementation of safe and unbiased AI systems. Scanning through 13 years of PubMed articles, the team unearthed factors of interest, which they refined into specific recommendations and subjected to two rounds of review.
The final result was a checklist of 25 carefully selected recommendations, ranging from documentation requirements to best practices for bias testing. These recommendations are summarized below:
Recommendation: Document your data
An undocumented data set is an unsafe data set. The ISIC AI working group stressed the necessity of documenting all aspects of a system’s training, validation, and testing data sets to maximize transparency and minimize covert harm. Documentation should describe, among other factors: image anomalies, patient-level metadata, collection criteria, and any classes (e.g. types of diseases) that were excluded from the training data.
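To make the idea concrete, here is a minimal sketch of what a per-split documentation record might look like in Python; the field names and example values are hypothetical and are not prescribed by the ISIC checklist.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetSheet:
    """Hypothetical documentation record for one data split (train/validation/test)."""
    split: str                                 # which split this sheet describes
    collection_criteria: str                   # how, where, and when images were sourced
    patient_metadata_fields: list[str] = field(default_factory=list)  # e.g. age, sex, skin type
    excluded_classes: list[str] = field(default_factory=list)         # disease classes left out
    known_image_anomalies: list[str] = field(default_factory=list)    # e.g. ink markings, rulers

# Example values, invented for illustration and not drawn from any real data set.
train_sheet = DatasetSheet(
    split="train",
    collection_criteria="Dermoscopic images collected under a documented clinical protocol",
    patient_metadata_fields=["age", "sex", "Fitzpatrick skin type"],
    excluded_classes=["rare genodermatoses"],
    known_image_anomalies=["surgical ink markings", "embedded rulers"],
)
```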
Recommendation: Describe development
Documentation habits should extend beyond data to the AI system itself. The development process should be described, preferably in a way that encourages replication. Methods for labeling images should likewise be recorded.
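One lightweight way to capture this, sketched below under the assumption that the system is a deep-learning classifier, is to save a machine-readable development record alongside the model; the keys and values are illustrative only, not part of the ISIC recommendations.

```python
import json

# Hypothetical development record saved next to the trained model; the keys below
# are illustrative and assume a deep-learning classifier.
development_record = {
    "architecture": "EfficientNet-B3",
    "random_seed": 42,
    "training": {"epochs": 50, "optimizer": "Adam", "learning_rate": 1e-4},
    "label_source": "histopathology report where available, otherwise expert consensus",
    "labelers": {"count": 3, "qualification": "board-certified dermatologists"},
    "software_versions": {"python": "3.11", "pytorch": "2.2"},
}

with open("development_record.json", "w") as f:
    json.dump(development_record, f, indent=2)
```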
Recommendation: Assess your algorithm
When it comes to risky AI applications, few are more dangerous than a system that over-diagnoses life-threatening diseases, or, worse, fails to diagnose them at all. Developers should evaluate their systems using appropriate (and documented) performance metrics and should compare that performance against state-of-the-art algorithms as well as human experts. Independent testing should also be facilitated, whether through open-source code or a public-facing test interface.
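As an illustration, a minimal evaluation on a held-out test set might report sensitivity, specificity, and AUROC; the labels and scores below are made up, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

# Hypothetical held-out test labels and model scores (1 = malignant, 0 = benign).
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.8, 0.35, 0.9, 0.2, 0.7, 0.6])
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # how often disease is actually caught; missed cases are the danger
specificity = tn / (tn + fp)   # how often healthy skin is correctly cleared
auroc = roc_auc_score(y_true, y_score)

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} AUROC={auroc:.2f}")
```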
Recommendation: Examine effects
Finally, it is crucial that developers situate their AI system in its wider context, cataloguing its use case, users, and setting. Its potential impact should be well understood, from its performance on its intended user base to any unintended harms it could perpetuate against vulnerable groups.
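A simple way to start probing for such disparities, assuming per-image predictions can be joined to demographic metadata such as Fitzpatrick skin type, is to stratify performance by group; the table below is entirely hypothetical.

```python
import pandas as pd

# Hypothetical per-image results joined to a demographic attribute; every value is invented.
results = pd.DataFrame({
    "skin_type": ["I-II", "I-II", "III-IV", "III-IV", "V-VI", "V-VI"],
    "correct":   [1,      1,      1,        0,        0,      1],
})

# Accuracy stratified by group; large gaps flag where deeper auditing is needed.
per_group_accuracy = results.groupby("skin_type")["correct"].mean()
print(per_group_accuracy)
```

Gaps like these do not prove bias on their own, but they point to where closer evaluation is warranted before deployment.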
The checklist spans the entire development process, offering key advice on data generation, system design, and testing practices. That breadth makes the ISIC's recommendations an excellent starting point for anyone committed to building and applying ethical AI systems to dermatological tasks.