Anales de la RANM

DEEP LEARNING GENITAL LESIONS IMAGE CLASSIFICATION. González-Alday R, et al. An RANM. 2022;139(03): 266-273

4. DISCUSSION

4.1. Model's results and performance

The overall accuracy is highly positive, although several points deserve comment. In particular, the model correctly classifies all instances of condylomas, the most represented class in the dataset, but makes some errors when classifying herpes and warts. This result is a direct consequence of the relatively low number of examples of those classes in the image dataset. On the one hand, the GradCam explanations produced for correctly classified instances (figure 3) confirm that the network's attention is focused correctly, as the higher gradients are located over the lesions. On the other hand, for wrongly classified instances, the heatmaps can also provide interesting insights. In some cases, lesions such as warts and condylomas can be confused due to their similar appearance (in fact, condylomas and warts essentially differ in size, and some herpes vesicles can easily be mistaken for wart-like lumps), and therefore the explanations seem reasonable. In other cases, the most common among wrongly classified instances, the error results from an erroneous focusing process of the network, which can be detected in the heatmaps when the network's gradients are not concentrated on the lesions (see figure 3). These results corroborate that, most likely, if more images were available, especially for the herpes and wart classes (which only had around 30 to 40 images each, very few for a deep learning model), the model's learning process would improve and its classification performance would be significantly better.
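The heatmaps discussed above follow the standard Grad-CAM recipe: each feature map of a convolutional layer is weighted by the mean gradient of the target-class logit with respect to that map, the weighted maps are summed, and a ReLU keeps only the regions that push the prediction towards the class. The following is a minimal sketch of that computation in PyTorch, using a small stand-in network rather than the model trained in this work:

```python
# Minimal Grad-CAM sketch. TinyCNN is a hypothetical stand-in
# classifier, not the network used in this study.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    """Small conv backbone + global average pooling + linear head."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(16, n_classes)

    def forward(self, x):
        fmap = self.features(x)          # (B, 16, H, W) feature maps
        pooled = fmap.mean(dim=(2, 3))   # global average pooling
        return self.head(pooled), fmap

def grad_cam(model, x, target_class):
    """Weight each feature map by the mean gradient of the target
    logit w.r.t. that map, sum over channels, apply ReLU, normalize."""
    logits, fmap = model(x)
    fmap.retain_grad()                   # keep gradients of non-leaf tensor
    logits[0, target_class].backward()
    weights = fmap.grad.mean(dim=(2, 3), keepdim=True)  # (B, C, 1, 1)
    cam = F.relu((weights * fmap).sum(dim=1))           # (B, H, W)
    cam = cam / (cam.max() + 1e-8)       # scale into [0, 1] for display
    return cam.detach()

torch.manual_seed(0)
model = TinyCNN()
x = torch.randn(1, 3, 32, 32)            # dummy image batch
heatmap = grad_cam(model, x, target_class=0)
print(heatmap.shape)                     # torch.Size([1, 32, 32])
```

The resulting map has the spatial resolution of the chosen convolutional layer and, once upsampled over the input image, highlights the regions that contributed most to the predicted class, which is how a clinician can check whether the network is attending to the lesion or to irrelevant background.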
We consider that the number of images in the available dataset was sufficient for the proof-of-concept presented in this article, suggesting that Deep Learning can be a successful approach for early diagnosis and prevention of these STDs, particularly in places where specialists are scarce. We can expect that a larger dataset, with a more balanced distribution between classes and with other kinds of lesions, such as ulcers, added, would improve the robustness, performance and generalization of the trained model. The large datasets needed for many Deep Learning applications are a major limiting factor, particularly in medicine. In addition, external datasets from different hospitals and countries would be required for a comprehensive evaluation before a tool of this kind could be definitively introduced into clinical practice. In developing countries, e.g., in sub-Saharan Africa, the deployment of AI-based tools could dramatically improve medical care in places where specialists and advanced technology are scarce. The promise of including explainability methods, such as GradCam in this case, in these AI-based tools should also be noted. While explainability has been a major limitation of most AI systems in medicine since the 1970s, recent advances enable an improved, detailed analysis of the model, making it possible to identify when and how the system fails, instead of it behaving as the classical black box characteristic of machine learning systems. Such a feature will ultimately be necessary to ensure acceptance by medical professionals.

5. CONCLUSIONS

Artificial Intelligence, and deep learning models in particular, have great potential to improve and transform healthcare, as well as to bring it closer to places where medical expertise is difficult to access.
Under this scope, the work presented in this paper shows a promising starting point for implementing a CNN to assist the diagnosis of genital lesions caused by STDs. The presented prototype aimed to show various alternatives already available for AI-based medical imaging applications, demonstrating these possibilities with a limited image dataset covering a few common and well-differentiated genital lesions. The performance results are encouraging and confirm the feasibility and promise of developing a robust application using the same methods while relying on a larger variety of images. The explainability method included in this prototype shows the potential capabilities of these tools. While explainability has been a major limitation of most AI systems in medicine since the 1970s, recent advances facilitate a detailed analysis of the model, making it possible to identify when and how the system fails, instead of it behaving as the classical black box characteristic of machine learning systems. In this way, the trustworthiness of the model, regarding both its development and its application, can be notably increased, which might facilitate acceptance by health professionals, a traditional drawback of many AI-based systems.

ACKNOWLEDGEMENTS

STD images were provided by Dr. Francois Peinado, urologist and a coauthor of this manuscript, and Dr. Álvaro Vives, urologist at Fundación Puigvert, Barcelona, as part of a joint collaboration for a future enhanced development. This work was partially supported by the Proyecto colaborativo de integración de datos genómicos (CICLOGEN) (No. PI17/01561), funded by the Carlos III Health Institute under the Spanish National Plan for Scientific and Technical Research and Innovation 2017-2020 and the European Regional Development Fund (FEDER).

RkJQdWJsaXNoZXIy ODI4MTE=