BACKGROUND AND AIMS: Artificial intelligence (AI) has been shown to be effective in polyp detection, and multiple computer-aided detection (CADe) systems have been developed. False positive (FP) activations have emerged as a possible way to benchmark CADe performance in clinical practice. The aim of this study was to validate a previously developed classification of FPs by comparing the performance of different brands of approved CADe systems.
METHODS: We compared 2 consecutive video libraries (40 videos per arm) collected at Humanitas Research Hospital with 2 different CADe system brands (CADe A and CADe B). For each video, the number of CADe false activations, their causes, and the time spent by the endoscopist examining the erroneously highlighted areas were recorded. FP activations were classified by cause and relevance according to the previously developed classification of false positives (the NOISE classification).
RESULTS: A total of 1021 FP activations were registered across the 40 videos of Group A (25.5±12.2 FPs per colonoscopy). A comparable number of FPs was identified in Group B (n=1028, mean 25.7±13.2 FPs per colonoscopy) (p=0.53). Among them, 22.9±9.9 (89.8%, Group A) and 22.1±10.0 (86.0%, Group B) were due to artifacts from the bowel wall. Conversely, 2.6±1.9 (10.2%) and 3.5±2.1 (14.0%) were caused by bowel content (p=0.45). Within Group A, each false activation required 0.2±0.9 seconds, with 1.6±1.0 (6.3%) FPs per colonoscopy requiring additional time for endoscopic assessment. Comparable results were reported within Group B, with 0.2±0.8 seconds spent per false activation and 1.8±1.2 FPs per colonoscopy requiring additional inspection.
CONCLUSION: The use of a standardized nomenclature made it possible to obtain comparable results with either of the 2 recently approved CADe systems.