METHODS: We propose 3D Axial-Attention, which requires a fraction of the computing power of a regular Non-Local network (i.e., self-attention). Unlike a regular Non-Local network, which applies attention over all elements of the volume at once, the 3D Axial-Attention network applies the attention operation to each axis separately. Additionally, we address the position-invariance problem of the Non-Local network by adding a 3D positional encoding to the shared embeddings.
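As an illustration of the per-axis idea, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a channels-last volume of shape (D, H, W, C), single-head scaled dot-product attention, one set of projection matrices reused for all three axes, sequential application over the axes, and an additive positional encoding built from one term per axis (random here, standing in for learned values). The names `axial_attention`, `axial_block`, and `attention_pairs` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x, axis, wq, wk, wv):
    """Self-attention restricted to one spatial axis of x, shape (D, H, W, C)."""
    # Move the attended axis next to the channels: (a, b, L, C).
    xm = np.moveaxis(x, axis, 2)
    q, k, v = xm @ wq, xm @ wk, xm @ wv
    # Each 1D line of length L gets its own L x L attention map.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    out = softmax(scores, axis=-1) @ v
    return np.moveaxis(out, 2, axis)

def axial_block(x, pos, wq, wk, wv):
    # Assumption: the positional encoding is added once to the input embeddings,
    # then attention runs along depth, height, and width in turn.
    y = x + pos
    for axis in range(3):
        y = axial_attention(y, axis, wq, wk, wv)
    return y

def attention_pairs(D, H, W):
    """Query-key pairs scored by full attention vs. one pass over each axis."""
    n = D * H * W
    return n * n, n * (D + H + W)

rng = np.random.default_rng(0)
D, H, W, C = 4, 5, 6, 8
x = rng.normal(size=(D, H, W, C))
# One encoding per axis, combined by broadcasting into a (D, H, W, C) tensor.
pos = (rng.normal(size=(D, 1, 1, C))
       + rng.normal(size=(1, H, 1, C))
       + rng.normal(size=(1, 1, W, C)))
# Assumption: the same projections are reused on every axis.
wq, wk, wv = (rng.normal(size=(C, C)) * 0.1 for _ in range(3))

out = axial_block(x, pos, wq, wk, wv)
full, axial = attention_pairs(D, H, W)
print(out.shape, full, axial)
```

Under these assumptions, full attention scores (DHW)^2 query-key pairs, while the three axial passes together score DHW(D + H + W), which is where the saving comes from. The positional term matters because attention without it gives the same result for any reordering of the elements along a line.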
RESULTS: We validated the proposed method on 442 benign nodules and 406 malignant nodules, extracted from the public LIDC-IDRI dataset by following a rigorous experimental setup using only nodules annotated by at least three radiologists. Our results show that the 3D Axial-Attention model achieves state-of-the-art performance on all evaluation metrics, including AUC and Accuracy.
CONCLUSIONS: The proposed model provides full 3D attention, whereby every element (i.e., voxel) in the 3D volume effectively attends to every other element in the nodule. Thus, the 3D Axial-Attention network can be used in all layers without the need for local filters. The experimental results show the importance of full 3D attention for classifying lung nodules.
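One way to see why per-axis attention can still reach every element is to track which voxels a single source voxel can influence after each axial pass. The sketch below is only a reachability argument under the assumption that the three axes are attended in sequence and that attention weights are non-zero (as softmax weights are). It computes no actual attention, and the function names are hypothetical.

```python
import numpy as np

def spread_along(mask, axis):
    # After attention along `axis`, every voxel on a line is influenced by
    # any already-influenced voxel on that same line.
    return np.broadcast_to(mask.any(axis=axis, keepdims=True), mask.shape).copy()

def influence_after_axial_passes(shape, source, axes=(0, 1, 2)):
    """Return the final influence mask and the influenced-voxel count per pass."""
    mask = np.zeros(shape, dtype=bool)
    mask[source] = True
    history = []
    for axis in axes:
        mask = spread_along(mask, axis)
        history.append(int(mask.sum()))
    return mask, history

mask, history = influence_after_axial_passes((4, 5, 6), source=(1, 2, 3))
print(history, bool(mask.all()))
```

For a 4 x 5 x 6 volume the influenced set grows from a line, to a plane, to the whole volume, so one pass over each of the three axes is enough for any voxel to affect every other.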