DeNVeR:
Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
- Chun-Hung Wu 1
- Shih-Hong Chen 1
- Chih-Yao Hu 2
- Hsin-Yu Wu 1
- Kai-Hsin Chen 1
- Yu-You Chen 1
- Chih-Hai Su 1
- Chih-Kuo Lee 2
- Yu-Lun Liu 1

1National Yang Ming Chiao Tung University
2National Taiwan University
Abstract
This paper presents Deformable Neural Vessel Representations (DeNVeR), an unsupervised approach for vessel segmentation in X-ray videos without annotated ground truth. DeNVeR uses optical flow and layer separation, enhancing segmentation accuracy and adaptability through test-time training. A key component of our research is the introduction of the XACV dataset, the first X-ray angiography coronary video dataset with high-quality, manually labeled segmentation ground truth. Our evaluation demonstrates that DeNVeR outperforms current state-of-the-art methods in vessel segmentation. This paper marks an advance in medical imaging, providing a robust, data-efficient tool for disease diagnosis and treatment planning and setting a new standard for future research in video vessel segmentation.
Vessel segmentation methods
Existing self-supervised methods such as SSVS, DARL, and FreeCOS require large collections of X-ray images for training, which limits their ability to generalize to new data. Our method overcomes this limitation with unsupervised test-time training performed directly on the test videos, eliminating the need for a large training dataset. DeNVeR achieves superior segmentation accuracy with finer and more consistent vessel contours, demonstrating robust generalization with minimal training data.
DeNVeR pipeline
In the preprocessing phase (a), we apply a Hessian-based filter complemented by region growing to obtain a rough per-frame vessel segmentation. In (b) Stage 1, multi-layer perceptrons (MLPs) model the background deformation fields and a canonical background image, establishing a vessel-free baseline through a reconstruction loss. In (c) Stage 2, the canonical background is held fixed while we refine the canonical foreground vessel image, the per-frame vessel masks, and their respective motions. B-spline parameters capture vessel and background movement, and a warping step composites the foreground and background layers to reconstruct each frame; minimizing the reconstruction loss keeps the result faithful to the input. The entire pipeline is trained directly on test videos without ground truth segmentation masks.
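A minimal sketch of what the preprocessing step (a) could look like, assuming grayscale frames in [0, 1] and using off-the-shelf scikit-image routines. The function name, seed selection, and thresholds are illustrative assumptions, not the exact DeNVeR implementation.

```python
# Hessian-based vesselness (Frangi) followed by simple region growing.
import numpy as np
from skimage.filters import frangi
from skimage.segmentation import flood

def rough_vessel_mask(frame, seed_percentile=99.5, grow_tol=0.1):
    """Rough per-frame vessel mask from a grayscale X-ray frame in [0, 1]."""
    # Frangi filter highlights tubular (vessel-like) structures; contrast-filled
    # vessels appear dark in angiograms, hence black_ridges=True.
    vesselness = frangi(frame, black_ridges=True)
    vesselness = vesselness / (vesselness.max() + 1e-8)

    # Grow regions from the strongest vesselness responses.
    seeds = np.argwhere(vesselness >= np.percentile(vesselness, seed_percentile))
    mask = np.zeros_like(vesselness, dtype=bool)
    for r, c in seeds[:50]:  # cap the number of seeds for speed
        mask |= flood(vesselness, (int(r), int(c)), tolerance=grow_tol)
    return mask
```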
Eulerian motion field modeling
We model the background heartbeat motion with a low-degree-of-freedom B-spline, while the foreground vessel flow is modeled as a stationary Eulerian motion field that does not vary with time. The vessel motion observed in the X-ray videos arises from both factors, so we obtain the final vessel flow by warping the Eulerian motion with the background flow and then adding the background motion to it.
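A minimal sketch of this flow composition, assuming PyTorch tensors of shape (B, 2, H, W) holding pixel offsets with channel order (dx, dy). The tensor names and normalization details are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def compose_vessel_flow(eulerian_flow, background_flow):
    """Warp the stationary Eulerian field by the background flow, then add it."""
    B, _, H, W = background_flow.shape
    # Base sampling grid in normalized [-1, 1] coordinates, (x, y) order.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=background_flow.device),
        torch.linspace(-1, 1, W, device=background_flow.device),
        indexing="ij",
    )
    base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, -1, -1, -1)

    # Convert background pixel offsets to normalized offsets for grid_sample.
    norm_offsets = torch.stack(
        (background_flow[:, 0] / (W - 1) * 2, background_flow[:, 1] / (H - 1) * 2),
        dim=-1,
    )
    grid = base_grid + norm_offsets

    # Warp the Eulerian motion by the background flow, then add the
    # background motion to obtain the observed vessel flow.
    warped_eulerian = F.grid_sample(eulerian_flow, grid, align_corners=True)
    return warped_eulerian + background_flow
```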
Parallel vessel motion loss
The vessel's flow direction should align with the travel direction of the vessel mask, which motivates our parallel vessel motion loss. We apply skeletonization and a distance transform to the preprocessed vessel mask to obtain the gradient direction at each location; these gradients point across the vessel. The predicted vessel motion is therefore encouraged to be perpendicular to these gradients, i.e., parallel to the vessel skeleton (blue arrows in the figure).
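A minimal sketch of such a loss under the assumptions above: the flow is a torch tensor of shape (2, H, W) with channels (dx, dy), and the mask is a boolean array. The function name and weighting are illustrative, not the exact DeNVeR loss.

```python
import numpy as np
import torch
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def parallel_motion_loss(flow, vessel_mask):
    """Penalize the flow component along the distance-transform gradient."""
    # Distance from every pixel to the vessel skeleton; its gradient points
    # across the vessel, so vessel motion should have no component along it.
    skeleton = skeletonize(vessel_mask)
    dist = distance_transform_edt(~skeleton)
    gy, gx = np.gradient(dist)

    # Unit gradient direction, stacked to match the (dx, dy) flow layout.
    grad = torch.from_numpy(np.stack((gx, gy))).float()
    grad = grad / (grad.norm(dim=0, keepdim=True) + 1e-8)

    # Absolute flow component along the gradient, averaged over vessel pixels.
    dot = (flow * grad).sum(dim=0).abs()
    mask_t = torch.from_numpy(vessel_mask.astype(np.float32))
    return (dot * mask_t).sum() / (mask_t.sum() + 1e-8)
```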
Quantitative evaluation
Category | Input | Method | clDice | NSD | Jaccard | Dice | Acc. | Sn. | Sp. |
---|---|---|---|---|---|---|---|---|---|
T | Image | Hessian | 0.577 | 0.321 | 0.415 | 0.584 | 0.929 | 0.451 | 0.990 |
SS | Image | SSVS [ICCV 2021] | 0.408 | 0.216 | 0.355 | 0.522 | 0.905 | 0.471 | 0.960 |
SS | Image | DARL [ICLR 2023] | 0.605 | 0.300 | 0.464 | 0.631 | 0.929 | 0.547 | 0.978 |
SS | Image | FreeCOS [ICCV 2023] | 0.639 | 0.461 | 0.506 | 0.660 | 0.941 | 0.554 | 0.988 |
U | Video | DeNVeR (Ours) | 0.704 | 0.515 | 0.584 | 0.733 | 0.947 | 0.656 | 0.985 |

(T: traditional, SS: self-supervised, U: unsupervised; Acc.: accuracy, Sn.: sensitivity, Sp.: specificity.)
Visual results
Citation
Acknowledgements
This research was funded by the National Science and Technology Council, Taiwan, under Grant NSTC 112-2222-E-A49-004-MY2. The authors are grateful to Google, NVIDIA, and MediaTek Inc. for their generous donations. Yu-Lun Liu acknowledges the Yushan Young Fellow Program by the MOE in Taiwan.
The website template was borrowed from Michaël Gharbi and Ref-NeRF.