Abstract:
|
Providing force feedback as a feature in current Robot-Assisted Minimally Invasive Surgery systems still remains a challenge. In recent years, Vision-Based Force Sensing (VBFS) has emerged as a promising approach to address this problem. Existing methods have been developed in a Supervised Learning (SL) setting. Nonetheless, most of the video sequences related to robotic surgery are not provided with ground-truth force data, which can be easily acquired in a controlled environment. A powerful approach to process unlabeled video sequences and find a compact representation for each video frame relies on using an Unsupervised Learning (UL) method. Afterward, a model trained in an SL setting can take advantage of the available ground-truth force data. In the present work, UL and SL techniques are used to investigate a model in a Semi-Supervised Learning (SSL) framework, consisting of an encoder network and a Long-Short Term Memory (LSTM) network. First, a Convolutional Auto-Encoder (CAE) is trained to learn a compact representation for each RGB frame in a video sequence. To facilitate the reconstruction of high and low frequencies found in images, this CAE is optimized using an adversarial framework and a L1-loss, respectively. Thereafter, the encoder network of the CAE is serially connected with an LSTM network and trained jointly to minimize the difference between ground-truth and estimated force data. Datasets addressing the force estimation task are scarce. Therefore, the experiments have been validated in a custom dataset. The results suggest that the proposed approach is promising. |