Measurement of surgeon cognitive workload using a multimodal sensor platform – A pilot study

3. Studies to date

Results presented here can also be found in our paper: https://arxiv.org/abs/2209.06208

Here, we briefly describe the preliminary studies that were undertaken during the Phase 1 project, or are still in progress, and that utilise the MAESTRO platform, together with some of the preliminary results to date.

3.1 Study 1: Measurement of surgeon cognitive workload using a multimodal sensor platform – A pilot study

The Operating Room (OR) is a complex socio-technical environment whose activities usually differ from those of other typical work environments, particularly in terms of its construction and working conditions. Working in the OR usually exposes clinicians and surgeons to multiple psycho-organizational constraints that can have negative repercussions on their health and on their performance at work [1,2].

Cognitive Workload (CWL) is a contemporary scientific term used to describe the use of working memory and, in particular, the potential stressors that can compromise a surgeon's performance in the OR. These stressors include team communication, noise, simultaneous processing of visual and auditory information, and surgeon-related factors such as stress and fatigue [3]. Although the sensory memory system can process large amounts of visual and auditory information, human working memory has a limited capacity to process this information simultaneously. It is well known that experienced surgeons are more capable than novices of dealing with elevated levels of CWL, especially when balancing operative task demands against the available cognitive resources [1-3]. However, surgeons and clinicians occasionally experience a state of cognitive overload, particularly in complex and/or non-routine operative situations such as emergencies and unexpected events [5-7].

3.1.1 Aims

This study has two main aims. The first is the development of a network of synchronised wearable sensors, called MAESTRO, to measure the cognitive workload of the surgeon with real-time capabilities, as illustrated in Figure 1.2. The second is to use MAESTRO as an in-situ monitoring platform that provides near-real-time decision-making support, allowing surgeons to respond according to their level of CWL. The machine learning approaches considered include deep learning, transfer learning, extreme learning machines, and traditional approaches such as support vector machines (SVMs), neural networks (NNs) and random forests (RFs).

Figure 3.1. (a) Peg transfer task, and (b) surgical setup with wearable sensors for the laparoscopic peg transfer performed under four different conditions in a random order.

3.1.2 Methods

As presented in Figure 3.1, a group of five surgical trainees performed a laparoscopic task (peg transfer). Each participant completed the task under the following four conditions in a randomised order:

  1. A control with no distractions (CS)
  2. Mental arithmetic during the task (MA), which consisted of subtracting serial sevens from one thousand
  3. Noise from a recurring hospital bleep (ND)
  4. Dual distractions consisting of conditions 2 and 3 simultaneously (DD).
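
As a minimal sketch of the protocol above, the snippet below draws a random order of the four conditions for each of five participants and generates the serial-sevens sequence used in the MA condition. The participant labels, the fixed seed and the helper names are illustrative assumptions and are not part of the study software.

```python
import random

CONDITIONS = ["CS", "MA", "ND", "DD"]  # condition codes from the list above


def randomised_order(rng):
    """Return the four conditions in a random order."""
    order = CONDITIONS[:]
    rng.shuffle(order)
    return order


def serial_sevens(start=1000, count=5):
    """Return the first `count` answers when repeatedly subtracting 7."""
    return [start - 7 * k for k in range(1, count + 1)]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
# Hypothetical participant labels; the study involved five trainees.
schedule = {f"participant_{i}": randomised_order(rng) for i in range(1, 6)}

for name, order in schedule.items():
    print(name, order)
print(serial_sevens())  # [993, 986, 979, 972, 965]
```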

In this project, a systematic approach based on a two-stage multimodal deep learning strategy and transfer learning is proposed for classifying the surgical activities associated with Cognitive Workload (CWL) in a pilot study of laparoscopic peg transfer. The task was performed under four different conditions, presented in randomised order and designed to increase CWL, as illustrated in Figure 3.1 [7-9].

Figure 3.3 (a) Multimodal strategy for the classification of each task associated with the CWL using EEG, fNIRS and pupil data, and (b) classification of each task using ECG signals.

Figure 3.4 Convolutional Neural Network (AlexNet) used for Transfer Learning and applied to Binary Classification.

3.1.2.1 Data Fusion modelling of EEG, fNIRS and Pupil Eye Data based on a two-step Machine Learning Approach

In this section, a machine learning approach for the prediction and classification of CWL is proposed. As described in Figure 3.3(a), the approach consists of two steps. In the first step, a binary classifier based on transfer learning detects whether CWL is present during a surgical procedure. If CWL is detected, the data are passed to a second model, a 1D Convolutional Neural Network, which classifies the condition under which the task was performed as described above [9], i.e., a) CS, b) MA, c) ND or d) DD. The Machine Learning (ML) strategy used in MAESTRO for the fusion of EEG, fNIRS and Pupil Eye data is illustrated in Figure 3.5.

Figure 3.5. Signal processing and Machine Learning strategy for the classification of concatenated EEG, fNIRS and Pupil Eye data.
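
The control flow of the two-step strategy can be sketched as follows. This is only an outline under stated assumptions: `toy_detector` and `toy_classifier` are hypothetical placeholders standing in for the transfer-learning detector and the 1D-CNN task classifier, and their thresholds are arbitrary.

```python
from typing import Callable, Sequence

TASKS = ("CS", "MA", "ND", "DD")


def two_stage_predict(
    window: Sequence[float],
    detect_cwl: Callable[[Sequence[float]], bool],
    classify_task: Callable[[Sequence[float]], str],
) -> str:
    """Stage 1 decides whether CWL is present; stage 2 names the task."""
    if not detect_cwl(window):
        return "No CWL"
    label = classify_task(window)
    if label not in TASKS:
        raise ValueError(f"unexpected task label: {label}")
    return label


# Placeholder models (assumptions): a mean threshold and a max-based lookup.
def toy_detector(window):
    return sum(window) / len(window) > 0.5


def toy_classifier(window):
    return TASKS[min(int(max(window) * len(TASKS)), len(TASKS) - 1)]


print(two_stage_predict([0.1, 0.2, 0.3], toy_detector, toy_classifier))  # No CWL
print(two_stage_predict([0.6, 0.7, 0.9], toy_detector, toy_classifier))  # DD
```

Keeping the detector and the task classifier behind separate callables means that the second, more expensive model only runs on windows that the first stage flags.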

3.1.2.2 Binary Classification Results: Presence of CWL or not?

In this section, the binary classification results obtained by the transfer learning approach are presented (see Figure 3.6). The binary classification approach aims to determine whether CWL is present as a result of a surgical task. We define two classes: a) Clinician Task, in which CWL arises from one of the surgical tasks, and b) No Clinician Task (no CWL). The data set comprises 22 fNIRS channels, 18 EEG channels, and pupil data describing the geometry of each surgeon's pupil; a sampling window of 200 readings was found to be optimal. The average over all EEG channels and the average over all fNIRS channels were calculated, and both were concatenated into an input matrix Xi = 200 samples + pupil data. The data set was split into two subsets: a) 70% for training, and b) 30% for validating the proposed transfer learning model.
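
The construction of one input example and the 70/30 split can be sketched as below. The exact layout of Xi is not specified here, so the sketch assumes that each averaged signal contributes one 200-sample window and that the pupil data consist of two features; the synthetic data, `N_PUPIL` and all function names are hypothetical.

```python
import random

WINDOW = 200              # readings per window, as stated above
N_EEG, N_FNIRS = 18, 22   # channel counts, as stated above
N_PUPIL = 2               # hypothetical number of pupil features


def channel_mean(window):
    """Average each time sample across channels (window: samples x channels)."""
    return [sum(sample) / len(sample) for sample in window]


def build_input(eeg, fnirs, pupil):
    """Concatenate mean EEG, mean fNIRS and pupil features into one vector."""
    return channel_mean(eeg) + channel_mean(fnirs) + list(pupil)


def split_70_30(examples, rng):
    """Shuffle and split into 70% training and 30% validation."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)


def fake_window(n_channels):
    """Synthetic stand-in for one recorded window (not study data)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_channels)] for _ in range(WINDOW)]


examples = [
    build_input(
        fake_window(N_EEG),
        fake_window(N_FNIRS),
        [rng.random() for _ in range(N_PUPIL)],
    )
    for _ in range(10)
]
train, val = split_70_30(examples, rng)
print(len(examples[0]), len(train), len(val))  # 402 7 3
```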

Figure 3.6 Confusion Matrix for (a) training and (b) validation of the transfer learning approach respectively, reaching an average accuracy for testing of approximately 99.97%.


Figure 3.7 Confusion matrix for the (a) training and (b) validation of the proposed 1D-CNN for the classification of tasks 1-4.

3.1.2.3 Multiclass Classification of Surgical Tasks

In this section, the results for the recognition of the different types of CWL arising from tasks 1-4 are presented. A data set of 3392 samples was constructed from raw fNIRS + EEG signals and used to train the proposed 1D Convolutional Neural Network (CNN) for the classification of tasks 1-4. As illustrated in Figure 3.7, an average accuracy of 97.84% was obtained on the training data, while the accuracy on the testing (validation) data was 98.14%.
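
For readers less familiar with confusion matrices, the sketch below shows how an overall accuracy and per-class recall are derived from one. The counts are invented for illustration only and are not the values behind Figure 3.7; the function names are our own.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """Recall per true class (rows = true task, columns = predicted task)."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


# Illustrative counts only; NOT the study's results.
cm = [
    [48, 1, 1, 0],
    [2, 46, 1, 1],
    [0, 1, 49, 0],
    [1, 0, 2, 47],
]
print(round(100 * overall_accuracy(cm), 2))  # 95.0
print(per_class_recall(cm))                  # [0.96, 0.92, 0.98, 0.94]
```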

Figure 3.8 Flow diagram of the cross-validation approach for the training and testing of a 1D CNN for the classification of tasks 1-4 and no task (No CWL)

3.1.2.4 ECG Modelling based on Machine Learning

In this section, the classification accuracy for ECG signals is presented. Based on the flow diagram shown in Figure 3.8, an initial approach that follows a three-step strategy is implemented.

Figure 3.9. Confusion matrix for the (a) training and (b) validation of the proposed 1D-CNN for the classification of tasks 1-4 and no task.

First, the raw ECG signals are filtered using a 5th-order Butterworth filter. Similar to the procedure applied to the concatenated fNIRS and EEG signals, a 1D CNN is then used to classify tasks 1-4 and no task (No CWL), using only the filtered ECG signals. The resulting confusion matrices for training and testing of the proposed 1D CNN are presented in Figure 3.9. The 1D CNN achieves an overall accuracy of 88.52% for training and 61.03% for testing across the five classes: Task 1, Task 2, Task 3, Task 4 and No Task (No CWL).
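
The filter type (low-pass or band-pass) and cutoff frequency are not stated above, so the sketch below only illustrates what a fifth-order design implies. It uses the analytic magnitude response of an analogue low-pass Butterworth filter, |H(f)| = 1 / sqrt(1 + (f/fc)^(2n)); the 40 Hz cutoff is a hypothetical value chosen for illustration.

```python
import math


def butterworth_gain(f_hz, cutoff_hz, order=5):
    """Magnitude response of an analogue low-pass Butterworth filter."""
    return 1.0 / math.sqrt(1.0 + (f_hz / cutoff_hz) ** (2 * order))


def gain_db(f_hz, cutoff_hz, order=5):
    """The same response expressed in decibels."""
    return 20.0 * math.log10(butterworth_gain(f_hz, cutoff_hz, order))


CUTOFF_HZ = 40.0  # hypothetical cutoff, not taken from the study
for f in (10.0, 40.0, 80.0, 400.0):
    print(f"{f:6.1f} Hz: {gain_db(f, CUTOFF_HZ):8.2f} dB")
```

The response is about -3 dB at the cutoff and falls by roughly 100 dB per decade above it (20 dB per decade per order). In practice a digital filter would be designed with a library routine such as `scipy.signal.butter` rather than evaluated analytically.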

Finally, Table 3.1 presents the average classification accuracy of five random experiments using several Machine Learning (ML) strategies. The implemented methodologies include:

  1. Random Forest + Support Vector Machines (RF + SVM)
  2. Random Forest + Principal Component Analysis (RF + PCA)
  3. Convolutional Neural Network of 1 dimension (1D CNN)
  4. Transfer learning using AlexNet CNN
  5. Multilayer Extreme Learning Machine (MELM)
  6. 2 Dimensional CNN (2D CNN)

From Table 3.1, two strategies offer the highest model accuracy for both a) binary classification (100%) and b) multiclass classification (93% and 85% on testing, respectively), i.e., 1) the 1D CNN and 2) the Multilayer Extreme Learning Machine (MELM), both using EEG and fNIRS signals.

Table 3.1: Average Classification Accuracy provided by various Machine Learning approaches.

| Model | Signals used for the classification | Binary, training (%) | Binary, testing (%) | Multiclass, training (%) | Multiclass, testing (%) |
|---|---|---|---|---|---|
| RF + SVM | EEG - ECG | 85 | 70 | 80 | 50 |
| RF + PCA | EEG | 88 | 73 | 82 | 54 |
| RF + PCA | EEG + ECG | 87 | 80 | 70 | 49 |
| 1D CNN | EEG - fNIRS | 100 | 100 | 99 | 93 |
| 1D CNN | ECG | 90 | 85 | 90 | 60 |
| 1D CNN | EEG - fNIRS - ECG | 93 | 90 | 88 | 55 |
| Transfer Learning | EEG | 95 | 90 | 85 | 59 |
| Transfer Learning | ECG | 90 | 76 | 80 | 50 |
| MELM | EEG - fNIRS | 100 | 100 | 96 | 85 |
| 2D CNN + CWT | EEG | 94 | 84 | 89 | 63 |
| Transfer Learning + 1D-CNN | EEG + fNIRS + Pupil Eye Data | 90 | 82 | 80 | 80 |
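
One way to read Table 3.1 is to compare training and testing accuracy for each model, since a large difference suggests overfitting. The sketch below does this for the multiclass columns using the values reported in the table; the tuple layout and function name are our own.

```python
# (model, signals, binary_train, binary_test, multi_train, multi_test), from Table 3.1
TABLE_3_1 = [
    ("RF + SVM", "EEG - ECG", 85, 70, 80, 50),
    ("RF + PCA", "EEG", 88, 73, 82, 54),
    ("RF + PCA", "EEG + ECG", 87, 80, 70, 49),
    ("1D CNN", "EEG - fNIRS", 100, 100, 99, 93),
    ("1D CNN", "ECG", 90, 85, 90, 60),
    ("1D CNN", "EEG - fNIRS - ECG", 93, 90, 88, 55),
    ("Transfer Learning", "EEG", 95, 90, 85, 59),
    ("Transfer Learning", "ECG", 90, 76, 80, 50),
    ("MELM", "EEG - fNIRS", 100, 100, 96, 85),
    ("2D CNN + CWT", "EEG", 94, 84, 89, 63),
    ("Transfer Learning + 1D-CNN", "EEG + fNIRS + Pupil Eye Data", 90, 82, 80, 80),
]


def generalisation_gap(row):
    """Training minus testing accuracy for the multiclass problem."""
    return row[4] - row[5]


best_test = max(TABLE_3_1, key=lambda r: r[5])
smallest_gap = min(TABLE_3_1, key=generalisation_gap)

for row in sorted(TABLE_3_1, key=lambda r: r[5], reverse=True):
    print(f"{row[0]:28s} {row[1]:30s} test={row[5]:3d}  gap={generalisation_gap(row):3d}")
print("highest multiclass testing accuracy:", best_test[0], "/", best_test[1])
print("smallest train-test gap:", smallest_gap[0])
```

On these figures the 1D CNN on EEG and fNIRS has the highest multiclass testing accuracy, while the two-step Transfer Learning + 1D-CNN model shows no gap between training and testing.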

3.1.2.5 Functional Near-Infrared Spectroscopy (fNIRS)

For the modelling of fNIRS signals and the recognition of variations in oxygenated (HbO) and deoxygenated (HbR) haemoglobin, two different Machine Learning approaches have been implemented:

  1. General Linear Modelling (GLM) for regression (prediction of HbR and HbO). In this approach, a brain stimuli model is constructed using linear systems theory.
  2. A hybrid methodology based on Extreme Learning Machines and Radial Basis Function Neural Networks for autoregressive prediction of HbR and HbO, which is then used to estimate variations in CWL during clinician activities.
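
The GLM idea in item 1 can be sketched as follows: a task block is convolved with a haemodynamic response to form a regressor, and the effect size is estimated by ordinary least squares. The single-gamma response, the sampling interval, the block timing and the synthetic HbO trace are all assumptions made for illustration; they are not the stimulus model or data used in the study.

```python
import math


def gamma_hrf(t, shape=6.0, scale=1.0):
    """Single-gamma haemodynamic response (an assumed, simplified HRF)."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)


def convolve(stimulus, kernel):
    """Causal discrete convolution, truncated to len(stimulus)."""
    out = []
    for n in range(len(stimulus)):
        acc = 0.0
        for k in range(min(n + 1, len(kernel))):
            acc += kernel[k] * stimulus[n - k]
        out.append(acc)
    return out


def fit_glm(y, regressor):
    """Ordinary least squares for y = beta * regressor + intercept."""
    n = len(y)
    mx = sum(regressor) / n
    my = sum(y) / n
    sxx = sum((x - mx) ** 2 for x in regressor)
    sxy = sum((x - mx) * (v - my) for x, v in zip(regressor, y))
    beta = sxy / sxx
    return beta, my - beta * mx


DT = 0.5  # seconds per sample (hypothetical sampling interval)
stimulus = [1.0 if 20 <= i < 60 else 0.0 for i in range(200)]  # one task block
kernel = [gamma_hrf(i * DT) * DT for i in range(60)]
regressor = convolve(stimulus, kernel)

# Synthetic, noise-free HbO trace with a known effect size (0.8) and baseline (0.1).
hbo = [0.8 * r + 0.1 for r in regressor]
beta, intercept = fit_glm(hbo, regressor)
print(round(beta, 3), round(intercept, 3))  # 0.8 0.1
```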

Figure 3.10 Workflow for the modelling and variations of HbR and HbO using a GLM approach.

Figure 3.11 Hybrid approach implemented for the modelling of fNIRS and prediction of a) HbO and b) HbR.

As illustrated in Figure 3.10, on the one hand, a methodology based on General Linear Modelling is suggested to determine the brain stimuli model per task, and this information is used to quantify the variations of HbO and HbR in the presence of different clinician activities. On the other hand, a hybrid method that involves two distinct machine learning approaches is implemented. The proposed method is based on neural networks for time series prediction of the variations of HbO and HbR using a fast-learning approach called Extreme Learning Machine (ELM).
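
The fast-learning principle behind the hybrid method can be sketched as a radial basis function network whose hidden layer is fixed at random and whose output weights are obtained in a single least-squares step. This is a simplified illustration under our own assumptions: centres are drawn at random from the data rather than by clustering, the series is a synthetic sine wave rather than HbO or HbR, and the lag count, width and ridge term are arbitrary.

```python
import math
import random

LAGS, N_HIDDEN, WIDTH, RIDGE = 3, 8, 0.5, 1e-3  # illustrative settings


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hidden_layer(x, centres):
    """Gaussian RBF activations for one lagged input, plus a bias term."""
    acts = [
        math.exp(-sum((a - c) ** 2 for a, c in zip(x, centre)) / (2.0 * WIDTH ** 2))
        for centre in centres
    ]
    return acts + [1.0]


def fit(series, seed=0):
    """Fix random centres, then solve (H^T H + ridge I) w = H^T y for w."""
    rng = random.Random(seed)
    inputs = [series[i:i + LAGS] for i in range(len(series) - LAGS)]
    targets = [series[i + LAGS] for i in range(len(series) - LAGS)]
    centres = rng.sample(inputs, N_HIDDEN)
    h = [hidden_layer(x, centres) for x in inputs]
    size = N_HIDDEN + 1
    hth = [[sum(row[i] * row[j] for row in h) + (RIDGE if i == j else 0.0)
            for j in range(size)] for i in range(size)]
    hty = [sum(row[i] * y for row, y in zip(h, targets)) for i in range(size)]
    return centres, solve(hth, hty), inputs, targets


def predict(x, centres, weights):
    """One-step-ahead prediction from the last LAGS samples."""
    return sum(w * a for w, a in zip(weights, hidden_layer(x, centres)))


series = [math.sin(0.2 * i) for i in range(120)]  # synthetic stand-in signal
centres, weights, inputs, targets = fit(series)
errors = [predict(x, centres, weights) - y for x, y in zip(inputs, targets)]
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
mean_t = sum(targets) / len(targets)
baseline = math.sqrt(sum((y - mean_t) ** 2 for y in targets) / len(targets))
print(f"training RMSE {rmse:.4f} vs. predicting the mean {baseline:.4f}")
```

Because only the output weights are fitted, training reduces to solving one small linear system, which is what makes this family of methods fast compared with iterative gradient-based training.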

Figure 3.12. fNIRS signals that correspond to 22 channels for subjects P002 and P005.

Modelling results for functional Near-Infrared Spectroscopy (fNIRS) signals obtained using a General Linear Model (GLM), on the one hand, and a hybrid method based on ELM, neural networks and data clustering, on the other hand, are presented. First, the recorded fNIRS signals for subjects P002 and P005 are illustrated in Figure 3.12.


Figure 3.13. Prediction of Haemoglobin using a GLM for subjects (a) P002 and (b) P005.

Figure 3.14. Prediction of variations of HbO for task 4 using a GLM, where a [0-1] scale has been computed, in which 1 denotes the maximum value of HbO.

Figure 3.15. Prediction of variations of HbO for task 4 using a GLM, where a [0-1] scale has been computed, in which 1 denotes the maximum value of HbO.
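
The [0-1] scale mentioned in the captions above can be obtained in several ways. The sketch below assumes min-max scaling, so that the largest HbO value maps to 1 and the smallest to 0; the input values are illustrative and are not study data.

```python
def min_max_scale(values):
    """Rescale values linearly so that min -> 0 and max -> 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant signal: no variation to scale
    return [(v - lo) / (hi - lo) for v in values]


hbo = [0.12, 0.30, 0.55, 0.41, 0.20]  # illustrative values only
scaled = min_max_scale(hbo)
print([round(v, 3) for v in scaled])
```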

3.1.3 Conclusions

As part of the platform development and implementation, this study helped to identify the main areas of the MAESTRO backbone that could be improved. These included sensor synchronisation, development of a GUI (front end) to identify signal dropout, and improved data collection methods such as the use and calibration of sensors. In terms of decision support during surgical tasks, different Machine Learning (ML) strategies based on traditional classification techniques, deep learning and transfer learning were also implemented. Based on these results, it was determined that a two-step ML approach, which first identifies the presence of Cognitive Workload (CWL) and then the corresponding surgical task, provided the best trade-off between model simplicity, computational cost and model accuracy. The proposed ML algorithms provide not only an inference mechanism for identifying the type of cognitive task, but also an online ML method for continual learning from multimodal data and for data fusion.

In terms of functional Near-Infrared Spectroscopy (fNIRS):

  • A fast and robust machine learning algorithm was proposed for modelling haemoglobin concentration during a surgical task.
  • Based on our modelling results, a significant increase in HbO is observed, particularly in P002 and P005.
  • A naturalistic model for the haemodynamic response function (HRF) has been suggested.
  • Both the GLM and the RBFNN use linear systems theory for their parameter optimisation.


Figure 3.16 (a) Laparoscope and (b) Peg Transfer Task.

4. References

[1] Haleh Aghajani, Marc Garbey, and Ahmet Omurtag. Measuring mental workload with EEG+fNIRS. Frontiers in Human Neuroscience, 11:359, 2017.

[2] Antonio Maria Chiarelli, Pierpaolo Croce, Arcangelo Merla, and Filippo Zappasodi. Deep learning for hybrid EEG-fNIRS brain–computer interface: application to motor imagery classification. Journal of Neural Engineering, 15(3):036028, 2018.

[3] Roger D Dias, Minhtran C Ngo-Howard, Marko T Boskovski, Marco A Zenati, and Steven J Yule. Systematic review of measurement tools to assess surgeons’ intraoperative cognitive workload. Journal of British Surgery, 105(5):491–501, 2018.

[4] Maria Camila Guerrero, Juan Sebastián Parada, and Helbert Eduardo Espitia. EEG signal analysis using classification techniques: Logistic regression, artificial neural networks, support vector machines, and convolutional neural networks. Heliyon, 7(6), 2021.

[5] Taiyong Li and Min Zhou. ECG classification using wavelet packet entropy and random forests. Entropy, 18(8):285, 2016.

[6] Siyuan Lu, Zhihai Lu, and Yu-Dong Zhang. Pathological brain detection based on AlexNet and transfer learning. Journal of Computational Science, 30:41–47, 2019.

[7] Pablo Ortega and A. Aldo Faisal. Deep learning multimodal fNIRS and EEG signals for bimanual grip force decoding. Journal of Neural Engineering, 18(4):0460e6, 2021.

[8] Adrian Rubio-Solis, George Panoutsos, Carlos Beltran-Perez, and Uriel Martinez-Hernandez. A multilayer interval type-2 fuzzy extreme learning machine for the recognition of walking activities and gait events using wearable sensors. Neurocomputing, 389:42–55, 2020.

[9] Marjan Saadati, Jill Nelson, and Hasan Ayaz. Multimodal fNIRS-EEG classification using deep learning algorithms for brain-computer interfaces purposes. In International Conference on Applied Human Factors and Ergonomics, pages 209–220. Springer, 2019.