Measurement of cognitive workload in dual clinical tasks

3.2 Study 2: Measurement of cognitive workload of junior surgeons when undertaking dual tasks

Results presented here can also be found in our paper: 

3.2.1 Aims

The aim of this study was to measure the cognitive workload of trainee surgeons and to analyse the effect that a simultaneous cognitive task has on surgical performance and the mental workload levels of the surgeon.

3.2.2 Methods

Twenty surgical trainees were asked to undertake a laparoscopic task (peg transfer) whilst performing a cognitive task (the N-back test) of varying difficulty. The N-back test is a task used in cognitive neuroscience to load working memory. Participants performed a control task with no N-back task as well as N0-, N1- and N2-back tasks in randomised order. Eye-tracking and heart-rate sensors were synchronised and used to measure mental workload objectively. Subjective workload was recorded using the SURG-TLX questionnaire, a validated tool for measuring workload in surgery. Additionally, video capture recorded the on-screen procedure [1-3].

As shown in Figure 3.17, each experimental session contained four tasks: a control task with no N-back task and N0-, N1- and N2-back tasks, in randomised order. A 30-second baseline was recorded before each task. Each 3-minute task included 4 repetitions of the N-back task (each repetition lasting 45 s).
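The session timing described above can be sketched as a simple timeline builder. This is only an illustration: the names `build_session` and `CONDITIONS` are hypothetical, and it assumes that the control task is shuffled together with the N-back tasks and runs as a single uninterrupted 3-minute block.

```python
import random

BASELINE_S = 30      # baseline recording before each task (from the text)
TASK_S = 180         # each task lasts 3 minutes (from the text)
REPETITION_S = 45    # duration of one N-back repetition (from the text)
CONDITIONS = ["control", "N0", "N1", "N2"]


def build_session(seed=None):
    """Return a list of (start_s, end_s, label) blocks for one session."""
    rng = random.Random(seed)
    order = CONDITIONS[:]
    rng.shuffle(order)  # assumption: all four tasks are randomised together

    timeline, t = [], 0
    for cond in order:
        timeline.append((t, t + BASELINE_S, f"{cond}:baseline"))
        t += BASELINE_S
        if cond == "control":
            # No N-back during the control task: one uninterrupted block.
            timeline.append((t, t + TASK_S, "control:peg_transfer"))
            t += TASK_S
        else:
            # 180 s / 45 s = 4 repetitions of the N-back task.
            for rep in range(TASK_S // REPETITION_S):
                timeline.append((t, t + REPETITION_S, f"{cond}:rep{rep + 1}"))
                t += REPETITION_S
    return timeline


session = build_session(seed=0)
total_s = session[-1][1]  # end time of the last block, in seconds
```

Under these assumptions, one session spans 4 × (30 s + 180 s) = 840 s of recording, excluding any pauses between tasks.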

Figure 3.17 Experiment outline.

Statistical analysis

The distribution of the SURG-TLX score is shown in the boxplot in Figure 3.18(a). To test whether task condition affected subjective workload, a one-way ANOVA was performed on the SURG-TLX scores. It revealed a statistically significant difference in SURG-TLX score between the four task conditions (p = 5.28e-10) [2].
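For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name `one_way_anova` is hypothetical and the scores are made up. The p-value follows from the upper tail of the F distribution with the returned degrees of freedom, for example via `scipy.stats.f.sf(F, df_between, df_within)`.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up illustrative scores, NOT the study data.
scores = {
    "control": [20, 25, 22, 27],
    "N0": [30, 34, 31, 33],
    "N1": [41, 45, 43, 47],
    "N2": [55, 60, 58, 63],
}
f_stat, df_b, df_w = one_way_anova(list(scores.values()))
```

A large F value means that the differences between the condition means are large relative to the spread within each condition.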

In terms of technical performance, Figure 3.18(b) shows the mean number of pegs transferred and the mean number of performance errors. The results indicate that the N-back task increases the surgeon’s mental demand, which degrades surgical performance (fewer pegs transferred) and increases the number of errors made.

Figure 3.18 (a) Boxplot of the SURG-TLX score. (b) Plot of the technical performance.

According to the statistical analysis, the four conditions can be divided into two main categories. The control task, in which the participant is asked only to perform the laparoscopic peg transfer (LPT), can be considered the unstressed condition. In contrast, the combination of LPT and the N-back task creates a stressed condition. Moreover, the cognitive workload (CWL) induced by the stressed condition can be graded as low, medium, or high, corresponding to N0-back, N1-back, and N2-back, respectively.

Unstressed condition:

  • The control task (peg transfer only)

Stressed conditions:

  • Peg transfer & N0 back - low
  • Peg transfer & N1 back - medium
  • Peg transfer & N2 back - high

Machine Learning on ECG

In this section, a two-stage machine learning approach was applied to the ECG signal to identify each participant’s CWL level. The raw ECG signal was first calibrated into millivolts and then filtered with a 5th-order high-pass Butterworth filter with a cut-off frequency of 0.5 Hz. The cleaned ECG signal was segmented into 400-sample sliding windows with a step size of 50 samples. After segmentation, the three ECG channels were concatenated horizontally, forming a (1200, 1) feature vector.

Two 1D convolutional neural networks (1D-CNNs) were employed in this two-stage strategy. In stage one, a 1D-CNN binary classifier was trained to detect the stressed condition. Stage two is triggered only if a stressed condition is detected: a second 1D-CNN performs multiclass classification to identify the CWL level induced by the stressed condition as low, medium, or high, corresponding to N0-back, N1-back, and N2-back. In both stages, for validation purposes, the original dataset was split into two subsets, i.e. a) a training set (12,710 instances) and b) a validation set (3,178 instances, also referred to as testing data), at a ratio of 8:2.
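The segmentation step can be sketched as below, assuming the filtered channels are available as equally long lists of samples. The function name `segment` and the synthetic data are hypothetical; only the window length (400) and the step size (50) come from the text.

```python
WINDOW = 400   # samples per window (from the text)
STEP = 50      # samples between consecutive window starts (from the text)


def segment(channels, window=WINDOW, step=STEP):
    """Slide a window over equally long channels and concatenate them.

    channels: list of per-channel sample lists (here, 3 ECG channels).
    Returns a list of feature vectors of length len(channels) * window.
    """
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("all channels must have the same length")

    vectors = []
    for start in range(0, n - window + 1, step):
        vec = []
        for ch in channels:
            # Append this channel's window after the previous channel's.
            vec.extend(ch[start:start + window])
        vectors.append(vec)
    return vectors


# Synthetic stand-in for a filtered 3-channel ECG recording (1000 samples).
ecg = [[float(i + 1000 * c) for i in range(1000)] for c in range(3)]
windows = segment(ecg)
```

Each returned vector holds 3 × 400 = 1200 values. Reshaping it to (1200, 1) gives the input shape quoted above.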

3.2.3 Experiments and Results

To test the performance of the proposed neural architecture, a binary classifier was trained to discriminate whether the surgeon was performing a) a single task (not stressed) or b) a dual task (stressed). Figure 3.19 presents the resulting confusion matrix for training and validation of the proposed stage-one 1D-CNN, which achieved an overall accuracy of 99.62% for training and 89.24% for validation. The evolution of the model loss and accuracy during training is presented in Figure 3.20.

Figure 3.19 Confusion matrix illustrating the average accuracy of the 1D-CNN for binary classification of a) the single surgical task (not stressed) and b) the dual task (stressed).
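As a reminder of how an overall accuracy is read off a confusion matrix, the sketch below computes it together with the per-class recall. The counts are hypothetical and are not the values behind Figure 3.19. The sketch assumes that rows hold the true labels and columns the predicted labels.

```python
def accuracy(cm):
    """Overall accuracy: correctly classified instances / all instances."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """Recall for each true class (rows are assumed to be true labels)."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


# Hypothetical counts, NOT the values behind Figure 3.19.
cm = [
    [90, 10],   # true: single task (not stressed)
    [5, 95],    # true: dual task (stressed)
]
overall = accuracy(cm)
recalls = per_class_recall(cm)
```

The same two functions apply unchanged to the 3 × 3 matrix of the multiclass stage.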

Figure 3.20 Evolution of a) the model loss and b) the average model accuracy for binary classification.

3.2.4 Multiclass identification of surgical tasks using deep learning

In the second stage of this experiment, a multiclass classifier is implemented to detect three levels of cognitive workload (CWL), denoted 1) low, 2) medium, and 3) high, corresponding to N0-back, N1-back, and N2-back, respectively. The corresponding confusion matrix for training and validation of the stage-two 1D-CNN is shown in Figure 3.21. The stage-two 1D-CNN achieved an overall accuracy of 99.99% for training and 97.64% for validation (testing).

Figure 3.21 Confusion matrix illustrating the average accuracy of the 1D-CNN for multiclass classification applied to identify 1) low, 2) medium, and 3) high CWL.
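The control flow of the two-stage strategy can be sketched as a simple cascade. The two classifier functions below are toy stand-ins with hypothetical thresholds, used only to make the example runnable. In the study, both stages are trained 1D-CNNs.

```python
def two_stage_predict(window, stage_one, stage_two):
    """Cascade: run stage two only when stage one flags stress.

    stage_one(window) -> True if the window is classified as stressed.
    stage_two(window) -> "low", "medium" or "high".
    """
    if not stage_one(window):
        return "unstressed"
    return stage_two(window)


# Toy stand-ins for the two trained models (hypothetical thresholds).
def toy_stage_one(window):
    return max(window) > 1.0


def toy_stage_two(window):
    peak = max(window)
    if peak < 2.0:
        return "low"
    if peak < 3.0:
        return "medium"
    return "high"


labels = [
    two_stage_predict(w, toy_stage_one, toy_stage_two)
    for w in ([0.2, 0.5], [0.1, 1.5], [2.5, 0.3], [3.5, 1.0])
]
```

One consequence of this design is that a stressed window missed by stage one never reaches stage two, so the end-to-end accuracy is bounded by the stage-one accuracy.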


Figure 3.22 Evolution of the model loss and the average model accuracy for multiclass classification.


Finally, Figure 3.22 illustrates the evolution of the loss and the corresponding model accuracy of the multiclass 1D-CNN over 80 training epochs.


3.2.5 Conclusion


Modelling with the single-modality ECG signal already shows promising preliminary results. The proposed 1D-CNN models can detect the stressed condition and then identify three levels of CWL with validation accuracies of 89.24% and 97.64%, respectively. Overall, the approach performs well while requiring minimal hardware.


In the study described in Section 3.1, the CWL classification accuracy relies heavily on the EEG, making it the dominant contributor to the whole-model accuracy. However, this performance comes at a cost: high device cost, long preparation and setup time, and discomfort during prolonged wear. In contrast, the proposed 1D-CNN models inherit the ability of deep learning strategies to perform feature extraction as part of their inner decision mechanism, so the additional data processing and feature extraction usually required by traditional machine learning approaches is removed. Moreover, by applying the 1D-CNN to ECG signals, the EEG and fNIRS helmet is no longer necessary. As noted in Table 3.2, the 1D-CNN provides an overall accuracy of 97% for the prediction of unseen data, yielding not only an accurate model but also one that avoids complex electronic devices [3].


Table 3.2 Average Model Accuracy results obtained by the proposed 1D-CNN and a traditional three-layer feedforward Neural Network (NN).


Signals used for the classification    Binary Classification (%)    Multiclass Classification (%)

ECG (1 channel)
ECG (1 channel)
ECG (3 channels)

An example of the surgical task performed at St. Mary’s Hospital under single- and dual-task conditions is illustrated in Figure 3.23, where data from the ECG and Pupil Eye sensors were collected.

Figure 3.23. Single and dual task performed at St. Mary’s Hospital, during which the ECG and Pupil Eye data were collected.

4. References

[1] Alexandra-Maria Tăuțan, Alessandro C Rossi, Ruben de Francisco, and Bogdan Ionescu. Dimensionality reduction for EEG-based sleep stage detection: Comparison of autoencoders, principal component analysis and factor analysis. Biomedical Engineering/Biomedizinische Technik, 66(2):125–136, 2021.

[2] Ziyafet Ugurlu, Azize Karahan, Hayriye Ünlü, Aysel Abbasoglu, Nalan Ozhan Elbaş, Sevcan Avcı Işık, and Aylin Tepe. The effects of workload and working conditions on operating room nurses and technicians. Workplace Health & Safety, 63(9):399–407, 2015.

[3] Zohreh Zakeri, Neil Mansfield, Caroline Sunderland, and Ahmet Omurtag. Physiological correlates of cognitive load in laparoscopic surgery. Scientific Reports, 10(1):1–13, 2020.