EuroSCORE Performance in Minimally Invasive Cardiac Surgery

R Margaryan 1,2, M Moscarelli 1, T Gasbarri 1, G Bianchi 1, E Kallushi 1, A G Cerillo 1, P A Farneti 1, M Solinas 1
1Ospedale Del Cuore Fondazione 'G. Monasterio', Massa, Italy.
2Università Di Siena, Siena, Italy


The European System for Cardiac Operative Risk Evaluation (EuroSCORE) exists in two versions, the most recent of which was released in 2012[1]. This update, EuroSCORE II, was introduced as a new tool for estimating in-hospital mortality after cardiac surgery, replacing the older additive and logistic EuroSCORE models originally developed in 1991[2]. Both the older and the new versions have gained wide popularity and are used worldwide by cardiac surgical services[3,4]. They have been tested and validated by different groups worldwide[4]. Moreover, the additive/logistic EuroSCOREs and the new EuroSCORE II have been employed in recent years, together with other risk scores (e.g. the Society of Thoracic Surgeons (STS) score), for the screening and selection of high-risk patients for hybrid surgical techniques such as TAVI[5]. Analysis of their performance within these specific surgical subpopulations has nonetheless shown a tendency to over-predict the risk of mortality and morbidity.
However, no reports have performed external calibration of the European risk score systems for minimally invasive cardiac surgery (MICS).
This study was designed with two main endpoints: to externally validate the original EuroSCORE (additive and logistic) and EuroSCORE II, and to compare the performance of EuroSCORE II with that of the previous versions. Moreover, we sought to evaluate the discrimination and calibration of EuroSCORE II.


Study design and Participants
The study population included all patients who underwent minimally invasive cardiac surgery (MICS) in a single-center department over a 6-year period (2007 to 2013; 2511 patients enrolled overall). All underwent a MICS procedure (see Figure 2 for details). Transcatheter/percutaneous valve implantation procedures were excluded from the study group; only TAVI via thoracotomy was included.
Preoperative and demographic information, operative data, and perioperative mortality and complications for all patients were retrieved from prospectively collected institutional databases. The Institutional Ethical Committees approved the study, and the requirement for informed written consent was waived on the condition that subjects’ identities were masked.

For the evaluation and validation of performance, the three scores (additive EuroSCORE, logistic EuroSCORE and EuroSCORE II) were calculated for each patient according to published guidelines with dedicated software[1,2,6].

Statistical methods and analysis

The performance of the EuroSCORE models was analysed with a focus on discrimination power and calibration[7,8]. Discrimination indicates the extent to which the model distinguishes between patients who will die and those who will survive in the perioperative period. It was evaluated by constructing receiver operating characteristic curves for each model and calculating the area under the curve (AUC) with 95% confidence intervals[9,10]. The curves were compared with the DeLong method[10]. Another index used to evaluate predictive ability was Somers’ Dxy rank correlation between predicted probabilities and observed responses. When Dxy = 0, the model is making random predictions; when Dxy = 1, the predictions are perfectly discriminating.
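The two discrimination indices described above can be illustrated with a minimal sketch. Python is used here purely for illustration (the study's analyses were run in R), the function names `auc_mann_whitney` and `somers_dxy` are hypothetical, and the toy risks and outcomes are invented, not study data. The sketch assumes a binary outcome, for which Dxy = 2 × (AUC − 0.5).

```python
def auc_mann_whitney(predicted, died):
    """AUC as the probability that a randomly chosen death received a higher
    predicted risk than a randomly chosen survivor (ties count one half)."""
    deaths = [p for p, d in zip(predicted, died) if d == 1]
    survivors = [p for p, d in zip(predicted, died) if d == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for risk_death in deaths:
        for risk_survivor in survivors:
            if risk_death > risk_survivor:
                concordant += 1.0
            elif risk_death == risk_survivor:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


def somers_dxy(auc):
    """Somers' Dxy for a binary outcome: 0 = random, 1 = perfect discrimination."""
    return 2.0 * (auc - 0.5)


# Invented toy data: predicted risks and observed outcomes (1 = died).
toy_risk = [0.01, 0.02, 0.03, 0.05, 0.08, 0.12, 0.20, 0.35]
toy_died = [0, 0, 0, 1, 0, 0, 1, 1]

toy_auc = auc_mann_whitney(toy_risk, toy_died)
toy_dxy = somers_dxy(toy_auc)

# Dxy implied by an AUC of 0.83 under the same identity.
dxy_for_auc_083 = somers_dxy(0.83)
```

Under this identity, an AUC of 0.83 corresponds to a Dxy of about 0.66. The pairwise loop is quadratic in the number of patients and is meant only to make the definition explicit; it does not compute confidence intervals or the DeLong comparison.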
Calibration refers to the agreement between observed outcomes and predictions. Calibration performance can be evaluated by generating calibration plots that visually compare the predicted with the observed probability[7,10,11]. Calibration was tested with the Hosmer-Lemeshow goodness-of-fit test, which compares observed to predicted values by decile of predicted probability. The accuracy of the model was also tested by calculating the Brier score[10,11]. Missing values were substituted by means of multiple imputation in order to reduce bias and increase statistical power[12]. Two-sided tests were performed with a significance level of 0.05. For all analyses, the R Statistical Computing Environment[13] was used with RStudio (RStudio (2015). RStudio: Integrated development environment for R (Version 0.99.558) [Computer software]. Boston, MA. Retrieved May 20, 2015. Available from http://www.rstudio.org/).
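As a rough illustration of the two calibration measures named above, the following sketch computes a Brier score and a Hosmer-Lemeshow chi-square statistic over groups of ascending predicted risk. It is written in Python for illustration only and is not the code used in the study; the function names are hypothetical, the toy inputs are invented, and the equal-size grouping rule is an assumption (statistical packages differ in how they handle ties and group boundaries). The p-value, which requires a chi-square reference distribution, is deliberately left out.

```python
def brier_score(predicted, died):
    """Mean squared difference between predicted risk and outcome (0 or 1).
    Lower is better; 0 means perfect predictions."""
    return sum((p - d) ** 2 for p, d in zip(predicted, died)) / len(predicted)


def hosmer_lemeshow_statistic(predicted, died, groups=10):
    """Chi-square statistic comparing observed and expected deaths within
    groups of ascending predicted risk (deciles when groups=10)."""
    pairs = sorted(zip(predicted, died))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of deaths
        observed = sum(d for _, d in chunk)   # observed number of deaths
        mean_risk = expected / size
        variance = size * mean_risk * (1.0 - mean_risk)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Invented toy data, two risk groups of two patients each.
toy_risk = [0.2, 0.2, 0.8, 0.8]
toy_died = [0, 1, 1, 1]

toy_brier = brier_score([0.1, 0.9], [0, 1])
toy_hl = hosmer_lemeshow_statistic(toy_risk, toy_died, groups=2)
```

A large statistic relative to the chi-square reference distribution indicates that predicted and observed mortality disagree within risk groups, which is how the test flags poor calibration.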


Patients’ Characteristics
There were 42 (1.7%) hospital deaths. The mean values of the additive EuroSCORE, logistic EuroSCORE and EuroSCORE II in the study population were 6.2 ± 2.8, 7.8 ± 8.5 and 3 ± 4.4, respectively.
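The gap between observed and mean predicted mortality can be made concrete with a small arithmetic sketch. The observed/expected (O/E) ratios below are not reported in this study; they are derived here only from the figures quoted above, on the assumption that the mean logistic EuroSCORE and EuroSCORE II values are percentages of predicted mortality. The additive score is a point total rather than a probability, so it is omitted. Python is used for illustration and all variable names are hypothetical.

```python
# Figures quoted in the text above.
deaths = 42
patients = 2511

# Assumption: mean scores are percentages of predicted mortality.
mean_logistic_euroscore = 7.8 / 100.0
mean_euroscore_ii = 3.0 / 100.0

observed_mortality = deaths / patients  # about 0.017, i.e. 1.7%

# O/E below 1 means the score predicts more deaths than were observed.
oe_logistic = observed_mortality / mean_logistic_euroscore
oe_euroscore_ii = observed_mortality / mean_euroscore_ii

print(f"Observed mortality: {observed_mortality:.3f}")
print(f"O/E logistic EuroSCORE: {oe_logistic:.2f}")
print(f"O/E EuroSCORE II: {oe_euroscore_ii:.2f}")
```

Both ratios fall below 1, which is the direction expected when a score over-predicts mortality on average. This is a crude whole-cohort comparison and says nothing about calibration within individual risk strata.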
Performance of EuroSCORE
The AUC was high for all algorithms: 0.79 (95% CI: 0.72 - 0.86) for logistic EuroSCORE, 0.79 (95% CI: 0.73 - 0.86) for additive EuroSCORE and 0.83 (95% CI: 0.76 - 0.89) for EuroSCORE II (see Figure 1 C).

The comparison of the scores’ performance did not show significant differences between additive EuroSCORE and logistic EuroSCORE (p = 0.16), between additive EuroSCORE and EuroSCORE II (p = 0.16), or between logistic EuroSCORE and EuroSCORE II (p = 0.1; see Figure 1 C).
The calibration curves of EuroSCORE II and logistic EuroSCORE are shown in Figure 1 A and B, respectively. A calibration plot and related statistics for additive EuroSCORE make little or no sense, as it is a simple, user-friendly instrument with an additive property and does not predict mortality correctly.
The calibration of EuroSCORE II is quite close to the ideal diagonal up to 8% predicted probability (see red dashed line in Figure 1 A) and diverges markedly thereafter, showing significant over-prediction. Logistic EuroSCORE shows a progressive trend towards over-prediction starting from low predicted risk and remains deviated from the ideal calibration line (45-degree dashed line, see Figure 1 B). Both scores gave significant results in the unreliability test and the Hosmer-Lemeshow test (χ2 = 144.8, p < 0.01 and χ2 = 17.94, p = 0.02 for logistic EuroSCORE and EuroSCORE II, respectively), indicating that they do not provide adequately accurate probabilities.


This performance and validation study demonstrates that the performance and calibration of EuroSCORE are limited in MICS populations. EuroSCORE II showed optimal calibration up to 8% predicted mortality, a range that covers a large proportion of MICS patients. Nevertheless, EuroSCORE II is not optimally calibrated for MICS, and its use should be avoided in high-risk patients (>8%). Taken together, a new risk score should be developed and implemented for MICS cohorts, or future versions of EuroSCORE should address this issue.

