Table 2 Diagnostic performance of different machine learning models

From: Risk prediction and effect evaluation of complicated appendicitis based on XGBoost modeling

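| Outcome | Dataset | XGBoost | Random forest | CART | Support vector machine |
|---|---|---|---|---|---|
| AUC | Training | 0.996 (0.991, 1.000) | 1.000 (1.000, 1.000) | 0.820 (0.786, 0.855) | 0.951 (0.932, 0.970) |
| | Test | 0.914 (0.874, 0.955) | 0.870 (0.819, 0.921) | 0.742 (0.671, 0.813) | 0.882 (0.833, 0.931) |
| Sensitivity (%) | Training | 95.9 (94.3, 97.5) | 100.0 (100.0, 100.0) | 72.8 (69.2, 76.4) | 94.4 (92.5, 96.3) |
| | Test | 86.5 (81.7, 91.3) | 88.8 (84.4, 93.2) | 71.9 (65.6, 78.2) | 86.5 (81.7, 91.3) |
| Specificity (%) | Training | 97.4 (96.1, 98.7) | 99.4 (98.8, 100.0) | 86.9 (84.2, 89.6) | 88.5 (85.9, 91.1) |
| | Test | 84.6 (79.5, 89.7) | 69.2 (62.7, 75.7) | 75.0 (68.9, 81.1) | 77.9 (72.0, 83.8) |
| Precision (%) | Training | 97.0 (95.6, 98.4) | 99.3 (98.6, 100.0) | 82.6 (79.5, 85.7) | 87.5 (84.8, 90.2) |
| | Test | 82.8 (77.5, 88.1) | 71.2 (64.8, 77.6) | 71.1 (64.7, 77.5) | 77.0 (71.1, 82.9) |
| Accuracy (%) | Training | 96.7 (95.3, 98.2) | 99.7 (98.8, 100.0) | 80.3 (76.9, 83.5) | 91.2 (88.6, 93.4) |
| | Test | 85.5 (80.5, 90.5) | 78.2 (71.7, 83.8) | 73.6 (66.8, 79.7) | 81.9 (75.7, 87.0) |
| PPV (%) | Training | 97.1 (95.7, 98.5) | 99.3 (98.6, 100.0) | 82.6 (79.5, 85.7) | 87.5 (84.8, 90.2) |
| | Test | 84.8 (79.7, 89.9) | 71.2 (64.8, 77.6) | 71.1 (64.7, 77.5) | 77.0 (71.1, 82.9) |
| NPV (%) | Training | 96.6 (95.1, 98.1) | 100.0 (100.0, 100.0) | 78.8 (75.5, 82.1) | 94.9 (93.1, 96.7) |
| | Test | 89.7 (85.4, 94.0) | 87.8 (83.2, 92.4) | 75.7 (69.6, 81.8) | 87.1 (82.4, 91.8) |

Values are point estimates followed by confidence-interval bounds (lower, upper) in parentheses. AUC, area under the receiver operating characteristic curve; CART, classification and regression tree; PPV, positive predictive value; NPV, negative predictive value.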
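Every metric in Table 2 follows the standard confusion-matrix definitions, so a short sketch can make the table easier to read. The Python below is a minimal illustration, assuming per-patient predictions have already been tallied into true/false positives and negatives; the class and function names and the example counts are hypothetical and are not taken from the study. By definition, precision and PPV are the same ratio, TP / (TP + FP). The source does not state how the intervals were computed. The symmetric bounds, and the zero-width intervals at 100%, are consistent with a normal-approximation (Wald) interval, so this sketch assumes Wald. The accuracy intervals in the table are slightly asymmetric, which may reflect a different method that this sketch does not reproduce. AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (the Mann-Whitney formulation), with ties counted as one half.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical 2x2 counts; positive class assumed to be complicated appendicitis."""
    tp: int
    fp: int
    tn: int
    fn: int


def wald_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Proportion in percent with a Wald (normal-approximation) interval clipped to [0, 100].

    Assumption: the study's interval method is not stated; Wald is used for illustration.
    """
    if total == 0:
        raise ValueError("denominator is zero; metric undefined")
    p = successes / total
    half = z * math.sqrt(p * (1 - p) / total)
    return (100 * p, 100 * max(0.0, p - half), 100 * min(1.0, p + half))


def diagnostic_metrics(c: Confusion) -> dict[str, tuple[float, float, float]]:
    """Return (estimate, lower, upper) in percent for each threshold-based metric."""
    n = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": wald_ci(c.tp, c.tp + c.fn),    # TP / (TP + FN)
        "specificity": wald_ci(c.tn, c.tn + c.fp),    # TN / (TN + FP)
        "precision_ppv": wald_ci(c.tp, c.tp + c.fp),  # precision and PPV are the same ratio
        "npv": wald_ci(c.tn, c.tn + c.fn),            # TN / (TN + FN)
        "accuracy": wald_ci(c.tp + c.tn, n),          # (TP + TN) / all cases
    }


def auc_rank(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    metrics = diagnostic_metrics(Confusion(tp=90, fp=10, tn=80, fn=20))
    for name, (est, lo, hi) in metrics.items():
        print(f"{name}: {est:.1f} ({lo:.1f}, {hi:.1f})")
    print("AUC:", round(auc_rank([0.9, 0.8, 0.4], [0.3, 0.5, 0.8]), 3))
```

The sketch also shows why a perfect training sensitivity forces a perfect NPV. With no false negatives, TN / (TN + FN) equals 1, and the Wald interval collapses to zero width. That pattern appears in the random forest training row, which suggests overfitting when it is read alongside that model's lower test specificity.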