EURURO Vol. 72 No. 5

3.4.1.2.1. Comparison of five studies that utilised both 1973 and 2004/2016 classifications.

The pooled recurrence rates were 33% versus 42% versus 63% for G1 versus G2 versus G3, and 20% versus 38% versus 55% for PUNLMP versus LG versus HG, respectively (Table 3).

The majority of patients in these five studies had Ta disease; a separate analysis in T1 patients was not possible [16,20,26,30,33]. A subgroup analysis of T1 HG patients revealed a higher recurrence rate in G3 patients compared with G2 patients (68% vs 50%) [22].
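A pooled rate of this kind can be pictured as total recurrences divided by total patients across the contributing studies. The sketch below is a minimal illustration under that assumption: this section does not state how the review pooled its data, the function name `pooled_rate` is hypothetical, and the counts are invented for illustration rather than taken from Table 3.

```python
from typing import Iterable, Tuple


def pooled_rate(studies: Iterable[Tuple[int, int]]) -> float:
    """Return total events divided by total patients across studies.

    Each item is a (recurrences, patients) pair for one study.
    This is simple (unweighted-by-variance) pooling, assumed here
    for illustration only.
    """
    events = 0
    patients = 0
    for n_events, n_patients in studies:
        if n_events < 0 or n_patients < n_events:
            raise ValueError("each study needs 0 <= events <= patients")
        events += n_events
        patients += n_patients
    if patients == 0:
        raise ValueError("no patients to pool")
    return events / patients


# Hypothetical (recurrences, patients) pairs for one grade category.
hypothetical_counts = [(9, 30), (6, 30)]
print(f"{pooled_rate(hypothetical_counts):.0%}")  # prints 25%
```

Because larger studies contribute more patients to both sums, they dominate a pooled rate formed this way, which is one reason a rate pooled over five studies can differ from a rate pooled over all available data.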

3.4.1.2.2. Comparisons using all available data.

The percentage of patients with recurrence using the 1973 grade (G1 vs G2 vs G3) was 33% versus 44% versus 65%, respectively. For the 2004/2016 grade (PUNLMP vs LG vs HG), recurrence occurred in 28% versus 43% versus 58%, respectively (Table 3).

In a separate analysis of Ta patients, recurrence rates were higher in G3 disease (G1 vs G2 vs G3: 39% versus 41% versus 71%, respectively), and PUNLMP patients had lower recurrence rates than LG or HG patients (28% versus 52% versus 60%, respectively). No comparisons were possible in T1 patients (Table 3).

3.4.1.3. Death due to bladder cancer.

Only one study provided limited information regarding death due to bladder cancer, so no conclusions could be drawn [29].

3.4.1.4. Death due to any cause.

Information on all-cause mortality was available on a limited basis in two studies [18,28], and only one study contributed to the analysis [31]. In this study, death rates for patients with the best and worst prognosis seem to be similar in the two grading classifications, but no conclusions can be drawn.

3.4.2. Reproducibility and repeatability outcomes

3.4.2.1. Reproducibility.

The interobserver agreement and kappa values for the 1973 and 2004/2016 WHO classifications are presented in Table 4.

The interobserver agreement for the 1973 classification ranged from 38% to 89% (kappa values from 0.003 to 0.68). Agreement in combined assessment of G1 + G2 versus G3 tumours in two studies [3,16] was higher than in separate assessment of G1 versus G2 versus G3 tumours (80–89% vs 39–66%; kappa values 0.44–0.68 vs 0.15–0.68). The interobserver agreement for the 2004/2016 classification ranged from 43% to 100% (kappa values 0.17–0.70). Only one study assessed the agreement between two pathologists in a combined review of PUNLMP + LG versus HG tumours [3]. It showed slightly better reproducibility than for a separate analysis of PUNLMP versus LG versus HG tumours (73–86% vs 43–66%, kappa values 0.46–0.72 vs 0.17–0.48). In this study, two additional pathologists assessed slides according to the 2004/2016 WHO classification only. The interobserver agreement for the separate review of PUNLMP versus LG versus HG tumours between these two pathologists and with the latter two pathologists ranged from 38% to 74% (kappa values from 0.13 to 0.58) and for combined review of PUNLMP + LG versus HG tumours ranged from 65% to 88% (kappa values from 0.30 to 0.73).
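To make the two reported quantities concrete, the sketch below computes percentage agreement and an unweighted Cohen's kappa for two raters, first on three separate grades and then after collapsing G1 + G2 into one group. It is an illustration only: the grades are invented, the names `agreement_and_kappa` and `collapse` are hypothetical, and the included studies may have used other kappa variants (for example weighted or multi-rater kappa).

```python
from collections import Counter
from typing import Sequence, Tuple


def agreement_and_kappa(rater_a: Sequence[str],
                        rater_b: Sequence[str]) -> Tuple[float, float]:
    """Return (observed agreement, unweighted Cohen's kappa) for two raters."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must grade the same non-empty set")
    n = len(rater_a)
    # Observed agreement: share of cases given the same grade by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal grade frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


def collapse(grades: Sequence[str]) -> list:
    """Merge G1 and G2 into a single 'G1+G2' group, leaving G3 as is."""
    return ["G3" if g == "G3" else "G1+G2" for g in grades]


# Hypothetical grades assigned by two pathologists to the same ten tumours.
path_a = ["G1", "G1", "G2", "G2", "G2", "G2", "G3", "G3", "G3", "G3"]
path_b = ["G1", "G2", "G2", "G2", "G3", "G2", "G3", "G3", "G2", "G3"]

separate = agreement_and_kappa(path_a, path_b)
combined = agreement_and_kappa(collapse(path_a), collapse(path_b))
print("separate: agreement=%.2f kappa=%.2f" % separate)
print("combined: agreement=%.2f kappa=%.2f" % combined)
```

In this toy data the G1 versus G2 disagreement disappears after collapsing, so agreement rises from 0.70 to 0.80. Kappa moves less than raw agreement because chance agreement also rises when there are only two categories.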

3.4.2.2. Repeatability.

The intraobserver repeatability and kappa values for the 1973 and 2004/2016 WHO classifications are presented in Table 5. Only two studies assessed the repeatability of both grading systems [3,15]. The intraobserver agreement for the 1973 WHO grading classification ranged from 63% to 95% (kappa values 0.61–0.88). Repeatability for combined assessment of G1 + G2 versus G3 tumours was slightly higher than that for a separate analysis of G1 versus G2 versus G3 tumours (88–95% vs 63–81%, kappa values 0.64–0.88 vs 0.61–0.69). The intraobserver agreement for the 2004/2016 WHO grading classification ranged from 71% to 93% (kappa values 0.56–0.83). In the only study that assessed the difference between combined and separate pathological reviews, repeatability of group PUNLMP + LG versus HG was higher than that of PUNLMP versus LG versus HG (86–90% vs 71–82%, kappa values 0.68–0.80 vs 0.56–0.69) [3]. In this study, two additional pathologists assessed slides twice using the 2004/2016 WHO classification with 72% and 88% agreement for separate review of PUNLMP versus LG versus HG (kappa values 0.55 and 0.81, respectively), and 85% and 97%

Table 4 – Interobserver reproducibility for the 1973 and 2004/2016 WHO classifications

| Study | 1973 WHO: type of analysis | 1973 WHO: agreement (95% CI) | 1973 WHO: kappa (95% CI) | 2004/2016 WHO: type of analysis | 2004/2016 WHO: agreement (95% CI) | 2004/2016 WHO: kappa (95% CI) |
|---|---|---|---|---|---|---|
| Mangrud (2014) [16] | G1 vs G2 vs G3 | 66% (59–73%) | 0.68 (0.57–0.78) | | | |
| | G1 + G2 vs G3 | 89% (83–93%) | 0.68 (0.56–0.80) | | | |
| | | | | LG vs HG | 87% (81–91%) | 0.70 (0.59–0.81) |
| van Rhijn (2010) [3] | G1 vs G2 vs G3 | 39–54% | 0.15–0.32 | PUNLMP vs LG vs HG | 43–66% | 0.17–0.48 |
| | G1 + G2 vs G3 | 80–85% | 0.44–0.58 | PUNLMP + LG vs HG | 73–86% | 0.46–0.72 |
| May (2010) [33] b | G1 vs G2 vs G3 | 38–73% | 0.003–0.365 | PUNLMP vs LG vs HG | 71–82% | 0.296–0.516 |

Further values appearing with the Mangrud (2014) rows, cell position not recoverable: 100%; 66%, 89%; 56%, 65%.

CI = confidence interval; G1 = grade 1; G2 = grade 2; G3 = grade 3; HG = high grade; LG = low grade; PUNLMP = papillary urothelial neoplasm with low malignant potential; WHO = World Health Organization.

a Pathologist A versus pathologist D (analysis of a total of four different combinations of two rounds of the grading assessment).

b Only Ta tumours included.

c Pathologist A versus B versus C versus D (a total of six pairwise comparisons).

European Urology 72 (2017) 801–813
