

2. Evidence acquisition
2.1. Search strategy
Protocols for both the prognostic and reproducibility
reviews have been published
(http://www.crd.york.ac.uk/PROSPERO; registration numbers CRD42015025045 and
CRD42016029714); the search strategy is outlined in the
Supplementary material.
Databases including Medline, Embase, and the Cochrane
Central Register of Controlled Trials were systematically
searched from 1 January 1998 to 31 December 2015. All
abstracts and full-text articles were independently
screened by at least two reviewers. Disagreement was
resolved by discussion with an independent arbiter. The
search was complemented by additional sources including
the reference lists of included studies and a panel of experts
(EAU NMIBC Panel).
2.2. Types of study designs
Prospective and retrospective studies comparing the two
grading systems were included. Only studies published from
1998 onwards were included. There were no language
restrictions. A minimum follow-up of 3 mo (recurrence and/
or progression) was required for inclusion in the prognostic
review. Reproducibility assessment by two or more pathologists
required use of identical specimens and grading
systems. For the assessment of the repeatability of a grading
system by the same pathologist, each pathologist or group of
pathologists had to assess identical specimens using the
same grading system at more than one time point.
2.3. Types of participants
Study inclusion criteria were as follows: adult patients
(>18 yr old) with primary or recurrent Ta/T1 urothelial
carcinoma of the bladder who underwent a transurethral
resection of bladder tumour (TURBT). All risk groups and
adjuvant treatments were included. Exclusion criteria were
as follows: patients under 18 yr; muscle-invasive bladder
cancer (MIBC); clinical N+ or M+; grading based on radical
cystectomy specimen; and bladder biopsies only (as
opposed to TURBT). The protocol allowed inclusion of
studies containing patients who met exclusion criteria,
provided such patients constituted <10% of the study
population.
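The <10% rule above can be sketched as a simple check. This is an illustration only, not the reviewers' actual screening procedure: the `StudyPopulation` class, its field names, and the function name are hypothetical, and the counts of affected patients are assumed to be reported by each candidate study.

```python
from dataclasses import dataclass


@dataclass
class StudyPopulation:
    """Hypothetical summary of a candidate study's patients."""
    total_patients: int
    # Patients meeting any exclusion criterion (e.g. MIBC, clinical N+ or M+,
    # biopsy only, under 18 yr). Assumed to be reported by the study.
    affected_patients: int


def population_is_eligible(study: StudyPopulation) -> bool:
    """Return True if affected patients make up strictly less than 10%."""
    if study.total_patients <= 0:
        raise ValueError("total_patients must be positive")
    if not 0 <= study.affected_patients <= study.total_patients:
        raise ValueError("affected_patients must lie between 0 and total_patients")
    # Integer comparison avoids floating-point rounding at the 10% boundary.
    return study.affected_patients * 10 < study.total_patients
```

For example, a study with 9 affected patients out of 100 would pass this check, whereas one with exactly 10 out of 100 would not, because the threshold is strict.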
2.4. Type of outcome measures
In the prognostic review, the primary outcome was
progression to muscle-invasive or metastatic stage. Secondary
outcomes were bladder recurrence, and overall and
cancer-specific survival. All outcomes were measured at
least 3 mo post-TURBT.
In the reproducibility review, the primary outcome was
interobserver variability (reproducibility) between pathologists.
The secondary outcomes were intraobserver variability
(repeatability) by the same pathologist and reliability
(variability due to heterogeneity of patient populations).
2.5. Assessment of risk of bias
As recommended by the Cochrane Prognosis Methods
Group, the risk of bias (RoB) in the included studies was
assessed using the QUIPS tool across six domains: study
participation, attrition, prognostic factor measurement,
outcome measurement, confounders, and statistical analysis [11].
The EAU NMIBC Guidelines Panel identified
intravesical BCG (yes/no), stage (Ta/T1), and concomitant
carcinoma in situ (CIS) (yes/no) as the three most important
prognostic confounders. The Cochrane Collaboration
recommends against combining domains or giving overall
summary scores [12]. We used RevMan 5.3 software to
generate graphs showing RoB for each domain, within and
across studies.
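A per-domain tabulation of this kind can be sketched as follows. This is not RevMan and not the authors' workflow; it only illustrates keeping the six QUIPS domains separate rather than collapsing them into one score. The low/moderate/high rating scale, the function name, and the study labels in the usage note are assumptions made for illustration.

```python
from collections import Counter

# The six QUIPS domains named in the text.
QUIPS_DOMAINS = (
    "study participation",
    "attrition",
    "prognostic factor measurement",
    "outcome measurement",
    "confounders",
    "statistical analysis",
)

# Assumed rating scale; the text does not list the rating levels used.
RATINGS = ("low", "moderate", "high")


def summarise_rob(assessments):
    """Count ratings per domain across studies.

    `assessments` maps a study label to a dict of domain -> rating.
    Returns a dict of domain -> Counter of ratings. No overall score
    is computed, so the domains stay separate.
    """
    summary = {domain: Counter() for domain in QUIPS_DOMAINS}
    for study, ratings in assessments.items():
        missing = set(QUIPS_DOMAINS) - set(ratings)
        if missing:
            raise ValueError(f"{study}: missing domains {sorted(missing)}")
        for domain in QUIPS_DOMAINS:
            rating = ratings[domain]
            if rating not in RATINGS:
                raise ValueError(f"{study}: unknown rating {rating!r}")
            summary[domain][rating] += 1
    return summary
```

Calling `summarise_rob` on, say, two hypothetical studies labelled "Study A" and "Study B" yields one tally per domain, which is the information a stacked RoB bar chart displays.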
2.6. Data extraction and analysis
In the prognostic review, outcome events along with all
unadjusted (univariate) and adjusted (multivariable) measures
of association, such as odds ratios and hazard ratios,
were extracted, including those in subgroups of interest.
In the reproducibility review, all outcomes of reproducibility,
repeatability, and reliability, both overall and in
subgroups of interest, were extracted. Concordance was
assessed using Cohen's kappa statistic (coefficient κ).
Arbitrary guidelines characterise values of κ >0.75 as
excellent concordance, 0.40–0.75 as fair to good, and
below 0.40 as poor [13].
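For readers unfamiliar with the statistic, the following is a minimal sketch of unweighted Cohen's kappa for two raters, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance, together with the thresholds quoted above. It is illustrative only: the included studies may have used weighted or multi-rater variants, and the function names and the grade labels in the usage note are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same specimens."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must grade the same non-empty set of specimens")
    n = len(rater_a)
    # Observed agreement: fraction of specimens given the same grade.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per grade.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[grade] * freq_b[grade] for grade in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single grade")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Apply the thresholds quoted in the text."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"
```

As a usage example with made-up grades, `cohens_kappa(["LG", "LG", "HG", "HG"], ["LG", "LG", "HG", "LG"])` gives an observed agreement of 0.75 against a chance agreement of 0.5, so κ = 0.5, which `interpret_kappa` labels "fair to good".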
3. Evidence synthesis
3.1. Quantity of evidence identified
The study selection process is outlined in the Preferred
Reporting Items for Systematic Reviews and Meta-Analyses
(PRISMA) flow diagram (Fig. 2). A total of 3593 abstracts
were reviewed for both prognostic performance and
reproducibility, of which 34 full texts were retrieved for
further screening. In total, 22 eligible studies were
identified; however, two studies [14,15] were excluded as
subsequent publications provided updated data [16,17].
Finally, 20 studies recruiting a total of 4505 patients met the
inclusion criteria for prognostic performance [3,16–34].
Three of these studies involving 566 patients met the
reproducibility inclusion criteria [3,16,33].
3.2. Characteristics of the 20 included studies
The baseline characteristics of studies included in the
prognostic review are detailed in Table 1. The three
retrospective studies contained information on reproducibility
or repeatability. In Mangrud et al [16], three pathologists
independently reviewed both classifications and two
pathologists repeated the classification for intraobserver
variability; however, only one pathologist assessed both
grading systems. In Van Rhijn et al [3], two pathologists (A + D)
reviewed both classifications on four separate occasions
(both systems twice), allowing a direct comparison of the two
European Urology 72 (2017) 801–813