Interpretation of the AUC
Colman Statistics (rforbiostatistics.colmanstatistics.be), 13 September 2018

The AUC* or concordance statistic c is the most commonly used measure for diagnostic accuracy of quantitative tests. It is a discrimination measure which tells us how well we can classify patients in two groups: those with and those without the outcome of interest. Since the measure is based on ranks, it is not sensitive to systematic errors in the calibration of the quantitative tests.

It is very well known that a test with no better accuracy than chance has an AUC of 0.5, and a test with perfect accuracy has an AUC of 1. But what is the exact interpretation of an AUC of, for example, 0.88? And did you know that the AUC is equivalent to the Mann-Whitney U test statistic?

*AUC: the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve.

Example

Around 27% of the patients with liver cirrhosis will develop
Hepatocellular Carcinoma (HCC) within 5 years of follow-up. With our
biomarker “peakA” we would like to predict which patients will develop
HCC, and which won’t. We will assess the diagnostic accuracy of
biomarker “peakA” using the AUC.

To keep things visually clear, suppose we have a dataset of only 12
patients. Four patients did develop HCC (the "cases") and eight didn't
(the "controls"). (The data are fictitious.)

HCC   Biomarker_value
0     1.063
1     1.132
1     1.122
1     1.058
0     0.988
0     1.182
0     1.037
0     1.052
0     0.925
1     1.232
0     0.911
0     0.967
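
For reproducibility, these data can be entered as a data frame as follows (a minimal sketch; the original post only shows the data construction for the simulated example further below):

# The fictitious example data from the table above
df <- data.frame(
  HCC             = c(0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0),
  Biomarker_value = c(1.063, 1.132, 1.122, 1.058, 0.988, 1.182,
                      1.037, 1.052, 0.925, 1.232, 0.911, 0.967)
)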

The AUC can be defined as “The probability that a randomly selected case
will have a higher test result than a randomly selected control”. Let’s
use this definition to calculate and visualize the estimated AUC.
In the figure below, the cases are presented on the left and the
controls on the right. Since we have only 12 patients, we can easily
visualize all 32 possible combinations of one case and one control.
(R code below)

Those 32 different pairs of cases and controls are represented by lines
on the plot above. 28 of them are indicated in green. For those pairs,
the value for “PeakA” is higher for the case compared to the control.
The remaining 4 pairs are indicated in blue. The AUC can be estimated as
the proportion of pairs for which the case has a higher value compared
to the control. Thus, the estimated AUC is the proportion of green lines
or 28/32 = 0.875. This visualization might help to understand the
concept of the AUC. Beyond this educational purpose, however, this type
of plot is not very useful: the sample size of your study is hopefully
much larger than 12 patients, and in that situation the plot becomes
very crowded.
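
The same proportion can also be computed directly, without any plotting (a short sketch using the df defined above; these data contain no ties, which would otherwise count for 0.5):

cases    <- df$Biomarker_value[df$HCC == 1]
controls <- df$Biomarker_value[df$HCC == 0]
mean(outer(cases, controls, ">"))
[1] 0.875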

The ROC curve

Now let’s verify that the AUC is indeed equal to 0.875 in the classical
way, by plotting a ROC curve and calculating the estimated AUC using the
ROCR package.

The ROC curve plots the False Positive Rate (FPR) on the X-axis and the
True Positive Rate (TPR) on the Y-axis for all possible thresholds (or
cutoff values). A small worked example of these rates follows after the
definitions below.

True Positive Rate (TPR) or sensitivity: the proportion of actual
positives that are correctly identified as such.
True Negative Rate (TNR) or specificity: the proportion of actual
negatives that are correctly identified as such.
False Positive Rate (FPR) or 1 − specificity: the proportion of
actual negatives that are wrongly identified as positives.
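
For instance, at the (arbitrarily chosen) cutoff of 1.05 for the example data:

cutoff <- 1.05
TPR <- mean(df$Biomarker_value[df$HCC == 1] > cutoff)  # 4/4 = 1
FPR <- mean(df$Biomarker_value[df$HCC == 0] > cutoff)  # 3/8 = 0.375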

library(ROCR)
# Score the biomarker against the outcome and compute TPR/FPR for all cutoffs
pred <- prediction(df$Biomarker_value, df$HCC)
perf <- performance(pred, "tpr", "fpr")
plot(perf, col = "black")
abline(a = 0, b = 1, col = "#8AB63F")  # diagonal = uninformative test

The green line represents a completely uninformative test, which
corresponds to an AUC of 0.5. A curve pulled close to the upper left
corner indicates a better-performing test (an AUC close to 1). Note that
the ROC curve does not show the cutoff values.

The ROCR package also allows us to calculate the estimated AUC:

auc <- performance(pred, "auc")
unlist(slot(auc, "y.values"))
[1] 0.875

The estimated AUC based on this ROC curve is indeed equal to 0.875, the
proportion of pairs for which the value of “PeakA” is larger for HCC
compared to NoHCC.

Relation to cutoff points of the biomarker

Visualizing the sensitivity and specificity as a function of the cutoff
points of the biomarker results in a plot that is at least as
informative as a ROC curve and (in my opinion) easier to interpret. The
plot can be created using the ROCR package.

library(ROCR)
testy <- performance(pred,"tpr","fpr")

Using the str() function, we see that the following slots are part of
the testy object:

  • alpha.values: the cutoff values
  • x.values: False Positive Rate (FPR), i.e. 1 − specificity
  • y.values: Sensitivity or True Positive Rate (TPR)
plot(testy@alpha.values[[1]], testy@x.values[[1]], type = 'n',
     xlab = 'Cutoff points of the biomarker',
     ylab = 'sensitivity or specificity')
# Sensitivity (TPR) in dark blue
lines(testy@alpha.values[[1]], testy@y.values[[1]],
      type = 's', col = "#1A425C", lwd = 2)
# Specificity (1 - FPR) in green
lines(testy@alpha.values[[1]], 1 - testy@x.values[[1]],
      type = 's', col = "#8AB63F", lwd = 2)
legend(1.11, .85, c('sensitivity', 'specificity'),
       lty = c(1, 1), col = c("#1A425C", "#8AB63F"), cex = .9, bty = 'n')

The plot shows how the sensitivity increases as the specificity
decreases and vice versa, in relation to the possible cutoff points of
the biomarker.

Mann-Whitney U test statistic

The Mann-Whitney U test statistic (equivalently, the Wilcoxon rank-sum
statistic; the Kruskal-Wallis test generalizes it to more than two
groups) is equivalent to the AUC (Mason & Graham, 2002). The AUC can be
calculated from the output of the wilcox.test() function:

wt <- wilcox.test(Biomarker_value ~ HCC, data = df)
1 - wt$statistic / (sum(df$HCC == 1) * sum(df$HCC == 0))
    W 
0.875 

The p-value of the Mann-Whitney U test can thus safely be used to test
whether the AUC differs significantly from 0.5 (AUC of an uninformative
test).

wt <- wilcox.test(Biomarker_value ~ HCC, data = df)
wt$p.value
[1] 0.04848485

Simulation: the completely uninformative test

Now, let's have a look at what our plots look like if our biomarker is
not informative at all.

Data creation:

# Simulation of the data
set.seed(12345)
HCC <- rbinom(n = 12, size = 1, prob = 0.27)
Biomarker_value <- rnorm(12, mean = 1, sd = 0.1) + HCC * 0
# Replacing the zero by a positive value would make the test informative
df <- data.frame(HCC, Biomarker_value)
library(knitr)
kable(head(df))
HCC   Biomarker_value
0     1.0630099
1     0.9723816
1     0.9715840
1     0.9080678
0     0.9883752
0     1.1817312

The function expand.grid() is used to create all possible combinations
of one case and one control:

newdf <- expand.grid(Biomarker_value[df$HCC == 0], Biomarker_value[df$HCC == 1])
colnames(newdf) <- c("NoHCC", "HCC")
newdf$Pair <- seq(1, nrow(newdf))

For each pair the values of the biomarker are compared between case and
control:

newdf$Comparison <- 1 * (newdf$HCC > newdf$NoHCC)
mean(newdf$Comparison)
[1] 0.40625
# Note: 0 means HCC <= NoHCC, 1 means HCC > NoHCC, so the labels must be
# supplied in that order
newdf$Comparison <- factor(newdf$Comparison, labels = c("HCC<=NoHCC", "HCC>NoHCC"))
library(knitr)
kable(head(newdf, 4))
NoHCC       HCC        Pair  Comparison
1.0630099   0.9723816  1     HCC<=NoHCC
0.9883752   0.9723816  2     HCC<=NoHCC
1.1817312   0.9723816  3     HCC<=NoHCC
1.0370628   0.9723816  4     HCC<=NoHCC
library(data.table)
setDT(newdf)
# Reshape to long format: one row per combination of pair and group
longdf <- melt(newdf, id.vars = c("Pair", "Comparison"),
               variable.name = "Group",
               measure.vars = c("HCC", "NoHCC"))

lab <- paste("AUC = Proportion\nof green lines\nAUC =",
             round(table(newdf$Comparison)[2] / sum(table(newdf$Comparison)), 3))
library(ggplot2)
fav.col=c("#1A425C", "#8AB63F")
ggplot(longdf, aes(x=Group, y=value))+geom_line(aes(group=Pair, col=Comparison)) + 
        scale_color_manual(values=fav.col)+theme_bw() + 
        ylab("Biomarker value") + geom_text(x=0.75,y=0.95,label=lab) + 
        geom_point(shape=21, size=2) + 
        theme(legend.title=element_blank(), legend.position="bottom")
library(ROCR)
pred <- prediction(df$Biomarker_value, df$HCC)
perf <- performance(pred, "tpr", "fpr")
plot(perf, col = "black")
abline(a = 0, b = 1, col = "#8AB63F")

Calculating the AUC:

auc <- performance(pred, "auc")
unlist(slot(auc, "y.values"))
[1] 0.40625

Sensitivity and specificity as a function of the cutoff points of the
biomarker:

library(ROCR)
testy <- performance(pred, "tpr", "fpr")
plot(testy@alpha.values[[1]], testy@x.values[[1]], type = 'n',
     xlab = 'Cutoff points of the biomarker',
     ylab = 'sensitivity or specificity')
lines(testy@alpha.values[[1]], testy@y.values[[1]], type = 's', col = "#1A425C")
lines(testy@alpha.values[[1]], 1 - testy@x.values[[1]], type = 's', col = "#8AB63F")
legend(1.07, .85, c('sensitivity', 'specificity'),
       lty = c(1, 1), col = c("#1A425C", "#8AB63F"), cex = .9, bty = 'n')

Equivalence with the Mann-Whitney U test:

wt <- wilcox.test(Biomarker_value ~ HCC, data = df)
1 - wt$statistic / (sum(df$HCC == 1) * sum(df$HCC == 0))
      W 
0.40625 
wt$p.value
[1] 0.6828283

General remarks on the AUC

  • Often, a combination of new markers is selected from a large set.
    This can result in overoptimistic expectations of the marker's
    performance. Any performance measure should therefore be estimated
    with correction for optimism, for example by applying cross-validation
    or bootstrap resampling (see the sketch after this list). However,
    validation in fully independent, external data is the best way to
    validate a new marker.
  • When we want to assess the incremental value of an additional marker
    (e.g. molecular, genetic, imaging) to an existing model, the increase
    of the AUC can be reported.
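
As an illustration of such an optimism correction, here is a minimal, Harrell-style bootstrap sketch (not from the original post; it assumes a data frame df with columns HCC and Biomarker_value as in the examples above, and a simple logistic model, so at this sample size it is purely didactic):

# Hedged sketch: bootstrap optimism correction for the AUC of a logistic model
auc <- function(score, y) {
  # Pairwise definition of the AUC; ties count for 0.5
  mean(outer(score[y == 1], score[y == 0], ">") +
       0.5 * outer(score[y == 1], score[y == 0], "=="))
}
apparent <- auc(df$Biomarker_value, df$HCC)
set.seed(42)
optimism <- replicate(200, {
  boot <- df[sample(nrow(df), replace = TRUE), ]
  fit  <- glm(HCC ~ Biomarker_value, data = boot, family = binomial)
  auc(predict(fit, boot), boot$HCC) -         # apparent AUC in the bootstrap sample
    auc(predict(fit, newdata = df), df$HCC)   # same model evaluated on the original data
})
# NaN occurs if a bootstrap sample happens to contain a single outcome class
apparent - mean(optimism, na.rm = TRUE)       # optimism-corrected AUC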

References

Mason, S. J. and Graham, N. E. (2002), Areas beneath the relative
operating characteristics (ROC) and relative operating levels (ROL)
curves: Statistical significance and interpretation. Q.J.R. Meteorol.
Soc., 128: 2145-2166.

Steyerberg, Ewout W. et al. “Assessing the Performance of Prediction
Models: A Framework for Some Traditional and Novel Measures.”
Epidemiology (Cambridge, Mass.) 21.1 (2010): 128-138.

Prediction interval, the wider sister of confidence interval
Colman Statistics (rforbiostatistics.colmanstatistics.be), 15 June 2018

In this post, I will illustrate the use of prediction intervals for the
comparison of measurement methods. In the example a new spectral method
for measuring whole blood hemoglobin is compared with a reference
method.
But first, let’s start with discussing the large difference between a
confidence interval and a prediction interval.

Prediction interval versus Confidence interval

Very often a confidence interval is misinterpreted as a prediction
interval, leading to unrealistically "precise" predictions. As you will
see, prediction intervals (PI) resemble confidence intervals (CI), but
the width of the PI is by definition larger than the width of the CI.

Let’s assume that we measure the whole blood hemoglobin concentration in
a random sample of 100 persons. We obtain the estimated mean (Est_mean),
limits of the confidence interval (CI_Lower and CI_Upper) and limits of
the prediction interval (PI_Lower and PI_Upper):
(The R code to do this is shown in the next section.)

Est_mean  CI_Lower  CI_Upper  PI_Lower  PI_Upper
140       138       143       113       167

A confidence interval (CI) is an interval of good estimates of the
unknown true population parameter. For a 95% confidence interval for the
mean, we can state that if we were to repeat our sampling process
infinitely often, 95% of the constructed confidence intervals would
contain the true population mean. In other words, there is a 95% chance
of selecting a sample such that the 95% confidence interval calculated
from that sample contains the true population mean.

Interpretation of the 95% confidence interval in our example:
-The values contained in the interval [138 g/L to 143 g/L] are good
estimates of the unknown mean whole blood hemoglobin concentration in
the population. In general, if we were to repeat our sampling process
infinitely often, 95% of such constructed confidence intervals would
contain the true mean hemoglobin concentration.

A prediction interval (PI) is an estimate of an interval in which a
future observation will fall, with a certain confidence level, given the
observations that were already observed. For a 95% prediction interval,
we can state that if we were to repeat our sampling process infinitely
often, 95% of the constructed prediction intervals would contain the new
observation.

Interpretation of the 95% prediction interval in the above example:
-Given the observed whole blood hemoglobin concentrations, the whole
blood hemoglobin concentration of a new sample will be between
113 g/L and 167 g/L with a confidence of 95%. In general, if we were to
repeat our sampling process infinitely often, 95% of such constructed
prediction intervals would contain the new hemoglobin concentration
measurement.
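
This "repeated sampling" interpretation can be checked empirically. Below is a minimal simulation sketch (not from the post; it reuses the population parameters mean = 139 and sd = 14.75 that appear in the next section):

set.seed(1)
cover <- replicate(2000, {
  x   <- rnorm(100, mean = 139, sd = 14.75)   # one sample of 100 persons
  new <- rnorm(1, mean = 139, sd = 14.75)     # one future observation
  fit <- lm(x ~ 1)
  ci  <- predict(fit, interval = "confidence")[1, ]
  pi  <- suppressWarnings(predict(fit, interval = "prediction"))[1, ]
  c(CI = unname(ci["lwr"] < 139 & 139 < ci["upr"]),
    PI = unname(pi["lwr"] < new & new < pi["upr"]))
})
rowMeans(cover)  # both coverage proportions should be close to 0.95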

Remark: very often we will read the interpretation "The whole blood
hemoglobin concentration of a new sample will be between 113 g/L and
167 g/L with a probability of 95%." (for example, on Wikipedia). This
interpretation is only correct in the theoretical situation where the
parameters (true mean and standard deviation) are known.

Estimating a prediction interval in R

First, let’s simulate some data. The sample size in the plot above was
(n=100). Now, to see the effect of the sample size on the width of the
confidence interval and the prediction interval, let’s take a “sample”
of 400 hemoglobin measurements using the same parameters:

set.seed(123)
hemoglobin<-rnorm(400, mean = 139, sd = 14.75)
df<-data.frame(hemoglobin)

Although we don’t need a linear regression yet, I’d like to use the lm()
function, which makes it very easy to construct a confidence interval
(CI) and a prediction interval (PI). We can estimate the mean by fitting
a “regression model” with an intercept only (no slope). The default
confidence level is 95%.

-Confidence interval:

CI<-predict(lm(df$hemoglobin~ 1), interval="confidence") 
CI[1,]
##      fit      lwr      upr 
## 139.2474 137.8425 140.6524

The CI object has 400 rows, but since there is no slope in our "model",
each row is exactly the same.

-Prediction interval:

PI<-predict(lm(df$hemoglobin~ 1), interval="predict") 
## Warning in predict.lm(lm(df$hemoglobin ~ 1), interval = "predict"): predictions on current data refer to _future_ responses
PI[1,]
##      fit      lwr      upr 
## 139.2474 111.1134 167.3815

We get a “warning” that “predictions on current data refer to future
responses”. That’s exactly what we want, so no worries there. As you
see, the column names of the objects CI and PI are the same.

Now, let’s visualize the confidence and the prediction interval.
The code below is not very elegant but I like the result (tips are
welcome :-))

library(ggplot2)
limits_CI <- aes(x=1.3  , ymin=CI[1,2], ymax =CI[1,3])
limits_PI <- aes(x=0.7 , ymin=PI[1,2], ymax =PI[1,3])

PI_CI<-ggplot(df, aes(x=1, y=hemoglobin)) + 
        geom_jitter(width=0.1, pch=21, fill="grey", alpha=0.5) + 
        geom_errorbar (limits_PI, width=0.1, col="#1A425C") +
        geom_point (aes(x=0.7, y=PI[1,1]), col="#1A425C", size=2) +
        geom_errorbar (limits_CI, width=0.1, col="#8AB63F") +
        geom_point (aes(x=1.3, y=CI[1,1]), col="#8AB63F", size=2) +

        scale_x_continuous(limits=c(0,2))+
        scale_y_continuous(limits=c(95,190))+
        theme_bw()+ylab("Hemoglobin concentration (g/L)") + 
        xlab(NULL)+
        geom_text(aes(x=0.6, y=160, label="Prediction\ninterval", 
                      hjust="right", cex=2), col="#1A425C")+
        geom_text(aes(x=1.4, y=140, label="Confidence\ninterval", 
                      hjust="left", cex=2), col="#8AB63F")+
        theme(legend.position="none", 
              axis.text.x = element_blank(), 
              axis.ticks.x = element_blank())
PI_CI

The width of the confidence interval is very small, now that we have
this large sample size (n = 400). This is not surprising, as the
estimated mean is the only source of uncertainty. In contrast, the width
of the prediction interval is still substantial. The prediction interval
has two sources of uncertainty: the estimated mean (just like the
confidence interval) and the random variance of new observations.
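
Both sources show up in the textbook formulas behind these intervals for an intercept-only model: CI = x̄ ± t·s/√n, whereas PI = x̄ ± t·s·√(1 + 1/n). As a check, here is a small sketch reproducing the predict.lm() output by hand:

n  <- length(df$hemoglobin)
m  <- mean(df$hemoglobin)
s  <- sd(df$hemoglobin)
tq <- qt(0.975, df = n - 1)
c(m - tq * s / sqrt(n), m + tq * s / sqrt(n))              # matches the CI
c(m - tq * s * sqrt(1 + 1/n), m + tq * s * sqrt(1 + 1/n))  # matches the PI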

Example: comparing a new with a reference measurement method

A prediction interval can be useful in the case where a new method
should replace a standard (or reference) method. If we can predict well
enough what the measurement by the reference method would be, given the
new method, then the two methods give similar information and the new
method can be used.

For example, in (Tian, 2017) a new spectral method (near-infrared) to
measure hemoglobin is compared with a gold standard. In contrast with
the gold standard method, the new spectral method does not require
reagents. Moreover, the new method is faster. We will investigate
whether we can predict well enough, based on the measured concentration
of the new method, what the measurement by the gold standard would be.
(Note: the measured concentrations presented below are fictitious.)

Hb<- read.table("https://usercontent.one/wp/rforbiostatistics.colmanstatistics.be/wp-content/uploads/2023/03/Hb.txt", 
                 header = TRUE)
library(pander)
pander(head(Hb))
New     Reference
84.97   87.24
99.91   103.4
111.8   116.7
117     116.7
118.1   113.5
118.2   121.7
plot(Hb$New, Hb$Reference, 
     xlab="Hemoglobin concentration (g/L) - new method", 
     ylab="Hemoglobin concentration (g/L) - reference method")

Prediction interval based on linear regression

Now, let’s fit a linear regression model predicting the hemogoblin
concentrations measured by the reference method, based on the
concentrations measured with the new method.

fit.lm <- lm(Hb$Reference ~ Hb$New)
plot(Hb$New, Hb$Reference,
     xlab="Hemoglobin concentration (g/L) - new method",
     ylab="Hemoglobin concentration (g/L) - reference method")
# Adding the regression line:
abline(a=fit.lm$coefficients[1], b=fit.lm$coefficients[2])
# Adding the identity line:
abline(a=0, b=1, lty=2)

If both measurement methods corresponded exactly, the intercept would be
zero and the slope would be one (the "identity line", dotted line).

Now let’s calculate the confidence interval for this linear regression.

CI_ex <- predict(fit.lm, interval="confidence")
colnames(CI_ex)<- c("fit_CI", "lwr_CI", "upr_CI")

And the prediction interval:

PI_ex <- predict(fit.lm, interval="prediction")
## Warning in predict.lm(fit.lm, interval = "prediction"): predictions on current data refer to _future_ responses
colnames(PI_ex)<- c("fit_PI", "lwr_PI", "upr_PI")

We can combine those results in one data frame and plot both the
confidence interval and the prediction interval:

Hb_results<-cbind(Hb, CI_ex, PI_ex)
pander(head(Hb_results))
New     Reference  fit_CI  lwr_CI  upr_CI  fit_PI  lwr_PI  upr_PI
84.97   87.24      90.58   87.39   93.76   90.58   82.14   99.02
99.91   103.4      103.8   101.4   106.2   103.8   95.66   112
111.8   116.7      114.4   112.6   116.2   114.4   106.4   122.4
117     116.7      119     117.4   120.5   119     111     126.9
118.1   113.5      120     118.5   121.5   120     112     127.9
118.2   121.7      120.1   118.6   121.6   120.1   112.1   128

Visualizing the CI (in green) and the PI (in blue):

plot(Hb$New, Hb$Reference, 
     xlab="Hemoglobin concentration (g/L) - new method", 
     ylab="Hemoglobin concentration (g/L) - reference method")
Hb_results_s <- Hb_results[order(Hb_results$New),]
lines (x=Hb_results_s$New, y=Hb_results_s$fit_CI)
lines (x=Hb_results_s$New, y=Hb_results_s$lwr_CI, 
       col="#8AB63F", lwd=1.2)
lines (x=Hb_results_s$New, y=Hb_results_s$upr_CI, 
       col="#8AB63F", lwd=1.2)
lines (x=Hb_results_s$New, y=Hb_results_s$lwr_PI, 
       col="#1A425C", lwd=1.2)
lines (x=Hb_results_s$New, y=Hb_results_s$upr_PI,
       col="#1A425C", lwd=1.2)
abline (a=0, b=1, lty=2)

In Bland & Altman (2003) it is proposed to calculate the average width
of this prediction interval and to judge whether that width is
acceptable. Another approach is to compare the calculated PI with an
"acceptance interval": if the PI lies inside the acceptance interval
over the measurement range of interest, the two methods can be
considered interchangeable (see Francq, 2016).

In the above example, both methods have the same measurement scale
(g/L), but linear regression with a prediction interval is particularly
useful when the two methods of measurement have different units.

However, the method has some disadvantages:

  • Prediction intervals are very sensitive to deviations from the
    normal distribution.
  • In "standard" linear regression (or Ordinary Least Squares (OLS)
    regression), the presence of measurement error is allowed for the
    Y-variable (here, the reference method) but not for the X-variable
    (the new method). The absence of errors on the x-axis is one of the
    assumptions. Since we can expect some measurement error for the new
    method, this assumption is violated here.

Taking into account errors on both axes

In contrast to Ordinary Least Square (OLS) regression, Bivariate Least
Square (BLS) regression takes into account the measurement errors of
both methods (the New method and the Reference method). Interestingly,
prediction intervals calculated with BLS are not affected when the axes
are switched (del Rio, 2001).

In 2017, a new R-package BivRegBLS was released. It offers several
methods to assess the agreement in method comparison studies, including
Bivariate Least Square (BLS) regression.

If the data are unreplicated but the variance of the measurement error
of each method is known, the BLS() and XY.plot() functions can be used
to fit a bivariate least squares regression line and the corresponding
confidence and prediction intervals.

library(BivRegBLS)

Hb.BLS = BLS(data = Hb, xcol = c("New"), ycol = c("Reference"),
             var.y = 10, var.x = 8, conf.level = 0.95)

XY.plot (Hb.BLS,
         yname = "Hemoglobin concentration (g/L) - reference method",
         xname = "Hemoglobin concentration (g/L) - new method",
         graph.int = c("CI","PI"))

Now we would like to decide whether the new method can replace the
reference method. We allow the methods to differ up to a given threshold
that is not clinically relevant. Based on this threshold an "acceptance
interval" is created. Suppose that differences up to 10 g/L
(= threshold) are not clinically relevant; then the acceptance interval
can be defined as Y = X ± Δ, with Δ equal to 10. If the PI is inside the
acceptance interval for the measurement range of interest, then the two
measurement methods can be considered interchangeable (see Francq,
2016).

The accept.int argument of the XY.plot() function allows for a
visualization of this acceptance interval:

XY.plot (Hb.BLS,
         yname = "Hemoglobin concentration (g/L) - reference method",
         xname = "Hemoglobin concentration (g/L) - new method",
         graph.int = c("CI","PI"), 
         accept.int=10)


For the measurement region 120 g/L to 150 g/L, we can conclude that the
difference between both methods is acceptable. If the measurement
regions below 120 g/L and above 150 g/L are important, the new method
cannot replace the reference method.

Regression on replicated data

In method comparison studies, it is advised to create replicates (2 or
more measurements of the same sample with the same method). An example
of such a dataset:

Hb_rep <- read.table("https://usercontent.one/wp/rforbiostatistics.colmanstatistics.be/wp-content/uploads/2023/03/Hb_rep.txt", 
                 header = TRUE)
pander(head(Hb_rep ))
New_rep1  New_rep2  Ref_rep1  Ref_rep2
88.25     94.98     90.17     84.01
109.1     109.2     106.9     98.17
114.6     122.5     116.2     118.2
120.2     114.6     114.5     104.3
108.4     110.7     116.5     108.7
125.9     124.4     112.6     112.4

When replicates are available, the variances of the measurement errors
are estimated for both the new and the reference method, and used to
estimate the prediction interval. Again, the BLS() function and the
XY.plot() function are used to estimate and plot the BLS regression
line and the corresponding CI and PI.

Hb_rep.BLS = BLS (data = Hb_rep, 
                  xcol = c("New_rep1", "New_rep2"), 
                  ycol = c("Ref_rep1", "Ref_rep2"), 
                  qx = 1, qy = 1, 
                  conf.level=0.95, pred.level=0.95)

XY.plot (Hb_rep.BLS,
         yname = "Hemoglobin concentration (g/L) - reference method",
         xname = "Hemoglobin concentration (g/L) - new method",
         graph.int = c("CI","PI"),
         accept.int=10)

It is clear that the prediction interval is not inside the acceptance
interval here. The new method cannot replace the reference method. A
possible solution is to average the repeats. The BivRegBLS package can
create prediction intervals for the mean of (2 or more) future values,
too!

In the plot above, averages of the two replicates are calculated and
plotted. I’d like to see the individual measurements:

plot(x=c(Hb_rep$New_rep1, Hb_rep$New_rep2),
     y=c(Hb_rep$Ref_rep1, Hb_rep$Ref_rep2),
     xlab="Hemoglobin concentration (g/L) - new method", 
     ylab="Hemoglobin concentration (g/L) - reference method")
lines (x=as.numeric(Hb_rep.BLS$Pred.BLS[,1]), 
       y=as.numeric(Hb_rep.BLS$Pred.BLS[,2]), 
       lwd=2)
lines (x=as.numeric(Hb_rep.BLS$Pred.BLS[,1]), 
       y=as.numeric(Hb_rep.BLS$Pred.BLS[,3]), 
       col="#8AB63F", lwd=2)
lines (x=as.numeric(Hb_rep.BLS$Pred.BLS[,1]), 
       y=as.numeric(Hb_rep.BLS$Pred.BLS[,4]), 
       col="#8AB63F", lwd=2)
lines (x=as.numeric(Hb_rep.BLS$Pred.BLS[,1]), 
       y=as.numeric(Hb_rep.BLS$Pred.BLS[,5]), 
       col="#1A425C", lwd=2)
lines (x=as.numeric(Hb_rep.BLS$Pred.BLS[,1]), 
       y=as.numeric(Hb_rep.BLS$Pred.BLS[,6]), 
       col="#1A425C", lwd=2)
abline (a=0, b=1, lty=2)
Remarks

  • Although not appropriate in the context of method comparison
    studies, Pearson correlation is still frequently used. See Bland &
    Altman (2003) for an explanation of why correlations are not
    advised.
  • The methods presented in this blog post are not applicable to time
    series.

References
-Confidence interval and prediction interval:
Kutner, M., Nachtsheim, C., Neter, J. and Li, W. (2005). Applied Linear
Statistical Models. Section 2.5.

-Prediction interval for method comparison:
Bland, J. M. and Altman, D. G. (2003), Applying the right statistics:
analyses of measurement studies. Ultrasound Obstet Gynecol, 22: 85-93.
<doi:10.1002/uog.122> Section: "Appropriate use of regression".

Francq, B. G., and Govaerts, B. (2016) How to regress and predict in a
Bland-Altman plot? Review and contribution based on tolerance intervals
and correlated-errors-in-variables models. Statist. Med., 35: 2328-2358.
doi: 10.1002/sim.6872.

del Rio, F. J., Riu, J. and Rius, F. X. (2001), Prediction intervals in
linear regression taking into account errors on both axes. J.
Chemometrics, 15: 773-788. <doi:10.1002/cem.663>

-Example of a method comparison study:
H. Tian, M. Li, Y. Wang, D. Sheng, J. Liu, and L. Zhang, “Optical
wavelength selection for portable hemoglobin determination by
near-infrared spectroscopy method,” Infrared Phys. Techn 86, 98-102
(2017). doi.org/10.1016/j.infrared.2017.09.004.

-The predict() and lm() functions of R:
Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S.
Wadsworth & Brooks/Cole.
