BookmarkSubscribeRSS Feed

An example of Passing-Bablok Regression in SAS

Started ‎01-20-2022 by
Modified ‎01-24-2022 by
Views 7,223

Under the help of @BeverlyBrown  , I finally could edit my post . Also thanks the help of @FreelanceReinh  

 

 

It is SAS code for Passing-Bablok Regression .

P.S. the data I used is stealing from @Rick_SAS  blog 

https://blogs.sas.com/content/iml/2019/01/07/deming-regression-sas.html

 

You could find Passing-Bablok Regression is similar with Deming Regression .



data have;
label x="micrograms per deciliter"
      y="kiloOhms";
input x y @@;
datalines;
169.0 45.5 130.8 33.4 109.0 23.8 94.1 19.8 86.3 20.4 78.4 18.7 
 76.1 16.1  72.2 16.7  70.0 11.9 69.8 14.6 69.5 10.6 68.7 12.7 67.3 16.9 
174.7 57.8 137.9 39.0 114.6 30.4 99.8 21.1 90.1 21.7 85.1 25.2
 80.7 20.6 78.1 19.3  77.8 20.9  76.0 18.2 77.8 18.3 74.2 15.7 73.1 13.9 
182.5 55.5 144.0 38.7 123.8 35.1 107.6 30.6 96.9 25.7 92.8 19.2 
 87.2 22.4  86.3 18.4  84.4 20.7  83.7 20.6 83.3 20.0 83.9 18.8 82.7 21.8 
160.8 49.9 122.7 32.2 102.6 19.2 86.6 14.7 76.1 16.6 69.6 18.8 
 66.7  7.4  64.4  8.2  63.0 15.5 61.7 13.7 61.2 9.2 62.4 12.0 58.4 15.2 
171.3 48.7 136.3 36.1 111.9 28.6 96.5 21.8 90.3 25.6 82.9 16.8 
 78.1 14.1  76.5 14.2  73.5 11.9 74.4 17.7 73.9 17.6 71.9 10.2 72.0 15.6 
;










/*创建一个ID变量*/
data temp;
 set have;
 _id+1;
 call symputx('n1',_id); /*样本个数*/
run;

/*计算 Cn^2 组数据的Sij*/
proc sql;
create table temp2 as
select a.x as x1,a.y as y1,b.x as x2,b.y as y2,
case 
     when x1 eq x2 and y1 gt y2 then  999999
     when x1 eq x2 and y1 le y2 then -999999
     when x1 ne x2 and y1 eq y2 then 0
	 else (y1-y2)/(x1-x2) 
end as Sij
 from temp as a,temp as b
  where a._id<b._id ;
quit;

/*Sij 排除两种情况*/
data temp3;
 set temp2;
 if (x1=x2 and y1=y2) or Sij=-1 then delete;
run;

/*Sij 排序*/
proc sort data=temp3 out=temp4;
by Sij;
run;


proc sql noprint;
/*求有效个数N*/
select count(*) into :n
 from temp4 ;

/*求偏移量K值*/
select count(*) into :k
 from temp4
  where Sij<-1;
quit;
%put &=n &=k ;

/*求斜率的点估计值b  (&slope.) */
data slope;
 set temp4;
 if mod(&n.,2)=1 and _n_ = %eval((&n.+1)/2 + &k.) then output;
 if mod(&n.,2)=0 and _n_ in (%eval(&n./2 + &k.) %eval(&n./2 + 1 + &k.)) then output;
run;
proc sql noprint;
select mean(Sij) into :slope
 from slope;
quit;


/*求斜率的95%置信区间*/
%let alpha=0.05;
data _null_;
 set temp4;
if _n_=1 then do;
 retain Cr M1 M2;
 Cr=quantile('normal',%sysevalf(1-&alpha./2)) * sqrt(&n1.*(&n1.-1)*(2*&n1.+5)/18);
 M1=round((&n.-Cr)/2);
 M2=&n.-M1+1;
 put Cr= M1= M2=;
end;

if _n_=(M1+&k.) then call symputx('slope_lower',Sij);
if _n_=(M2+&k.) then call symputx('slope_upper',Sij);
run;






/*截距估计*/
proc sql noprint;
select median(y-&slope.*x),median(y-&slope_lower.*x),median(y-&slope_upper.*x)
  into :int,:int_lower,:int_upper
 from have;
quit;



/****************打印结果 - print result*********************/
data report;
term='intercept';estimate=&int.;  lower=&int_upper.;  upper=&int_lower.;  output;
term='slope    ';estimate=&slope.;lower=&slope_lower.;upper=&slope_upper.;output;
run;
proc report data=report nowd;
define term/' ';
define estimate/'Estimate';
define lower/"Lower %sysevalf(100*(1-&alpha.))% CI";
define upper/"Upper %sysevalf(100*(1-&alpha.))% CI";
run;


/*  Compare result between Passing-Bablok Regression and Deming regression  */
/*  https://blogs.sas.com/content/iml/2019/01/07/deming-regression-sas.html */
proc sgplot data=have noautolegend;
   scatter x=x y=y;
   lineparm x=0 y=-10.56415 slope=0.354463 / clip curvelabel='Deming' curvelabelloc=outside  lineattrs=(color=blue) curvelabelattrs=(color=blue); /* Deming regression estimates */
   lineparm x=0 y=-10.5885  slope=0.358363 / clip curvelabel='Passing-Bablok' lineattrs=(color=red) curvelabelattrs=(color=red); /* Passing-Bablok regression estimates */
   xaxis grid label="Lab Test (micrograms per deciliter)";
   yaxis grid label="New Device (kiloohms)";
run;

Ksharp_0-1643026789110.png

SGPlot2.png


 

 

Comments

Thanks for posting. I have a half-written article on Passing-Bablock regression that I have been meaning to complete for a long while. As I recall, the issue I struggled with is how to handle repeated measurements. There are subtle issues to think about.  Maybe I will be inspired to resume my study.

 

In addition to the original reference (Passing and Bablok, 1983), the documentation in the NCSS software provides a nice summary:
https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Passing-Bablok_Regre...

Thank you, @Ksharp, for sharing the code and the interesting original paper. Greatly appreciated. I used Passing-Bablok regression (and also Deming regression) in an earlier job, about 15 years ago, but don't have the SAS code of that time.

 

As a quick check I've just applied the formulae from the paper to your sample data and obtained slightly different results:

 

                            Lower      Upper
              Estimate     95% CI     95% CI
  intercept   -10.5777   -13.4906   -7.73623
  slope      0.3582429  0.3254786  0.3905724

Reasons:

  1. My understanding is that the sign of infinity in the case x1 eq x2 and y1 ne y2 is important (as it affects the ranks of the Sij). There is one pair in the data where minus infinity occurs: x1=x2=76.1, y1=16.1, y2=16.6 (_id=7, 44). Your code sets Sij to 999999 in this case, regardless of the sign. In my code I used the DIVIDE function (which yields .I and .M for +/- infinity, respectively) and then an ORDER BY clause handling the special missing values properly. In your code you could write
    case when x1 eq x2 and y1 gt y2 then  999999
         when x1 eq x2 and y1 le y2 then -999999
         ...
    to take the sign into account.

  2. The exclusion of cases with Sij=-1 is prone to rounding errors due to numeric representation issues. Indeed, there are two observations in dataset TEMP2 where Sij is actually -1, but my Windows SAS 9.4M5 introduces rounding errors <1E-14 so that the equality Sij=-1 is not recognized:
     x1      y1      x2      y2
    
    76.1    16.1    78.1    14.1
    80.7    20.6    86.6    14.7
    Using
    if (x1=x2 and y1=y2) or round(Sij,1e-9)=-1 then delete;
    those two observations are eliminated, hence N=2078, not 2080.

  3. The definition of the slope estimate b depends on whether N is odd or even. So the IF conditions in the DATA step creating dataset SLOPE should read
    if mod(&n,2)=1 and ...
    if mod(&n,2)=0 and ...
    not if mod(&k,2)=...

Hi Rick,

My code is very very rough and raw . and I believed there must be many glitch in my code (especially for CI calculating ) .

And I didn't have more example to test my code.

And I am eager to look forwards to your code.

Hope you could post your sas code about Passing-Bablock regression as fast as you could.

I believe many sas users want to have it, include me .

 

Best !

Hi  ,

Firstly of all ,Thanks for your review. I appreciated . I know there should be some glitch in my code as I said to Rick.

 

Reasons:

  1. My understanding is that the sign of infinity in the case x1 eq x2 and y1 ne y2 is important (as it affects the ranks of the Sij). There is one pair in the data where minus infinity occurs: x1=x2=76.1, y1=16.1, y2=16.6 (_id=7, 44). Your code sets Sij to 999999 in this case, regardless of the sign. In my code I used the DIVIDE function (which yields .I and .M for +/- infinity, respectively) and then an ORDER BY clause handling the special missing values properly. In your code you could write
    case when x1 eq x2 and y1 gt y2 then  999999
         when x1 eq x2 and y1 le y2 then -999999
         ...
    to take the sign into account.

Yes. Yours is better . I took your advice.

Yes. All these three points are really important . Sharp eyes ! Especially the third one , I think I make a big mistake .

The code has been updated .

Hope to see your sas code for Passing-Bablok Regression.

 

Thank you very much !

Have a nice day !

 

 

 

 

 

I don't know why I would not  be able to  edit my paper . @BeverlyBrown  could help me out ? Replace the first one with this one ?

 

I would like to thank @FreelanceReinh  to find three problems in my code.

So follow @FreelanceReinh 's sharp eyes ,I changed my code and post there .

 



data have;
label x="micrograms per deciliter"
      y="kiloOhms";
input x y @@;
datalines;
169.0 45.5 130.8 33.4 109.0 23.8 94.1 19.8 86.3 20.4 78.4 18.7 
 76.1 16.1  72.2 16.7  70.0 11.9 69.8 14.6 69.5 10.6 68.7 12.7 67.3 16.9 
174.7 57.8 137.9 39.0 114.6 30.4 99.8 21.1 90.1 21.7 85.1 25.2
 80.7 20.6 78.1 19.3  77.8 20.9  76.0 18.2 77.8 18.3 74.2 15.7 73.1 13.9 
182.5 55.5 144.0 38.7 123.8 35.1 107.6 30.6 96.9 25.7 92.8 19.2 
 87.2 22.4  86.3 18.4  84.4 20.7  83.7 20.6 83.3 20.0 83.9 18.8 82.7 21.8 
160.8 49.9 122.7 32.2 102.6 19.2 86.6 14.7 76.1 16.6 69.6 18.8 
 66.7  7.4  64.4  8.2  63.0 15.5 61.7 13.7 61.2 9.2 62.4 12.0 58.4 15.2 
171.3 48.7 136.3 36.1 111.9 28.6 96.5 21.8 90.3 25.6 82.9 16.8 
 78.1 14.1  76.5 14.2  73.5 11.9 74.4 17.7 73.9 17.6 71.9 10.2 72.0 15.6 
;










/*创建一个ID变量*/
data temp;
 set have;
 _id+1;
 call symputx('n1',_id); /*样本个数*/
run;

/*计算 Cn^2 组数据的Sij*/
proc sql;
create table temp2 as
select a.x as x1,a.y as y1,b.x as x2,b.y as y2,
case 
     when x1 eq x2 and y1 gt y2 then  999999
     when x1 eq x2 and y1 le y2 then -999999
     when x1 ne x2 and y1 eq y2 then 0
	 else (y1-y2)/(x1-x2) 
end as Sij
 from temp as a,temp as b
  where a._id<b._id ;
quit;

/*Sij 排除两种情况*/
data temp3;
 set temp2;
 if (x1=x2 and y1=y2) or Sij=-1 then delete;
run;

/*Sij 排序*/
proc sort data=temp3 out=temp4;
by Sij;
run;


proc sql noprint;
/*求有效个数N*/
select count(*) into :n
 from temp4 ;

/*求偏移量K值*/
select count(*) into :k
 from temp4
  where Sij<-1;
quit;
%put &=n &=k ;

/*求斜率的点估计值b  (&slope.) */
data slope;
 set temp4;
 if mod(&n.,2)=1 and _n_ = %eval((&n.+1)/2 + &k.) then output;
 if mod(&n.,2)=0 and _n_ in (%eval(&n./2 + &k.) %eval(&n./2 + 1 + &k.)) then output;
run;
proc sql noprint;
select mean(Sij) into :slope
 from slope;
quit;


/*求斜率的95%置信区间*/
%let alpha=0.05;
data _null_;
 set temp4;
if _n_=1 then do;
 retain Cr M1 M2;
 Cr=quantile('normal',%sysevalf(1-&alpha./2)) * sqrt(&n1.*(&n1.-1)*(2*&n1.+5)/18);
 M1=round((&n.-Cr)/2);
 M2=&n.-M1+1;
 put Cr= M1= M2=;
end;

if _n_=(M1+&k.) then call symputx('slope_lower',Sij);
if _n_=(M2+&k.) then call symputx('slope_upper',Sij);
run;






/*截距估计*/
proc sql noprint;
select median(y-&slope.*x),median(y-&slope_lower.*x),median(y-&slope_upper.*x)
  into :int,:int_lower,:int_upper
 from have;
quit;



/****************打印结果 - print result*********************/
data report;
term='intercept';estimate=&int.;  lower=&int_upper.;  upper=&int_lower.;  output;
term='slope    ';estimate=&slope.;lower=&slope_lower.;upper=&slope_upper.;output;
run;
proc report data=report nowd;
define term/' ';
define estimate/'Estimate';
define lower/"Lower %sysevalf(100*(1-&alpha.))% CI";
define upper/"Upper %sysevalf(100*(1-&alpha.))% CI";
run;


/*  Compare result between Passing-Bablok Regression and Deming regression  */
/*  https://blogs.sas.com/content/iml/2019/01/07/deming-regression-sas.html */
proc sgplot data=have noautolegend;
   scatter x=x y=y;
   lineparm x=0 y=-10.56415 slope=0.354463 / clip curvelabel='Deming' curvelabelloc=outside  lineattrs=(color=blue) curvelabelattrs=(color=blue); /* Deming regression estimates */
   lineparm x=0 y=-10.5885  slope=0.358363 / clip curvelabel='Passing-Bablok' lineattrs=(color=red) curvelabelattrs=(color=red); /* Passing-Bablok regression estimates */
   xaxis grid label="Lab Test (micrograms per deciliter)";
   yaxis grid label="New Device (kiloohms)";
run;

Ksharp_0-1643026555171.png

SGPlot2.png

Hello,

I read your code and found it enlightening. And verified it on Medcalc with your data, but found that the results were slightly different.
But in my ability, I can't tell why the difference is happening, so here's a tip for you to continue your research and look forward to some more research on the code.

soulmate_0-1643014822196.png

 

Hello @soulmate and welcome to the SAS Support Communities!

 

Thanks for bringing MedCalc into play, great idea. My old MedCalc version 9.6.2.0 from 2008 confirms your regression coefficients, but yields different confidence intervals:

Passing_Bablok.jpg

Interestingly, these confidence limits are closer to those obtained with SAS.

 

So we see an example of the well-known fact that even different versions of the same software may produce different results from identical input data.

 

The current online documentation https://www.medcalc.org/manual/passing-bablok-regression.php mentions a couple of more recent references in addition to the original paper, but doesn't contain any formula for computing the estimates. Further investigation would be needed to find out where the differences come from.

@FreelanceReinh 

I found if I used

 if (x1=x2 and y1=y2) or Sij=-1 then delete;

would get exactly the result with yours , Surprise ?

Ksharp_0-1643025520910.png

 

So I am thinking whether to use my original code or yours ROUND() function . What your opinion ?

 

About confident interval , I think it is about the way to calculate , just like @Rick_SAS 's blog mentioned :

https://blogs.sas.com/content/iml/2019/01/07/deming-regression-sas.html

You might wonder how accurate the parameter estimates are. Linnet (1993) recommends using the jackknife method to answer that question. I have previously explained how to jackknife estimates in SAS/IML, and the following program is copied from that article:

And comments from MVT

Ksharp_1-1643026006579.png

If CI is calculated by Bootstrap or Jackknife method, and the result might totally different from my code .

 

 

 

 

That's interesting, @Ksharp, thank you. So part of the mystery is cleared up: Your result suggests that MedCalc 9.6.2.0 suffers from the same rounding errors (due to numeric representation issues) as any other software does if no countermeasures (such as proper rounding or a fuzzy comparison) are taken. My opinion is that those countermeasures should be taken because this is the only way to obtain results which are consistent

  • with results obtained on other hardware/software platforms
  • with results from hand calculation
  • with results using other measurement units.

The latter point can be demonstrated easily by multiplying all values in your sample data by 10 (i.e., removing the decimal points). The calculation using the resulting integer values is no longer affected by numeric representation errors. Here's the result for the modified data from MedCalc 9.6.2.0:

Passing_Bablok2.jpg

With the expected factor 10 in the intercept and its CI this result matches that obtained by SAS using the ROUND function appropriately -- as it should, because using a different measurement unit must not affect the method comparison.

Thanks for your discussion of this issue. @Ksharp,@FreelanceReinhard 

I used MedCalc version 20.026.This is a relatively new software version, and I can understand that @FreelanceReinhard your software version has not yet taken into account the estimation method of the confidence interval.In fact, medcalc already takes into account algorithm updates for confidence intervals.

For confidence interval estimation for PB regression, the Bootstrap or Jackknife method is necessary and scientific.

So, I would like to suggest to you, can you use these two algorithms to optimize your code? @KsharpThis is very important for practitioners in the IVD industry or in the field of medical testing.

As IVD practitioners, and SAS users, it is very desirable to integrate these algorithms on THE SAS without the need to use other statistical software.I wonder what the advice of you two are?

The figure below is medcalc default replocations and random-number seed.

soulmate_0-1643079219993.png

 

OK. Here is the code for Bootstrap confident interval . They are very close , but not the same .

Maybe it is due to the algorithm of random sampling  ,event the seed is the same. There are a lot of unsure reasons.

Or maybe as @Rick_SAS  pointed out :

 

.As I recall, the issue I struggled with is how to handle repeated measurements. 
There are subtle issues to think about.

 

 

/*Firstly Make a macro for Passing-Bablok Regression*/
%macro Passing_Bablok_regression(dataset= ,x= ,y= ,SampleID=  );
/*创建一个ID变量*/
data temp(rename=(&x.=x &y.=y));
 set &dataset.;
 _id+1;
 call symputx('n1',_id); /*样本个数*/
run;

/*计算 Cn^2 组数据的Sij*/
proc sql;
create table temp2 as
select a.x as x1,a.y as y1,b.x as x2,b.y as y2,
case when x1 eq x2 and y1 gt y2 then  999999
     when x1 eq x2 and y1 le y2 then -999999
     when x1 ne x2 and y1 eq y2 then 0
	 else (y1-y2)/(x1-x2) 
end as Sij
 from temp as a,temp as b
  where a._id<b._id ;
quit;

/*Sij 排除两种情况*/
data temp3;
 set temp2;
 if (x1=x2 and y1=y2) or Sij=-1 then delete;
run;

/*Sij 排序*/
proc sort data=temp3 out=temp4;
by Sij;
run;

proc sql noprint;
/*求有效个数N*/
select count(*) into :n
 from temp4 ;
/*求偏移量K值*/
select count(*) into :k
 from temp4
  where Sij<-1;
quit;
%put &=n &=k ;

/*求斜率的点估计值b  (&slope.) */
data slope;
 set temp4;
 if mod(&n,2)=1 and _n_ = %eval((&n+1)/2 + &k) then output;
 if mod(&n,2)=0 and _n_ in (%eval(&n/2 + &k) %eval(&n/2 + 1 + &k)) then output;
run;
proc sql noprint;
select mean(Sij) into :slope
 from slope;
quit;

/*求斜率的95%置信区间*/
%local alpha;
%let alpha=0.05;
data _null_;
 set temp4;
if _n_=1 then do;
 retain Cr M1 M2;
 Cr=quantile('normal',%sysevalf(1-&alpha./2)) * sqrt(&n1.*(&n1.-1)*(2*&n1.+5)/18);
 M1=round((&n.-Cr)/2);
 M2=&n.-M1+1;
 put Cr= M1= M2=;
end;
if _n_=(M1+&k.) then call symputx('slope_lower',Sij);
if _n_=(M2+&k.) then call symputx('slope_upper',Sij);
run;

/*截距估计*/
proc sql noprint;
select median(y-&slope.*x),median(y-&slope_lower.*x),median(y-&slope_upper.*x)
  into :int,:int_lower,:int_upper
 from temp ;
quit;


/****************Save result*********************/
data report;
SampleID=&SampleID.;
term='intercept';estimate=&int.;  lower=&int_upper.;  upper=&int_lower.;  output;
term='slope    ';estimate=&slope.;lower=&slope_lower.;upper=&slope_upper.;output;
label estimate='Estimate' lower="Lower %sysevalf(100*(1-&alpha.))% CI" upper="Upper %sysevalf(100*(1-&alpha.))% CI";
run;

proc append base=output_pb data=report force;run;
%mend;















data have;
label x="micrograms per deciliter"
      y="kiloOhms";
input x y @@;
datalines;
169.0 45.5 130.8 33.4 109.0 23.8 94.1 19.8 86.3 20.4 78.4 18.7 
 76.1 16.1  72.2 16.7  70.0 11.9 69.8 14.6 69.5 10.6 68.7 12.7 67.3 16.9 
174.7 57.8 137.9 39.0 114.6 30.4 99.8 21.1 90.1 21.7 85.1 25.2
 80.7 20.6 78.1 19.3  77.8 20.9  76.0 18.2 77.8 18.3 74.2 15.7 73.1 13.9 
182.5 55.5 144.0 38.7 123.8 35.1 107.6 30.6 96.9 25.7 92.8 19.2 
 87.2 22.4  86.3 18.4  84.4 20.7  83.7 20.6 83.3 20.0 83.9 18.8 82.7 21.8 
160.8 49.9 122.7 32.2 102.6 19.2 86.6 14.7 76.1 16.6 69.6 18.8 
 66.7  7.4  64.4  8.2  63.0 15.5 61.7 13.7 61.2 9.2 62.4 12.0 58.4 15.2 
171.3 48.7 136.3 36.1 111.9 28.6 96.5 21.8 90.3 25.6 82.9 16.8 
 78.1 14.1  76.5 14.2  73.5 11.9 74.4 17.7 73.9 17.6 71.9 10.2 72.0 15.6 
;





/* 1. Get point estimate of intercept and slope */
proc delete data=Output_pb ;run;
%Passing_Bablok_regression(dataset=have, x=x,y=y ,SampleID=0)
data point_estimate;
 set Output_pb;
 keep term estimate;
run;



/*  The following code for Bootstrap Confidence Interval  is from Rick's blog
https://blogs.sas.com/content/iml/2016/08/10/bootstrap-confidence-interval-sas.html
https://blogs.sas.com/content/iml/2018/06/20/bootstrap-method-example-sas.html      */

%let NumSamples = 1000;       /* number of bootstrap resamples */
/* 2. Generate many bootstrap samples */
proc surveyselect data=have NOPRINT seed=978 
     out=BootSSFreq(rename=(Replicate=SampleID))
     method=urs              /* resample with replacement */
     samprate=1              /* each bootstrap sample has N observations */
     OUTHITS                 /* option to suppress the frequency var */
     reps=&NumSamples;       /* generate NumSamples bootstrap resamples */
run;



/* 3. Compute the statistic for each bootstrap sample */
proc freq data=BootSSFreq noprint;
table SampleID/out=SampleID;
run;
proc delete data=Output_pb ;run;
data _null_;
 set SampleID;
 call execute(catt('%nrstr(%Passing_Bablok_regression(dataset=BootSSFreq(where=(SampleID=',SampleID,')) ,x=x ,y=y ,SampleID=',SampleID,'))'));
run;



/* 4. Write stats to BootStats data set */ 
proc univariate data=Output_pb noprint ; /* use NOPRINT option to suppress output and graphs */
   class  term;
   var estimate ;
   output out=Pctl pctlpre =CI95_
          pctlpts =2.5  97.5       /* compute 95% bootstrap confidence interval */
          pctlname=Lower Upper;
run;
 

/*Merge the point estimate and Confident Interval together*/
data want;
 merge point_estimate pctl;
 by term;
 label term='09'x estimate='Estimate' CI95_Lower='Lower 95%CI' CI95_Upper='Upper 95%CI';
run;
/*Print resutl*/
proc report data=want nowd; run;

 

Ksharp_0-1643114481401.png

 

 

 

Thanks for your reply. @Ksharp

To verify the accuracy of the code more comprehensively, I used a different dataset to verify it.

But different results have been obtained. The point estimates for slope and intercept have been slightly different.

In my opinion, the point estimate does not involve a random sampling algorithm and should be the same, so is there any factor that is not reflected in the code?

SAS result:

soulmate_2-1643189617194.png

MedCalc result:

soulmate_1-1643189099102.png

My data:


data have;
label x="micrograms per deciliter"
y="kiloOhms";
input x y @@;
datalines;
0.07 0.07
0.04 0.10
0.07 0.08
0.05 0.10
0.06 0.08
0.06 0.07
0.03 0.05
0.04 0.05
0.05 0.08
0.05 0.10
0.12 0.21
0.10 0.17
0.03 0.11
0.18 0.24
0.29 0.33
0.05 0.05
0.04 0.05
0.11 0.21
0.38 0.36
0.04 0.05
0.03 0.06
0.06 0.07
0.46 0.27
0.08 0.05
0.02 0.09
0.12 0.23
0.16 0.25
0.07 0.10
0.07 0.07
0.04 0.06
0.11 0.23
0.23 0.21
0.15 0.20
0.05 0.07
0.04 0.14
0.05 0.11
0.17 0.26
8.43 8.12
1.86 1.47
0.17 0.27
8.29 7.71
5.98 5.26
0.57 0.43
0.07 0.10
0.06 0.11
0.21 0.24
8.29 7.84
0.28 0.23
8.25 7.79
0.18 0.25
0.17 0.22
0.15 0.21
8.33 7.67
5.58 5.05
0.09 0.14
1.00 3.72
0.09 0.13
3.88 3.54
0.51 0.42
4.09 3.64
0.89 0.71
5.61 5.18
4.52 4.15
0.09 0.12
0.05 0.05
0.05 0.05
0.04 0.07
0.05 0.07
0.09 0.11
0.03 0.05
2.27 2.08
1.50 1.21
5.05 4.46
0.22 0.25
2.13 1.93
0.05 0.08
4.09 3.61
1.46 1.13
1.20 0.97
0.02 0.05
1.00 1.00
3.39 3.11
1.00 0.05
2.07 1.83
6.68 6.06
3.00 2.97
0.06 0.09
7.17 6.55
1.00 1.00
2.00 0.05
2.91 2.45
3.92 3.36
1.00 1.00
7.20 6.88
6.42 5.83
2.38 2.04
1.97 1.76
4.72 4.37
1.64 1.25
5.48 4.77
5.54 5.16
0.02 0.06
;
run;

@soulmate I do not have access to MedCalc, but I would be curious to know what answer it gives for each of the following sets of data:

 

/* only 10 obs */
data Test1;
input x y @@;
datalines;
169.0 155.6 130.8 123.1 109.0  96.5  94.1  85.7 86.3  87.5 78.4 82.5 
 76.1  75.6  72.2  77.2  70.0  64.3  69.8  72.3 
;

/* repeated data values */
data Test2;
input x y @@;
datalines;
1  9  1 10  1.5 12  2 15  2 17  3 19  3 17
;

Rick,

I will do it for you .

For the test1 dataset:  -- without Bootstrap CI

 

Ksharp_0-1643200575650.png

-- with Bootstrap CI

Ksharp_3-1643200977341.png

 

 

 

 

 

For the Test2 dataset: -- without Bootstrap CI

Ksharp_1-1643200769307.png

-- with Bootstrap CI

Ksharp_2-1643200851171.png

 

I don't know what to say. I tested another dataset ,and get the exactly the same result.

I guess there are must have some underline adjustment in Medcalc,especially when there are more data ?

 

data have;
set sashelp.class(keep=weight height rename=(weight=x height=y));
run;

Ksharp_0-1643204323492.png

 

Ksharp_1-1643204349563.png

 

Thanks for your reply and discussion, after looking at the data today and verifying it with another set of data, I have the following ideas:

1.In method comparison, x and y are usually not much different values and are often used for equivalence analysis.For example, the concentration of glucose in serum is tested with 2 IVD products, so that x and y are very close.

2.The data you used to test your code before, x and y, are very different, which in my opinion is already a significant inconsistency

3.So, this could be the reason why the results are problematic? Maybe MedCalc has its own way of handling similar data?

4.I think (of course, my experience is not enough), the main application area of PB regression is the analysis of laboratory test results,which is regression analysis between two similar data. So this problem is still very important and needs to be solved.

5.Another way of thinking, MedCalc has a built-in way of processing very small data, because many of the previous sets of data are 0.0X, or maybe this causes a slight deviation in the results?

6.I used another set of real data to verify, and the results are as follows(with Bootstrap CI😞

MedCalc:

soulmate_0-1643246882616.png

SAS:

soulmate_1-1643247312521.png

 

data:

 

data have;
label x="micrograms per deciliter"
y="kiloOhms";
input x y @@;
datalines;
3.28 3.42
1.91 2.22
15.93 17.45
16.72 17.16
13.83 14.38
1.81 1.94
9.61 8.43
16.06 17.69
16.46 18.10
5.75 7.53
0.053 0.06
17.75 20.28
13.72 15.30
0.857 1.01
14.63 16.08
16.23 15.51
4.42 4.87
2.02 2.33
17.74 19.18
4.12 3.12
15.90 17.16
4.89 5.36
1.65 1.56
21.37 22.71
0.296 0.33
2.66 2.68
11.46 12.07
18.16 16.88
1.85 1.73
11.83 13.61
0.998 1.24
3.80 4.22
6.79 7.85
15.37 18.06
1.37 1.48
8.11 9.88
0.145 0.16
1.63 1.61
6.15 5.88
5.80 5.76
2.41 2.39
17.03 17.32
0.343 0.40
3.75 3.89
8.05 8.95
7.38 6.82
9.76 11.29
3.43 4.24
7.07 8.64
8.46 8.97
6.25 7.24
1.66 1.36
3.98 4.21
9.11 8.08
10.83 11.47
4.66 4.43
8.90 7.89
15.43 15.60
2.87 2.33
0.032 0.03
19.80 21.17
14.92 15.83
6.14 6.68
3.90 4.20
0.132 0.14
0.824 0.92
4.09 4.63
3.22 3.39
7.56 8.13
5.67 6.96
1.53 1.07
5.21 6.58
9.66 8.57
6.48 6.44
1.99 2.29
21.96 22.77
18.08 18.45
11.89 13.24
11.65 11.97
15.02 15.71
0.440 0.42
19.12 20.80
7.13 7.17
16.15 17.54
15.91 17.05
17.76 18.62
11.81 12.12
5.34 5.00
6.19 7.55
5.59 5.66
9.39 9.35
9.45 9.24
6.14 6.67
4.30 4.83
11.46 10.35
3.99 4.34
3.51 3.53
3.33 3.18
9.94 8.58
12.77 14.25
1.56 1.61
11.10 10.56
2.20 2.60
6.12 6.64
8.90 9.79
12.46 14.08
10.05 10.10
0.053 0.05
4.22 3.89
5.76 6.78
1.54 1.06
13.68 14.65
3.42 3.44
2.03 2.46
9.91 12.66
0.832 0.93
2.40 1.88
4.99 6.20
5.90 6.69
0.527 0.59
0.404 0.51
0.450 0.41
0.146 0.15
0.040 0.04
1.23 1.43
5.68 6.31
0.231 0.27
0.254 0.26
0.346 0.38
0.239 0.27
0.311 0.27
3.30 3.22
14.22 13.28
14.18 15.75
0.621 0.53
11.75 12.65
10.32 9.34
15.48 16.43
8.93 7.86
4.05 4.23
11.18 12.65
14.60 15.85
3.76 3.23
9.55 8.99
12.06 11.54
2.00 1.69
11.88 12.07
7.17 7.53
14.54 14.75
5.47 6.34
5.70 5.03
9.69 9.43
0.034 0.03
9.50 9.03
10.35 9.54
0.468 0.44
14.69 15.86
2.57 2.15
1.29 1.22
2.12 1.98
0.061 0.06
12.52 14.54
10.29 12.04
3.42 2.99
1.94 2.04
1.06 0.98
4.61 4.23
3.49 3.22
12.52 14.35
1.29 1.32
6.18 7.12
;
run;

There is also a small question, why is the same 1000 sampling times, SAS running much slower than MedCalc?

Maybe I found the reason, there are three outliers in data . Medcalc might have skill to handle this situation .

 

Ksharp_0-1643284788848.png

 

 

If I delete these outliers .
" if _n_ in (56 83 90) then delete; "
I would get exactly the same result .

Ksharp_1-1643284823656.png

Ksharp_3-1643284852494.png

 

@Rick_SAS  , Do you know how to handle outliers in this situation as Medcalc?

 

"There is also a small question, why is the same 1000 sampling times, SAS running much slower than MedCalc?"

Because I used datastep. If using IML as @Rick_SAS  did ,that much be faster .

Thanks to your reply. I learned a lot from our disscution.

I have posted a SAS/IML program that performs Passing-Bablok regression and estimates CIs by using the method in the PB paper: Also, an overview and brief discussion of the PB algorithm.

See "Passing-Bablok regression in SAS."

 

I don't want to hijack this thread, but it is not easy to post code and screenshots in a blog comment, so if you have any comments that involve posting code, feel free to post your feedback here.

@Ksharp 

hi Ksharp,long time no see.

I ran the pb regression program again in the past few days and found a problem.

The temp3 dataset needs to delete the sij=-1 observation, but I opened the dataset and still saw the observation with sij=-1.See figure below.

soulmate_0-1652938640340.png

Then I ran the program to see the number of observations sij=-1 in the temp2 and temp3 datasets, and the results were strange.The results show that there is data for sij=-1 in temp2, but no data for sij=-1 in temp3.See figure below.

soulmate_1-1652938909153.png

But this is obviously inconsistent with the dataset results, which should have an impact on the results of the program, right? Why is this happening?

@KsharpOne more question. When solving for Cr, why is sqrt followed by "&n1." instead of "&n.".

soulmate_0-1652941707124.png

 

soulmate,

Long time no see.

This problem has already mentioned by master @FreelanceReinh  . I think it ROUND problem.

I didn't follow @FreelanceReinh 's advice due to comply with software MedCalc . If I used ROUND() ,I would get different result with MedCalc , If I delete ROUND() ,I would get exact result with MedCalc .

 

Ksharp_0-1652953897506.png

 

For your second question, &n1. refer to the number of sample, while &n. refer to the VALID number of sample. I think the original paper using the number of sample.

@soulmate 

I used your IVD glucose data in your 1/24/22 post with JMP's Passing-Bablok regression tool and it looks like it matches what you're seeing in MedCalc?

Screen Shot 2022-11-01 at 10.28.32 AM.png

Version history
Last update:
‎01-24-2022 07:20 AM
Updated by:
Contributors

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags