Re: PROC FREQ sample and event sizes needed to apply asymptotic confidence intervals for Somers' D

Top_Katz · Posted 05-26-2023 01:35 PM

Hi! This is kind of a statistical theory question. I am computing confidence intervals for Somers' D in PROC FREQ. The asymptotic variance formula for Somers' D depends on the number of observations and the number of events. Does the sample size required to achieve the typical 95% confidence with 80% power depend on the number of events, or just the number of observations? In my case, I have thousands of observations, so that's no problem, but I may have as few as 25 events. The CIs in those cases are super-wide anyway, but I'm trying to get a sense of when I can rely on the results I'm getting. Thanks!

sbxkoenk · Posted 05-26-2023 02:41 PM

I cannot answer that off the top of my head.

But you can always try "brute force" (instead of using an elegant formula that makes assumptions and gives asymptotic results):

Compute a bootstrap confidence interval in SAS
By Rick Wicklin on The DO Loop August 10, 2016
https://blogs.sas.com/content/iml/2016/08/10/bootstrap-confidence-interval-sas.html

https://blogs.sas.com/content/tag/bootstrap-and-resampling/

Koen
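
For readers who want to try the bootstrap suggestion, here is a minimal sketch of a percentile bootstrap interval for Somers' D C|R. The data set `have` and its variables `event` and `score` are hypothetical (simulated so the block runs on its own), and the seeds and the 1,000 replicates are arbitrary choices; this is not the code from the linked article, only the same general pattern.

```sas
/* Hypothetical data: 5,000 rows, rare binary event (about 25 expected). */
data have ;
   call streaminit(12345) ;
   do i = 1 to 5000 ;
      score = rand('uniform') ;
      event = (rand('uniform') < 0.01 * score) ;   /* 1 = event, 0 = non-event */
      output ;
   end ;
   drop i ;
run ;

/* 1,000 resamples of the original size, drawn with replacement.
   Without OUTHITS, NumberHits records how often each row was drawn. */
proc surveyselect data = have out = boot seed = 2023
                  method = urs samprate = 1 reps = 1000 noprint ;
run ;

/* Somers' D C|R per resample; WEIGHT treats NumberHits as a frequency count.
   A resample with no events at all would yield a missing estimate. */
proc freq data = boot noprint ;
   by Replicate ;
   weight NumberHits ;
   tables event * score / measures ;
   output out = boot_d smdcr ;
run ;

/* Percentile interval: 2.5th and 97.5th percentiles of the estimates. */
proc univariate data = boot_d noprint ;
   var _smdcr_ ;
   output out = boot_ci pctlpts = 2.5 97.5 pctlpre = d_p ;
run ;

proc print data = boot_ci noobs ;
run ;
```

The interval ends up in the variables `d_p2_5` and `d_p97_5` of `boot_ci`, and can be compared with the asymptotic limits that PROC FREQ reports for the same data.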

Top_Katz · Posted 05-26-2023 03:06 PM

Hi @sbxkoenk ! Thank you for responding. Computing a bootstrap CI could be good confirmative information, but it doesn't solve the sample size sufficiency issue, does it? Doesn't a bootstrap still require a certain number of observations to be reliable? I think I would still need to know how the number of events affects the reliability, if at all.

Rick_SAS · Posted 05-26-2023 03:26 PM

Yes, in general, confidence intervals that are associated with a binomial proportion are affected by the proportion parameter. For the case of Somers' D, notice that the estimate (see the documentation) looks like

D = (P-Q)/w_r.

If you look at the formula for the asymptotic standard error and expand the quadratic term, you will see a term that you can rewrite as D^2. Since D depends on the binomial probability, so does the standard error.
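
The expansion can be written out as follows. This is a sketch based on the asymptotic variance of Somers' D(C|R) as I understand it from the PROC FREQ documentation; the symbols n_ij (cell counts), n_i. (row totals), n (total count), and A_ij, D_ij (the concordant and discordant counts for cell i,j, with P and Q their n_ij-weighted sums) are the documentation's notation, not something defined in this thread, so please verify against the documentation.

```latex
D = \frac{P - Q}{w_r}, \qquad
w_r = n^2 - \sum_i n_{i\cdot}^2, \qquad
d_{ij} = A_{ij} - D_{ij}

\operatorname{Var}(D)
  = \frac{4}{w_r^4} \sum_i \sum_j n_{ij}
      \bigl( w_r\, d_{ij} - (P - Q)(n - n_{i\cdot}) \bigr)^2
  = \frac{4}{w_r^2} \sum_i \sum_j n_{ij}
      \bigl( d_{ij}^2 - 2 D\, d_{ij}\,(n - n_{i\cdot}) + D^2 (n - n_{i\cdot})^2 \bigr)

% Binary row variable with n_1 events and n_0 = n - n_1 non-events:
w_r = n^2 - n_1^2 - n_0^2 = 2\, n_1 n_0
```

The second form follows from substituting P - Q = D w_r and factoring w_r out of the square. For a binary row variable the last line shows that the event count n_1 also enters through the normalizing term w_r.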

Top_Katz · Posted 06-06-2023 10:57 AM

Hi @Rick_SAS!

Thank you for responding. I can see your point about the formula, but I still don't have a good intuitive feel for how the event frequency will affect the ASE and CI size, nor how reliable the ASE and CI are for very low event counts.

I ran some code which I have copied into this message below (I can't upload files, sorry). It does some testing with 10,000 observations (actually 10,001), one set of "predictions" (ordervar) and one or two events scattered in different "dependent variables" (the s_1_* and s_2_* variables).

For each of the single events, placed at one end or the other (Somers' D +/- 1), at Q1 or Q3 (SD +/- 0.5), or at the median (SD 0), the confidence intervals are very tight.

But if you drop in a second event the Somers' D can change drastically and the CI can blow wide open. So the low event count results are not very stable and don't seem trustworthy to me.

I'm wondering whether there is any published guidance on how many events are needed to stabilize the results (like a jackknife test, so that adding or removing one event doesn't completely change the picture).
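
A worked special case may help explain the tight single-event intervals. It assumes the asymptotic variance of Somers' D(C|R) from the PROC FREQ documentation, Var(D) = (4 / w_r^4) * sum_ij n_ij (w_r d_ij - (P - Q)(n - n_i.))^2, applied to one event and N non-events with distinct ordervar values, k of which lie below the event. The symbols N, k, and p are introduced here for illustration only, and the algebra is my own, so treat it as a sketch to be checked.

```latex
% One event, N non-events, k non-events below the event, p = k / N
w_r = 2N, \qquad D = \frac{k - (N - k)}{N} = 2p - 1

% Event cell: d = 2k - N = D N, so its term (d - D N)^2 vanishes.
% Non-event cells: d = +1 for the k cells below, d = -1 for the N - k above.
\sum_i \sum_j n_{ij}\,\bigl( d_{ij} - D\,(n - n_{i\cdot}) \bigr)^2
  = k\,(1 - D)^2 + (N - k)\,(1 + D)^2
  = \frac{4\,k\,(N - k)}{N}

\operatorname{Var}(D) = \frac{4}{w_r^2} \cdot \frac{4\,k\,(N - k)}{N}
  = \frac{4\,p\,(1 - p)}{N},
\qquad
\mathrm{ASE} = 2 \sqrt{\frac{p\,(1 - p)}{N}}
```

With N = 10,000 and the event at the median (p = 0.5), this gives ASE = 0.01, and at either end (p = 0 or 1) it gives ASE = 0. Under these assumptions the ASE shrinks with the number of non-events even though there is only a single event, which is consistent with the very tight intervals described above.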

SELF-CONTAINED CODE:

/* Suffix used in macro variable and data set names */
%let dsn = 02 ;

/* Range of ordervar, then its midpoint, quartiles, and their neighbors */
%let loval&dsn. = 0 ;
%let hival&dsn. = 10000 ;
/* Midpoint is offset by the low value so it stays correct if loval is not 0 */
%let med&dsn. = %sysfunc(floor(%sysevalf(&&loval&dsn.. + ((&&hival&dsn.. - &&loval&dsn..) / 2)))) ;
%let medm1&dsn. = %sysevalf(&&med&dsn.. - 1) ;
%let medp1&dsn. = %sysevalf(&&med&dsn.. + 1) ;
%let q1&dsn. = %sysfunc(floor(%sysevalf(&&loval&dsn.. + ((&&hival&dsn.. - &&loval&dsn..) / 4)))) ;
%let q3&dsn. = %sysfunc(floor(%sysevalf(&&hival&dsn.. - ((&&hival&dsn.. - &&loval&dsn..) / 4)))) ;
%let him1&dsn. = %sysevalf(&&hival&dsn.. - 1) ;
%let lop1&dsn. = %sysevalf(&&loval&dsn.. + 1) ;

/* Echo the derived positions to the log */
%put med&dsn. = &&med&dsn.. ;
%put medm1&dsn. = &&medm1&dsn.. ;
%put medp1&dsn. = &&medp1&dsn.. ;
%put q1&dsn. = &&q1&dsn.. ;
%put q3&dsn. = &&q3&dsn.. ;
%put him1&dsn. = &&him1&dsn.. ;
%put lop1&dsn. = &&lop1&dsn.. ;

/**/
data test_smdcr_&dsn. ;
   keep ordervar s_0 s_1_l s_1_q1 s_1_m s_1_q3 s_1_h
        s_2_l_p1 s_2_l_q1 s_2_l_m s_2_l_q3 s_2_l_h
        s_2_q1_m s_2_q1_q3 s_2_q1_h
        s_2_m_q3 s_2_m_h s_2_q3_h s_2_m1_h
        s_2_m1_p1
   ;
   /* One row per value of ordervar; every indicator defaults to 0 (non-event) */
   do ordervar = &&loval&dsn.. to &&hival&dsn.. ;
      s_0 = 0 ;
      s_1_l = 0 ;
      s_1_q1 = 0 ;
      s_1_m = 0 ;
      s_1_q3 = 0 ;
      s_1_h = 0 ;
      s_2_l_p1 = 0 ;
      s_2_l_q1 = 0 ;
      s_2_l_m = 0 ;
      s_2_l_q3 = 0 ;
      s_2_l_h = 0 ;
      s_2_q1_m = 0 ;
      s_2_q1_q3 = 0 ;
      s_2_q1_h = 0 ;
      s_2_m_q3 = 0 ;
      s_2_m_h = 0 ;
      s_2_q3_h = 0 ;
      s_2_m1_h = 0 ;
      s_2_m1_p1 = 0 ;
      /* Lowest value */
      if (ordervar = &&loval&dsn..) then do ;
         s_1_l = 1 ;
         s_2_l_p1 = 1 ;
         s_2_l_q1 = 1 ;
         s_2_l_m = 1 ;
         s_2_l_q3 = 1 ;
         s_2_l_h = 1 ;
      end ;
      /* Lowest value plus 1 */
      else if (ordervar = &&lop1&dsn..) then do ;
         s_2_l_p1 = 1 ;
      end ;
      /* First quartile */
      else if (ordervar = &&q1&dsn..) then do ;
         s_1_q1 = 1 ;
         s_2_l_q1 = 1 ;
         s_2_q1_m = 1 ;
         s_2_q1_q3 = 1 ;
         s_2_q1_h = 1 ;
      end ;
      /* Median minus 1 */
      else if (ordervar = &&medm1&dsn..) then do ;
         s_2_m1_p1 = 1 ;
      end ;
      /* Median */
      else if (ordervar = &&med&dsn..) then do ;
         s_1_m = 1 ;
         s_2_l_m = 1 ;
         s_2_q1_m = 1 ;
         s_2_m_q3 = 1 ;
         s_2_m_h = 1 ;
      end ;
      /* Median plus 1 */
      else if (ordervar = &&medp1&dsn..) then do ;
         s_2_m1_p1 = 1 ;
      end ;
      /* Third quartile */
      else if (ordervar = &&q3&dsn..) then do ;
         s_1_q3 = 1 ;
         s_2_l_q3 = 1 ;
         s_2_q1_q3 = 1 ;
         s_2_m_q3 = 1 ;
         s_2_q3_h = 1 ;
      end ;
      /* Highest value minus 1 */
      else if (ordervar = &&him1&dsn..) then do ;
         s_2_m1_h = 1 ;
      end ;
      /* Highest value */
      else if (ordervar = &&hival&dsn..) then do ;
         s_1_h = 1 ;
         s_2_m1_h = 1 ;
         s_2_l_h = 1 ;
         s_2_q1_h = 1 ;
         s_2_m_h = 1 ;
         s_2_q3_h = 1 ;
      end ;
      output ;
   end ;
run ;
/**/

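title2 "proc freq data = test_smdcr_&dsn. s_*ordervar cl" ;
proc freq data = test_smdcr_&dsn. ;
   /* Cross each binary s_* variable with ordervar; MEASURES CL requests
      Somers' D (among other measures) with confidence limits */
   tables (s_0 s_1_l s_1_q1 s_1_m s_1_q3 s_1_h
           s_2_l_p1 s_2_l_q1 s_2_l_m s_2_l_q3 s_2_l_h
           s_2_q1_m s_2_q1_q3 s_2_q1_h
           s_2_m_q3 s_2_m_h s_2_q3_h s_2_m1_h
           s_2_m1_p1) * ordervar / measures cl noprint ;
   test smdcr ;
   /* As far as I know, the OUTPUT data set holds statistics for the last
      table request only (here s_2_m1_p1 * ordervar) */
   output smdcr out = smdcr_test_&dsn. ;
run ;
title2 ;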
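
The jackknife-style check mentioned above (does removing one event change the picture?) could be sketched along these lines. Everything here is an assumption for illustration: the simulated data set `have`, the variables `obs_id`, `event`, and `score`, and the choice to delete one event row at a time rather than relabel it as a non-event.

```sas
/* Hypothetical data: 5,000 rows, rare binary event (about 25 expected). */
data have ;
   call streaminit(12345) ;
   do obs_id = 1 to 5000 ;
      score = rand('uniform') ;
      event = (rand('uniform') < 0.01 * score) ;   /* 1 = event, 0 = non-event */
      output ;
   end ;
run ;

/* One copy of the data per event, each copy missing that single event row.
   left_out identifies which event was removed. */
proc sql ;
   create table jack as
   select e.obs_id as left_out, h.event, h.score
   from have as h,
        (select obs_id from have where event = 1) as e
   where h.obs_id ne e.obs_id
   order by left_out ;
quit ;

/* Somers' D C|R for each leave-one-event-out copy */
proc freq data = jack noprint ;
   by left_out ;
   tables event * score / measures ;
   output out = jack_d smdcr ;
run ;

/* Spread of the leave-one-event-out estimates */
proc means data = jack_d n min mean max ;
   var _smdcr_ ;
run ;
```

A wide gap between the minimum and maximum would suggest that the estimate hinges on individual events, which is the kind of instability described for the one- and two-event tests above. The stacked data set has one full copy per event, so this approach only suits small event counts.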
