Top_Katz
Quartz | Level 8

Hi!  This is kind of a statistical theory question.  I am computing confidence intervals for Somers' D in PROC FREQ.  The asymptotic variance formula for Somers' D depends on the number of observations and the number of events.  Does the sample size required to achieve the typical 95% confidence with 80% power depend on the number of events, or just the number of observations?  In my case, I have thousands of observations, so that's no problem, but I may have as few as 25 events.  The CIs in those cases are super-wide anyway, but I'm trying to get a sense of when I can rely on the results I'm getting.  Thanks!

4 REPLIES
sbxkoenk
SAS Super FREQ

I cannot answer that off the top of my head.

 

But you can always try "brute force" instead of an elegant formula that makes assumptions and gives only asymptotic results:

 

Compute a bootstrap confidence interval in SAS
By Rick Wicklin on The DO Loop August 10, 2016
https://blogs.sas.com/content/iml/2016/08/10/bootstrap-confidence-interval-sas.html

 

https://blogs.sas.com/content/tag/bootstrap-and-resampling/
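
The bootstrap approach from the post linked above might look roughly like the sketch below when applied to Somers' D. This is only an illustration under assumptions: the data set WORK.HAVE, the 0/1 variable EVENT, the ordinal variable SCORE, the seed, and the number of resamples are all hypothetical placeholders, not names from this thread. It shows the mechanics only; it does not by itself say how many events are enough.

```sas
/* Hypothetical input: WORK.HAVE with a 0/1 variable EVENT and an ordinal SCORE */
%let nrep = 1000 ;   /* number of bootstrap resamples (an arbitrary choice) */

/* 1. Draw &nrep. resamples with replacement, each the size of the original data. */
/*    Without OUTHITS, each selected row appears once with a NumberHits count.    */
proc surveyselect data = have out = boot seed = 12345 noprint
     method = urs samprate = 1 reps = &nrep. ;
run ;

/* 2. Somers' D C|R for every resample; NumberHits acts as the cell weight */
proc freq data = boot noprint ;
   by Replicate ;
   tables event * score / measures ;
   weight NumberHits ;
   output out = boot_d smdcr ;
run ;

/* 3. Percentile interval: 2.5th and 97.5th percentiles of the bootstrap estimates */
proc univariate data = boot_d noprint ;
   var _smdcr_ ;
   output out = boot_ci pctlpts = 2.5 97.5 pctlpre = d_ ;
run ;

proc print data = boot_ci noobs ;
run ;
```

Because the resampling is unstratified, the number of events varies from one resample to the next, so the spread of the bootstrap estimates reflects the event count as well as the total sample size.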

 

Koen

 
Top_Katz
Quartz | Level 8

Hi @sbxkoenk!  Thank you for responding.  A bootstrap CI could be good confirmatory information, but it doesn't solve the sample size sufficiency issue, does it?  Doesn't a bootstrap still require a certain number of observations to be reliable?  I think I would still need to know how the number of events affects the reliability, if at all.

Rick_SAS
SAS Super FREQ

Yes, in general, confidence intervals that are associated with a binomial proportion are affected by the proportion parameter. For the case of Somers' D, notice that the estimate (see the documentation) looks like

D = (P-Q)/w_r.

If you look at the formula for the asymptotic standard error and expand the quadratic term, you will see a term that you can rewrite as D^2. Since D depends on the binomial probability, so does the standard error.
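
To make the expansion concrete, here is the algebra written out. The formulas are as I recall them from the PROC FREQ documentation for Somers' D C|R, so please check the notation there; the last line additionally assumes that the row variable is a binary event indicator with n_1 events and n_0 non-events.

```latex
% Somers' D C|R and its asymptotic variance (PROC FREQ notation, from memory)
\[
D = \frac{P - Q}{w_r}, \qquad
w_r = n^2 - \sum_i n_{i\cdot}^2, \qquad
d_{ij} = A_{ij} - D_{ij}
\]
\[
\mathrm{Var}(D) = \frac{4}{w_r^4} \sum_i \sum_j n_{ij}
  \bigl( w_r d_{ij} - (P - Q)(n - n_{i\cdot}) \bigr)^2
\]
% Factor w_r out of the square and substitute D = (P - Q)/w_r
\[
\mathrm{Var}(D) = \frac{4}{w_r^2} \sum_i \sum_j n_{ij}
  \bigl( d_{ij}^2 - 2 D\, d_{ij} (n - n_{i\cdot}) + D^2 (n - n_{i\cdot})^2 \bigr)
\]
% Assumption: binary row variable with n_1 events and n_0 non-events, n = n_1 + n_0
\[
w_r = n^2 - n_1^2 - n_0^2 = 2 n_1 n_0
\quad\Longrightarrow\quad
\mathrm{Var}(D) = \frac{1}{n_1^2 n_0^2} \sum_i \sum_j n_{ij}
  \bigl( d_{ij} - D (n - n_{i\cdot}) \bigr)^2
\]
```

In the binary case the event count n_1 enters directly through the factor 1/(n_1^2 n_0^2) and through the row totals, not only through the total number of observations.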

 

Top_Katz
Quartz | Level 8

Hi @Rick_SAS!

 

Thank you for responding.  I can see your point about the formula, but I still don't have a good intuitive feel for how the event frequency will affect the ASE and CI size, nor how reliable the ASE and CI are for very low event counts. 

 

I ran some code which I have copied into this message below (I can't upload files, sorry).  It does some testing with 10,000 observations (actually 10,001), one set of "predictions" (ordervar) and one or two events scattered in different "dependent variables" (the s_1_* and s_2_* variables). 

 

For each of the single events, placed at one end or the other (Somers' D = +/-1), at Q1 or Q3 (Somers' D = +/-0.5), or at the median (Somers' D = 0), the confidence intervals are very tight.

 

But if you drop in a second event, the Somers' D can change drastically and the CI can blow wide open.  So the low event count results are not very stable and don't seem trustworthy to me.

 

I'm wondering whether there is any published guidance on how many events are needed to stabilize the results (like a jackknife test, so that adding or removing one event doesn't completely change the picture).
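
The kind of leave-one-event-out check described above could be sketched as follows. This is a sensitivity check under assumptions, not a full jackknife variance estimate and not published guidance: WORK.HAVE, EVENT (0/1), and SCORE are hypothetical names, and the sketch assumes there are at least two events so that every replicate still contains an event.

```sas
/* Hypothetical input: WORK.HAVE with a 0/1 variable EVENT and an ordinal SCORE */

/* 1. Number the events 1, 2, ...; event_id stays missing for non-events */
data have_id ;
   set have end = last ;
   if event = 1 then do ;
      n_events + 1 ;            /* sum statement: running count of events */
      event_id = n_events ;
   end ;
   if last then call symputx('n_events', n_events) ;
run ;

/* 2. Replicate k holds every row except event number k */
data jack ;
   set have_id ;
   do replicate = 1 to &n_events. ;
      if event_id ne replicate then output ;
   end ;
   drop n_events ;
run ;

proc sort data = jack ;
   by replicate ;
run ;

/* 3. Somers' D C|R with one event left out at a time */
proc freq data = jack noprint ;
   by replicate ;
   tables event * score / measures ;
   output out = jack_d smdcr ;
run ;

/* 4. How far does the estimate move when a single event is removed? */
proc means data = jack_d n min max mean std ;
   var _smdcr_ ;
run ;
```

A wide range between the minimum and maximum leave-one-out estimates would indicate that individual events dominate the result. The replicated data set has roughly (number of rows) x (number of events) rows, which is manageable for a few dozen events.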

 

SELF-CONTAINED CODE:

 

%let dsn = 02 ;

/* Range of the ordering variable: loval to hival, inclusive */
%let loval&dsn. = 0 ;
%let hival&dsn. = 10000 ;
/* Median and quartile positions, offset from loval so they stay correct when loval is not 0 */
%let med&dsn. = %sysfunc(floor(%sysevalf(&&loval&dsn.. + ((&&hival&dsn.. - &&loval&dsn..) / 2)))) ;
%let medm1&dsn. = %sysevalf(&&med&dsn.. - 1) ;
%let medp1&dsn. = %sysevalf(&&med&dsn.. + 1) ;
%let q1&dsn. = %sysfunc(floor(%sysevalf(&&loval&dsn.. + ((&&hival&dsn.. - &&loval&dsn..) / 4)))) ;
%let q3&dsn. = %sysfunc(floor(%sysevalf(&&hival&dsn.. - ((&&hival&dsn.. - &&loval&dsn..) / 4)))) ;
/* Neighbors of the two endpoints */
%let him1&dsn. = %sysevalf(&&hival&dsn.. - 1) ;
%let lop1&dsn. = %sysevalf(&&loval&dsn.. + 1) ;
%put med&dsn. = &&med&dsn.. ;
%put medm1&dsn. = &&medm1&dsn.. ;
%put medp1&dsn. = &&medp1&dsn.. ;
%put q1&dsn. = &&q1&dsn.. ;
%put q3&dsn. = &&q3&dsn.. ;
%put him1&dsn. = &&him1&dsn.. ;
%put lop1&dsn. = &&lop1&dsn.. ;


/**/
data test_smdcr_&dsn. ;
keep ordervar s_0 s_1_l s_1_q1 s_1_m s_1_q3 s_1_h
s_2_l_p1 s_2_l_q1 s_2_l_m s_2_l_q3 s_2_l_h
s_2_q1_m s_2_q1_q3 s_2_q1_h
s_2_m_q3 s_2_m_h s_2_q3_h s_2_m1_h
s_2_m1_p1
;
do ordervar = &&loval&dsn.. to &&hival&dsn.. ;
s_0 = 0 ;
s_1_l = 0 ;
s_1_q1 = 0 ;
s_1_m = 0 ;
s_1_q3 = 0 ;
s_1_h = 0 ;
s_2_l_p1 = 0 ;
s_2_l_q1 = 0 ;
s_2_l_m = 0 ;
s_2_l_q3 = 0 ;
s_2_l_h = 0 ;
s_2_q1_m = 0 ;
s_2_q1_q3 = 0 ;
s_2_q1_h = 0 ;
s_2_m_q3 = 0 ;
s_2_m_h = 0 ;
s_2_q3_h = 0 ;
s_2_m1_h = 0 ;
s_2_m1_p1 = 0 ;
if (ordervar = &&loval&dsn..) then do ;
s_1_l = 1 ;
s_2_l_p1 = 1 ;
s_2_l_q1 = 1 ;
s_2_l_m = 1 ;
s_2_l_q3 = 1 ;
s_2_l_h = 1 ;
end ;
else if (ordervar = &&lop1&dsn..) then do ;
s_2_l_p1 = 1 ;
end ;
else if (ordervar = &&q1&dsn..) then do ;
s_1_q1 = 1 ;
s_2_l_q1 = 1 ;
s_2_q1_m = 1 ;
s_2_q1_q3 = 1 ;
s_2_q1_h = 1 ;
end ;
else if (ordervar = &&medm1&dsn..) then do ;
s_2_m1_p1 = 1 ;
end ;
else if (ordervar = &&med&dsn..) then do ;
s_1_m = 1 ;
s_2_l_m = 1 ;
s_2_q1_m = 1 ;
s_2_m_q3 = 1 ;
s_2_m_h = 1 ;
end ;
else if (ordervar = &&medp1&dsn..) then do ;
s_2_m1_p1 = 1 ;
end ;
else if (ordervar = &&q3&dsn..) then do ;
s_1_q3 = 1 ;
s_2_l_q3 = 1 ;
s_2_q1_q3 = 1 ;
s_2_m_q3 = 1 ;
s_2_q3_h = 1 ;
end ;
else if (ordervar = &&him1&dsn..) then do ;
s_2_m1_h = 1 ;
end ;
else if (ordervar = &&hival&dsn..) then do ;
s_1_h = 1 ;
s_2_m1_h = 1 ;
s_2_l_h = 1 ;
s_2_q1_h = 1 ;
s_2_m_h = 1 ;
s_2_q3_h = 1 ;
end ;
output ;
end ;
run ;
/**/


title2 "proc freq data = test_smdcr_&dsn. s_*ordervar cl" ;
proc freq data = test_smdcr_&dsn. ;
tables (s_0 s_1_l s_1_q1 s_1_m s_1_q3 s_1_h
s_2_l_p1 s_2_l_q1 s_2_l_m s_2_l_q3 s_2_l_h
s_2_q1_m s_2_q1_q3 s_2_q1_h
s_2_m_q3 s_2_m_h s_2_q3_h s_2_m1_h
s_2_m1_p1) * ordervar / measures cl noprint ;
test smdcr ;
/* Note: the OUT= data set holds statistics for the last table request only */
output smdcr out = smdcr_test_&dsn. ;
run ;
title2 ;

 

 
