gsk
Obsidian | Level 7

I have data with some categorical variables that contain missing values. I need variance estimates based on leaving out the i-th observation within each group of a categorical variable. (So in the end, I would have as many leave-one-out estimates as there are observations in each group, and k variance estimates for the k groups of a categorical variable.)

Is using PROC SURVEYFREQ with a STRATA statement and the VARMETHOD=JACKKNIFE option a correct way to do this? Or do I need to write a loop like this one: https://blogs.sas.com/content/iml/2017/06/21/jackknife-estimate-standard-error-sas.html

Also, if I need to write a loop like that for k groups, how do I construct the PROC IML code?

Thank you in advance! I would greatly appreciate any help.

ACCEPTED SOLUTION
Rick_SAS
SAS Super FREQ

I don't fully understand what you are saying, but I guess I don't have to. Your question seems to be "how can I compute k variances in SAS/IML where each one is computed by leaving out one of the k groups." 

 

Attached is a program that I hope will demonstrate the programming technique, even if I am misinterpreting some of the details. For background, I suggest you read my article about the "UNIQUE-LOC" technique.

 

data Have;
call streaminit(1234);
do Group = 1 to 10;
   StdDev = round(1 + rand("Uniform"), 0.05);  /* Group-specific Std Dev */
   do i = 1 to 50;
      x = rand("Normal", 0, StdDev);
      output;
   end;
end;
run;

proc iml;
use Have;
read all var {Group x};
close;
OverallVar = var(x);

u = unique(Group);
k = ncol(u);                /* number of groups */
variance = j(1, k, .);
do i = 1 to k;
   idx = loc( Group ^= u[i] );   /* omit the i_th group */
   variance[i] = var( x[idx] );  /* compute statistic on remaining k-1 groups */
end;
print variance;

/* delete-a-group jackknife variance estimate, centered at the full-sample variance */
est = (k-1)/k * ssq( variance - OverallVar );
print est;


3 REPLIES
Rick_SAS
SAS Super FREQ

I'm confused. Do you have a reference for what you are trying to do? Or can you provide data and explain what you are attempting? 

 

SURVEYFREQ is used to estimate proportions. For example, you can use PROC SURVEYFREQ to estimate that your population is 40% white, 40% black, 10% Hispanic, and 10% Asian, and to get standard errors for those estimates. Is that what you want?

 

Usually, "jackknife" refers to estimates that leave out one observation. However, you seem to imply that you want to leave out an entire level of a categorical variable such as dropping the "Hispanic" level for a RACE variable. I am not familiar with that method. There are cross-validation techniques that leave out a portion of the data, but that is a different method than the jackknife.

 

 

gsk
Obsidian | Level 7

-Edited reply-

The data would consist of 50 observations from each group. I would like to calculate a statistic of the observations within each group while excluding one observation from the group at a time. In the end, I would be able to calculate a variance estimate for each group.

Hm, I see. Then I don't think I should use SURVEYFREQ. Sorry, I can't provide data, but I think any data should work. I think I should use code like https://blogs.sas.com/content/iml/2017/06/21/jackknife-estimate-standard-error-sas.html but leave one observation out within a group instead.
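
For readers who want the within-group version described in this reply, here is a minimal sketch, not a definitive implementation. It assumes the simulated Have data set from the accepted solution (variables Group and x) and assumes the statistic of interest is the sample variance, since the thread does not name one. The names jackSE, y, and T_LOO are illustrative. The standard-error formula is the usual leave-one-out jackknife formula, applied separately to each group.

```sas
proc iml;
use Have;
read all var {Group x};
close;

u = unique(Group);               /* sorted distinct group values */
k = ncol(u);                     /* number of groups */
jackSE = j(1, k, .);             /* one jackknife standard error per group */
do g = 1 to k;
   y = colvec( x[ loc(Group = u[g]) ] );   /* observations in the g_th group */
   n = nrow(y);
   T_LOO = j(n, 1, .);           /* leave-one-out replicates for this group */
   do i = 1 to n;
      idx = setdif(1:n, i);      /* omit the i_th observation of the group */
      T_LOO[i] = var( y[idx] );  /* ASSUMPTION: statistic is the sample variance */
   end;
   /* jackknife standard error of the statistic within the g_th group */
   jackSE[g] = sqrt( (n-1)/n * ssq(T_LOO - mean(T_LOO)) );
end;
print jackSE;
quit;
```

To use a different statistic, replace the VAR call with the function of interest. Squaring jackSE gives the corresponding jackknife variance estimate for each group.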
