## Construct a jackknife sample based on groups (leave kth-group)

I have data with some categorical variables that contain missing values. I need variance estimates based on leaving the i-th observation out within each group of a categorical variable. (So in the end, I would have as many leave-one-out estimates as there are observations in each group, and k variance estimates for the k groups of a categorical variable.)

Is using PROC SURVEYFREQ with a STRATA statement and the VARMETHOD=JACKKNIFE option a correct way to do this? Or do I need to write a loop like this: https://blogs.sas.com/content/iml/2017/06/21/jackknife-estimate-standard-error-sas.html

Also, if I need to write a loop like that for k groups, how would I write the PROC IML code?

Thank you in advance! I would greatly appreciate any help.

Accepted Solutions

## Re: Construct a jackknife sample based on groups (leave kth-group)

I don't fully understand what you are saying, but I guess I don't have to. Your question seems to be "how can I compute k variances in SAS/IML where each one is computed by leaving out one of the k groups."

Attached is a program that I hope will demonstrate the programming technique, even if I am misinterpreting some of the details. For background, I suggest you read my article about the "UNIQUE-LOC" technique.

``````
data Have;
call streaminit(1234);
do Group = 1 to 10;
   StdDev = round(1 + rand("Uniform"), 0.05);  /* Group-specific Std Dev */
   do i = 1 to 50;
      x = rand("Normal", 0, StdDev);
      output;
   end;
end;
run;

proc iml;
use Have;
read all var {x Group};     /* load the data into the vectors x and Group */
close;
OverallVar = var(x);        /* statistic on the full data */

u = unique(Group);          /* sorted row vector of the distinct group values */
k = ncol(u);                /* number of groups */
variance = j(1, k, .);
do i = 1 to k;
   idx = loc( Group ^= u[i] );   /* omit the i_th group */
   variance[i] = var( x[idx] );  /* compute statistic on remaining k-1 groups */
end;
print variance;

/* same as (k-1)/k * sum of squared deviations from the full-data statistic */
est = ssq( OverallVar - variance ) / (k/(k-1));
print est;
quit;
``````
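For reference, the last assignment in the program can be written as a formula. Here $\hat\theta$ stands for `OverallVar` and $\hat\theta_{(i)}$ for `variance[i]`, the statistic computed with the i-th group removed; these symbols are my own notation, not something from the original post.

$$
\widehat{V} \;=\; \frac{k-1}{k}\,\sum_{i=1}^{k}\left(\hat\theta - \hat\theta_{(i)}\right)^{2}
$$

Note that many textbook presentations of the delete-a-group jackknife center the sum on the mean of the replicates, $\bar\theta_{(\cdot)} = \frac{1}{k}\sum_{i}\hat\theta_{(i)}$, rather than on the full-data estimate. The program uses the full-data estimate, so the two versions can give slightly different numbers.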

## Re: Construct a jackknife sample based on groups (leave kth-group)

I'm confused. Do you have a reference for what you are trying to do? Or can you provide data and explain what you are attempting?

SURVEYFREQ is used to estimate proportions. For example, you can use PROC SURVEYFREQ to estimate that your population is 40% white, 40% black, 10% Hispanic, and 10% Asian, and to get standard errors for those estimates. Is that what you want?
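To make that concrete, here is a minimal sketch of the kind of call the question describes. The data set `Survey` and the variables `Stratum` and `Race` are made up for illustration; they are not from the original poster's data. With no CLUSTER statement, each observation is treated as its own sampling unit, so the jackknife replicates are formed by deleting one observation at a time within its stratum.

```sas
/* Hypothetical toy data: 3 strata, 50 observations each,
   with a 4-level categorical variable (40% / 40% / 10% / 10%) */
data Survey;
call streaminit(1);
do Stratum = 1 to 3;
   do i = 1 to 50;
      Race = rand("Table", 0.4, 0.4, 0.1, 0.1);   /* levels 1-4 */
      output;
   end;
end;
run;

/* Estimated proportions of Race with jackknife standard errors */
proc surveyfreq data=Survey varmethod=jackknife;
   strata Stratum;
   tables Race;
run;
```

The output is a table of estimated proportions and their standard errors, which is a different quantity from a per-group variance of some other statistic.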

Usually, "jackknife" refers to estimates that leave out one observation. However, you seem to imply that you want to leave out an entire level of a categorical variable such as dropping the "Hispanic" level for a RACE variable. I am not familiar with that method. There are cross-validation techniques that leave out a portion of the data, but that is a different method than the jackknife.

## Re: Construct a jackknife sample based on groups (leave kth-group)

The data would consist of 50 observations from each group. I would like to calculate a statistic from the observations within each group while excluding one observation from that group at a time. In the end, I would be able to calculate a variance estimate for each group.

Hm, I see. Then I don't think I should use SURVEYFREQ. Sorry, I can't provide data, but I think any data should work. I think I should use code like https://blogs.sas.com/content/iml/2017/06/21/jackknife-estimate-standard-error-sas.html but leave one observation out within a group instead.
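A rough sketch of what a within-group leave-one-out loop might look like in SAS/IML follows. It assumes a data set named `Have` with a numeric variable `x` and a grouping variable `Group`, and it uses the sample variance as a stand-in statistic because the post does not say which statistic is wanted. The names `jackVar`, `theta`, and `y` are illustrative only.

```sas
proc iml;
use Have;
read all var {x Group};
close;

u = unique(Group);             /* distinct group values */
k = ncol(u);                   /* number of groups */
jackVar = j(1, k, .);          /* one jackknife variance estimate per group */

do g = 1 to k;
   y = x[ loc(Group = u[g]) ];      /* observations in the g_th group */
   n = nrow(y);                     /* assumes n >= 3 so var() is defined */
   theta = j(n, 1, .);              /* leave-one-out replicates */
   do i = 1 to n;
      theta[i] = var( y[ setdif(1:n, i) ] );   /* statistic without the i_th obs */
   end;
   /* jackknife variance of the statistic for this group */
   jackVar[g] = (n-1)/n * ssq( theta - mean(theta) );
end;
print jackVar;
quit;
```

Swapping in a different statistic only requires changing the `var( ... )` call inside the inner loop. Missing values in `x` are not handled specially here.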
