Hello All,
I need to alter the jackboot macro (which can be found here: https://support.sas.com/kb/24/982.html). I have a long dataset with 1-2 rows per participant (i.e., 167 participants providing 2 rows of data and 3 participants providing 1 row of data for a total of 337 rows).
Instead of dropping one row at a time as would be normal for jackknifing, I need to drop all 1-2 rows associated with each participant at a time (i.e., 167 datasets having 335 rows and 3 datasets having 336 rows). I believe only the %jackby or %jackslow macros (below) of the jackboot macro program need to be altered. Thoughts on how to do this?
%macro jackby( /* Jackknife resampling */
   data=&_jackdat,
   print=0
   );
   data JACKDATA/view=JACKDATA;
      do _sample_=1 to &nobs;
         do _i=1 to &nobs;
            if _i^=_sample_ then do;
               _obs_=_i;
               set &data point=_i;
               output;
            end;
         end;
      end;
      stop;
   run;
   %if &syserr>4 %then %goto exit;
   %if &print %then %do;
      proc print data=JACKDATA; id _sample_ _obs_; run;
   %end;
%exit:;
%mend jackby;
%macro jackslow( /* Uniform jackknife sampling and analysis
                    without BY processing */
   data=&_jackdat
   );
   %put %cmpres(WARNING: Jackknife analysis will be slow because the
      ANALYZE macro did not use the BYSTMT macro.);
   data JACKDIST; set JACKACT; _sample_=0; delete; run;
   options nonotes;
   %local sample;
   %do sample=1 %to &nobs;
      %put Jackknife sample &sample;
      data _TMPD_;
         drop _i;
         do _i=1 to &nobs;
            set &data;
            if _i^=&sample then output;
         end;
         stop;
      run;
      %if &syserr>4 %then %goto exit;
      %analyze(data=_TMPD_,out=_TMPS_);
      %if &syserr>4 %then %goto exit;
      data _TMPS_; set _TMPS_; _sample_=&sample; run;
      %if &syserr>4 %then %goto exit;
      proc append data=_TMPS_ base=JACKDIST; run;
      %if &syserr>4 %then %goto exit;
   %end;
%exit:;
   options notes;
%mend jackslow;
It seems like the simpler approach would be to modify the data structure so that each participant is one row. By definition, a jackknife sample is a "leave one out" sample, and there are exactly N jackknife samples for a data set that has N observations. All of the jackknife statistics are based on this fact. I don't think you can use the jackknife formulas for a "leave one participant out" sample.
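To illustrate the restructuring suggested above, here is a minimal sketch that turns the long data into one row per participant. The data set name LONG and the variables id, visit (1 or 2), and y are assumptions for illustration; they do not come from the original post.

```sas
/* Hypothetical input: LONG has variables id, visit (1 or 2), and y. */
proc sort data=long out=long_sorted;
   by id visit;
run;

/* One row per participant: y1 and y2 hold the first and second measurement. */
/* Participants with a single row get a missing value for y2.               */
proc transpose data=long_sorted out=wide(drop=_name_) prefix=y;
   by id;
   id visit;
   var y;
run;
```

With the question's counts, WIDE would have 170 rows (one per participant), so an ordinary leave-one-row-out jackknife on WIDE drops a whole participant each time. Whether the analysis can be run on the wide layout depends on the model, which the post does not describe.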
If you haven't already read it, the article "Jackknife estimates in SAS" provides an overview of the Jackknife method and various ways to create the jackknife samples.
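If the long layout has to be kept, one possible way to build the "drop one participant at a time" samples outside the macros is sketched below. This is not part of the jackboot macros; the names LONG, IDS, JACK_BY_ID, id, and _dropped_id_ are hypothetical, and whether the usual jackknife formulas apply to such samples is exactly the concern raised above.

```sas
/* Hypothetical input: LONG has a participant identifier named id. */
proc sort data=long(keep=id) out=ids nodupkey;
   by id;
run;

/* Stack one copy of LONG per participant, omitting that participant's rows. */
/* _dropped_id_ identifies the sample and can serve as a BY variable.        */
proc sql;
   create table jack_by_id as
   select i.id as _dropped_id_, l.*
   from ids as i, long as l
   where l.id ne i.id
   order by _dropped_id_, l.id;
quit;
```

For the data in the question, each sample that drops a two-row participant has 335 rows and each sample that drops a one-row participant has 336 rows. The analysis could then be run once with BY _dropped_id_ instead of looping over samples.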
I have only a cursory understanding of the jackknife and the bootstrap, and I haven't used these macros. But I wonder whether it works like this: you, as the analyst, pass a dataset of distinct participant IDs as "data" to the jackboot macros, and you write the %ANALYZE macro so that it reads the participant IDs that jackboot selected, handles the multiple rows per participant, and returns a dataset of measurements (I assume one row per ID) to the macros?
In the simple case where the original data already has one row per participant, you would just pass the original dataset. But in the situation you describe, would you pass the list of participants, let jackboot decide which participants go into each resample, and deal with everything else in the %ANALYZE macro? If so, definitely use indexed datasets to collect and merge your selected data.
This is my educated guess, since no one else has offered one. Then again, this question may not be in the best forum space. We might ask a moderator to move it to Statistical Procedures. @StatDave, @Rick_SAS?
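A rough sketch of that idea follows. It assumes the macros are given a data set with one row per participant ID, and that the full long data lives in a data set named LONG with variables id and y; those names, and the choice of a simple mean as the statistic, are placeholders. The data= and out= parameters match how %jackslow calls %analyze in the code above.

```sas
%macro analyze(data=, out=);
   /* &data holds the participant IDs retained in this resample.      */
   /* Pull every long-format row that belongs to those participants.  */
   proc sql;
      create table _long_sub as
      select l.*
      from long as l
           inner join &data as s
           on l.id = s.id;
   quit;

   /* Placeholder statistic: the mean of y over the retained rows. */
   proc means data=_long_sub noprint;
      var y;
      output out=&out(drop=_type_ _freq_) mean=mean_y;
   run;
%mend analyze;
```

Because this sketch does not call the BYSTMT macro, the macros would take the slower %jackslow path and issue the warning shown in the code above.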