Hi, I have a single dataset consisting of 10000 observations, 1000 per ID. I would like to run each ID from that dataset through a proc surveyselect function. I would like the first ID to run through the function then loop to the second ID to run through the function. How can I execute that? If possible.
Oops, I meant per ID. Thank you. I have a series of other data steps and simulation this data will go through after the proc surveyselect function. I would like to take an ID to go through the proc surveyselect (this is the first step) then that dataset will be used for other datasets. Then I would like to repeat the process with my next ID. Adding a by column will give me one dataset in which I will run into the same problem where I want to execute per ID for the rest of the simulation.
Not sure why you would want to do it separately for each level of ID.
Just add BY statements to your steps and get the results for ALL of IDs at once.
https://support.sas.com/resources/papers/proceedings/proceedings/forum2007/183-2007.pdf
Okay I see. Currently, I have a proc sql step:
proc sql no print;
create table test as
select *
from bootstrapped_test (from proc surveyselect)
where replicate in(&Percentile10., &Percentile20., &Percentile30., &Percentile40., &Percentile50., &Percentile60., &Percentile70., &Percentile80., &Percentile90.);
quit;
How can I insert a BY for this step. If I have added "strata ID"
Instead of converting numeric values, like percentiles, into text strings to store in macro variables just leave the values in the dataset and join the two datasets.
Perhaps something like:
create table test as
select a.*
from bootstrapped_test a
inner join statistics b
where a.id = b.id
and a.replicate = b.replicate
and b.percentile in (10,20,30,40,50,60,70,80,90)
;
Sorry, where is the "statistics" dataset from?
@jasbrn33 wrote:
Sorry, where is the "statistics" dataset from?
You are the one doing the analysis.
Where did the macro variables come from?
If you generated percentiles using some procedure like PROC MEANS or PROC UNIVARIATE then just structure the resulting dataset in a way that you can use it join with the larger dataset to select the observations your previous logic was selecting via macro variable values.
Or perhaps you want to merge the percentile values onto the data?
Say you have generated a statistics dataset with one observation per ID value. You could then merge on all of the variables, like a MEAN or a STD, to the source data with a simple data step merge.
data want;
merge have statistics;
by id;
run;
@jasbrn33 wrote:
Okay I see. Currently, I have a proc sql step:
proc sql no print;
create table test as
select *
from bootstrapped_test (from proc surveyselect)
where replicate in(&Percentile10., &Percentile20., &Percentile30., &Percentile40., &Percentile50., &Percentile60., &Percentile70., &Percentile80., &Percentile90.);
quit;
How can I insert a BY for this step. If I have added "strata ID"
You likely need to make separate percentile macro variables (why are there macro variables at all???) for each ID which be done in the step when you make those. Which step could very well use the By Strata since you have not shown how that is done.
Even if he did, doing something like this doesn't work because the percentile values are not necessarily in your data.
in(&Percentile10., &Percentile20., &Percentile30., &Percentile40., &Percentile50., &Percentile60., &Percentile70., &Percentile80., &Percentile90.);
What do you mean by "loop"?
If you want to select different samples for each ID then you would use ID as a strata variable and you can specify different sample rates or sizes per strata if needed. Dummy example code to select 100 records for each ID. Data must be sorted by STRATA variable before that option is valid.
Proc sort data=have; by id; run; proc surveyselect data=have out=selected sampsize=100; strata id; run;
Or to create different sizes of samples:
proc surveyselect data=have sampsize=(100 35 248 ...); strata id; run;
Which would select 100 records for the first ID value (after sorting so understanding sort order is important), 35 for the second Id, 248 for the third Id . Replace ... with actual values for the remaining 7? ids. Or use SAMPRATE with the desired percentage of sampling.
Or provide a small example set of before and after describing what the "loop" did.
By "loop" I mean I would like to take an ID to go through the proc surveyselect then I would like to repeat the process with my next ID. Is it possible to output a dataset from each ID after going through a proc surveyselect? Ultimately, each ID dataset will go through a simulation. I would like to have each ID separately.
@jasbrn33 wrote:
By "loop" I mean I would like to take an ID to go through the proc surveyselect then I would like to repeat the process with my next ID. Is it possible to output a dataset from each ID after going through a proc surveyselect? Ultimately, each ID dataset will go through a simulation. I would like to have each ID separately.
What you describe is generally accomplished with BY group processing. Each data step or procedure would use BY Id in some form so that the records are processed for each group of identical Id values. Splitting the data would typically only be done in the last step for a report, again using a BY group. Exceptions might be necessary if different Id values are treated drastically different but I would suggest starting with BY groups if at all possible. Provide examples of actual follow up processing.
SAS By group processing is built into almost everything and is the preferred method instead of "looping" with different data sets.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.