BookmarkSubscribeRSS Feed
jasbrn33
Fluorite | Level 6

Hi, I have a single dataset consisting of 10000 observations, 1000 per ID. I would like to run each ID from that dataset through a proc surveyselect function. I would like the first ID to run through the function then loop to the second ID to run through the function. How can I execute that? If possible.

15 REPLIES 15
Reeza
Super User
A single observation is a single row. If you have 1000 per ID, do you want it per ID or per row? Different specifications entirely.

Assuming per ID (because one per row doesn't make logical sense) then just add BY ID to the procedure so it's run uniquely per ID.
jasbrn33
Fluorite | Level 6

Oops, I meant per ID. Thank you. I have a series of other data steps and simulation this data will go through after the proc surveyselect function. I would like to take an ID to go through the proc surveyselect (this is the first step) then that dataset will be used for other datasets. Then I would like to repeat the process with my next ID. Adding a by column will give me one dataset in which I will run into the same problem where I want to execute per ID for the rest of the simulation.

Tom
Super User Tom
Super User

Not sure why you would want to do it separately for each level of ID.

Just add BY statements to your steps and get the results for ALL of IDs at once.

 

https://support.sas.com/resources/papers/proceedings/proceedings/forum2007/183-2007.pdf

 

jasbrn33
Fluorite | Level 6

Okay I see. Currently, I have a proc sql step: 

 

proc sql no print;

create table test as

select *

from bootstrapped_test (from proc surveyselect)

where replicate in(&Percentile10., &Percentile20., &Percentile30., &Percentile40., &Percentile50., &Percentile60., &Percentile70., &Percentile80., &Percentile90.);

quit;

 

How can I insert a BY for this step. If I have added "strata ID"

Tom
Super User Tom
Super User

Instead of converting numeric values, like percentiles, into text strings to store in macro variables just leave the values in the dataset and join the two datasets.

 

Perhaps something like:

create table test as
select a.*
from bootstrapped_test a
inner join statistics b
where a.id = b.id 
   and a.replicate = b.replicate 
   and b.percentile in (10,20,30,40,50,60,70,80,90)
;
jasbrn33
Fluorite | Level 6

Sorry, where is the "statistics" dataset from? 

Tom
Super User Tom
Super User

@jasbrn33 wrote:

Sorry, where is the "statistics" dataset from? 


You are the one doing the analysis.  
Where did the macro variables come from?  

 

If you generated percentiles using some procedure like PROC MEANS or PROC UNIVARIATE then just structure the resulting dataset in a way that you can use it join with the larger dataset to select the observations your previous logic was selecting via macro variable values.

 

Or perhaps you want to merge the percentile values onto the data?

Say you have generated a statistics dataset with one observation per ID value.  You could then merge on all of the variables, like a MEAN or a STD, to the source data with a simple data step merge.

data want;
   merge have statistics;
   by id;
run;
Reeza
Super User
Or use PROC RANK to add the percentile rank directly instead of PROC MEANS/UNIVARIATE.
ballardw
Super User

@jasbrn33 wrote:

Okay I see. Currently, I have a proc sql step: 

 

proc sql no print;

create table test as

select *

from bootstrapped_test (from proc surveyselect)

where replicate in(&Percentile10., &Percentile20., &Percentile30., &Percentile40., &Percentile50., &Percentile60., &Percentile70., &Percentile80., &Percentile90.);

quit;

 

How can I insert a BY for this step. If I have added "strata ID"


You likely need to make separate percentile macro variables (why are there macro variables at all???) for each ID which be done in the step when you make those. Which step could very well use the By Strata since you have not shown how that is done.

Reeza
Super User

Even if he did, doing something like this doesn't work because the percentile values are not necessarily in your data. 
in(&Percentile10., &Percentile20., &Percentile30., &Percentile40., &Percentile50., &Percentile60., &Percentile70., &Percentile80., &Percentile90.);

Reeza
Super User
A single observation is a single row. If you have 1000 per ID, do you want it per ID or per row? Different specifications entirely.

Assuming per ID (because one per row doesn't make logical sense) then just add BY ID to the procedure so it's run uniquely per ID.
ballardw
Super User

What do you mean by "loop"?

 

If you want to select different samples for each ID then you would use ID as a strata variable and you can specify different sample rates or sizes per strata if needed. Dummy example code to select 100 records for each ID. Data must be sorted by STRATA variable before that option is valid.

Proc sort data=have;
   by id;
run;

proc surveyselect data=have out=selected sampsize=100;
   strata id;
run;

Or to create different sizes of samples:

proc surveyselect data=have sampsize=(100 35 248 ...);
   strata id;
run;

Which would select 100 records for the first ID value (after sorting so understanding sort order is important), 35 for the second Id, 248 for the third Id . Replace ... with actual values for the remaining 7? ids. Or use SAMPRATE with the desired percentage of sampling.

 

Or provide a small example set of before and after describing what the "loop" did.

jasbrn33
Fluorite | Level 6

By "loop" I mean I would like to take an ID to go through the proc surveyselect then I would like to repeat the process with my next ID. Is it possible to output a dataset from each ID after going through a proc surveyselect? Ultimately, each ID dataset will go through a simulation. I would like to have each ID separately.

ballardw
Super User

@jasbrn33 wrote:

By "loop" I mean I would like to take an ID to go through the proc surveyselect then I would like to repeat the process with my next ID. Is it possible to output a dataset from each ID after going through a proc surveyselect? Ultimately, each ID dataset will go through a simulation. I would like to have each ID separately.


What you describe is generally accomplished with BY group processing. Each data step or procedure would use BY Id in some form so that the records are processed for each group of identical Id values. Splitting the data would typically only be done in the last step for a report, again using a BY group. Exceptions might be necessary if different Id values are treated drastically different but I would suggest starting with BY groups if at all possible. Provide examples of actual follow up processing.

 

SAS By group processing is built into almost everything and is the preferred method instead of "looping" with different data sets.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 3979 views
  • 6 likes
  • 4 in conversation