How to loop a single observation from a dataset through proc surveysel...

jasbrn33 · Posted 05-31-2022 11:50 AM

Hi, I have a single dataset consisting of 10000 observations, 1000 per ID. I would like to run each ID from that dataset through a proc surveyselect function. I would like the first ID to run through the function then loop to the second ID to run through the function. How can I execute that? If possible.

Reeza · Posted 05-31-2022 12:00 PM

A single observation is a single row. If you have 1000 per ID, do you want it per ID or per row? Different specifications entirely.

Assuming per ID (because one per row doesn't make logical sense) then just add BY ID to the procedure so it's run uniquely per ID.

jasbrn33 · Posted 05-31-2022 12:09 PM

Oops, I meant per ID. Thank you. I have a series of other data steps and simulation this data will go through after the proc surveyselect function. I would like to take an ID to go through the proc surveyselect (this is the first step) then that dataset will be used for other datasets. Then I would like to repeat the process with my next ID. Adding a by column will give me one dataset in which I will run into the same problem where I want to execute per ID for the rest of the simulation.

Tom · Posted 05-31-2022 12:36 PM

Not sure why you would want to do it separately for each level of ID.

Just add BY statements to your steps and get the results for ALL of IDs at once.

https://support.sas.com/resources/papers/proceedings/proceedings/forum2007/183-2007.pdf

jasbrn33 · Posted 05-31-2022 12:48 PM

Okay I see. Currently, I have a proc sql step:

proc sql no print;

create table test as

select *

from bootstrapped_test (from proc surveyselect)

where replicate in(&Percentile10., &Percentile20., &Percentile30., &Percentile40., &Percentile50., &Percentile60., &Percentile70., &Percentile80., &Percentile90.);

quit;

How can I insert a BY for this step. If I have added "strata ID"

Tom · Posted 05-31-2022 01:13 PM

Instead of converting numeric values, like percentiles, into text strings to store in macro variables just leave the values in the dataset and join the two datasets.

Perhaps something like:

create table test as
select a.*
from bootstrapped_test a
inner join statistics b
where a.id = b.id 
   and a.replicate = b.replicate 
   and b.percentile in (10,20,30,40,50,60,70,80,90)
;

jasbrn33 · Posted 05-31-2022 01:22 PM

Sorry, where is the "statistics" dataset from?

Tom · Posted 05-31-2022 01:28 PM

@jasbrn33 wrote:

Sorry, where is the "statistics" dataset from?

You are the one doing the analysis.
Where did the macro variables come from?

If you generated percentiles using some procedure like PROC MEANS or PROC UNIVARIATE then just structure the resulting dataset in a way that you can use it join with the larger dataset to select the observations your previous logic was selecting via macro variable values.

Or perhaps you want to merge the percentile values onto the data?

Say you have generated a statistics dataset with one observation per ID value. You could then merge on all of the variables, like a MEAN or a STD, to the source data with a simple data step merge.

data want;
   merge have statistics;
   by id;
run;

Reeza · Posted 05-31-2022 01:34 PM

Or use PROC RANK to add the percentile rank directly instead of PROC MEANS/UNIVARIATE.

ballardw · Posted 05-31-2022 02:04 PM

@jasbrn33 wrote:

Okay I see. Currently, I have a proc sql step:

proc sql no print;

create table test as

select *

from bootstrapped_test (from proc surveyselect)

where replicate in(&Percentile10., &Percentile20., &Percentile30., &Percentile40., &Percentile50., &Percentile60., &Percentile70., &Percentile80., &Percentile90.);

quit;

How can I insert a BY for this step. If I have added "strata ID"

You likely need to make separate percentile macro variables (why are there macro variables at all???) for each ID which be done in the step when you make those. Which step could very well use the By Strata since you have not shown how that is done.

Reeza · Posted 05-31-2022 03:02 PM

Even if he did, doing something like this doesn't work because the percentile values are not necessarily in your data.
in(&Percentile10., &Percentile20., &Percentile30., &Percentile40., &Percentile50., &Percentile60., &Percentile70., &Percentile80., &Percentile90.);

Reeza · Posted 05-31-2022 12:00 PM

A single observation is a single row. If you have 1000 per ID, do you want it per ID or per row? Different specifications entirely.

Assuming per ID (because one per row doesn't make logical sense) then just add BY ID to the procedure so it's run uniquely per ID.

ballardw · Posted 05-31-2022 12:05 PM

What do you mean by "loop"?

If you want to select different samples for each ID then you would use ID as a strata variable and you can specify different sample rates or sizes per strata if needed. Dummy example code to select 100 records for each ID. Data must be sorted by STRATA variable before that option is valid.

Proc sort data=have;
   by id;
run;

proc surveyselect data=have out=selected sampsize=100;
   strata id;
run;

Or to create different sizes of samples:

proc surveyselect data=have sampsize=(100 35 248 ...);
   strata id;
run;

Which would select 100 records for the first ID value (after sorting so understanding sort order is important), 35 for the second Id, 248 for the third Id . Replace ... with actual values for the remaining 7? ids. Or use SAMPRATE with the desired percentage of sampling.

Or provide a small example set of before and after describing what the "loop" did.

jasbrn33 · Posted 05-31-2022 12:30 PM

By "loop" I mean I would like to take an ID to go through the proc surveyselect then I would like to repeat the process with my next ID. Is it possible to output a dataset from each ID after going through a proc surveyselect? Ultimately, each ID dataset will go through a simulation. I would like to have each ID separately.

ballardw · Posted 05-31-2022 12:51 PM

@jasbrn33 wrote:

By "loop" I mean I would like to take an ID to go through the proc surveyselect then I would like to repeat the process with my next ID. Is it possible to output a dataset from each ID after going through a proc surveyselect? Ultimately, each ID dataset will go through a simulation. I would like to have each ID separately.

What you describe is generally accomplished with BY group processing. Each data step or procedure would use BY Id in some form so that the records are processed for each group of identical Id values. Splitting the data would typically only be done in the last step for a report, again using a BY group. Exceptions might be necessary if different Id values are treated drastically different but I would suggest starting with BY groups if at all possible. Provide examples of actual follow up processing.

SAS By group processing is built into almost everything and is the preferred method instead of "looping" with different data sets.

How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

Re: How to loop a single observation from a dataset through proc surveyselect?

SAS Innovate 2025: Call for Content

Classroom Training Available!