mgrasmussen
Quartz | Level 8

Dear SAS experts

 

I am a relatively new user, but have used Stata for some years.

 

I would like to be able to generate uniformly distributed random numbers. I want to do this separately for certain subgroups in my dataset, although it might actually suffice to do it for the whole dataset.

 

As an example:

 

data example;
input name $ value;
datalines;
Maria 2
Maria 3
Maria 6
John 2
John 6
Peter 2
;
run;

 

I would like to randomly select a 'Maria' observation and a 'John' observation (only 1 Peter, no selection needed) such that I am left with only one of the values of 'value'. To the first issue: I thought it would be easy to generate random data for the whole dataset:

 

proc sql noprint;
select count(*) into :N from example;
quit;

 

data example (drop=i);
set example;
do i=1 to &N;
uni_ran=rand("Uniform");
output;
end;
run;

 

But the number of observations in the dataset goes from 6 to 36 - why?

 

The second issue is that I would like to be able to generate random numbers separately for 'Maria' and 'John'. In general, how would one go about doing this? I have tried including a -by- statement, but I cannot get it to work. I imagine I have to somehow confine the do loop to the span of observations in each category, e.g. 3 for Maria and 2 for John, perhaps using some function that identifies the number of observations in each category of a variable. In the actual dataset I am using, many categories have more than 3 observations.

 

In general, I am interested in finding out how to perform operations for each subgroup of a dataset without having to qualify the subgroup using e.g. -if- and -where-, i.e. just "do this for each level of var". If there are 1000 levels of a categorical variable, this would require many lines of code using -if-.

 

Does someone have a suggestion on how I can resolve the issues mentioned above?

 

Thank you

 

Accepted Solutions
PaigeMiller
Diamond | Level 26

@mgrasmussen wrote:

But the number of observations in the dataset goes from 6 to 36 - why?

 


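&N = 6, and your DO loop runs from 1 to &N for every observation read by the SET statement, executing OUTPUT on each pass. Each of the 6 input records is therefore written 6 times, which gives 6 x 6 = 36 observations.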
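
As a minimal sketch of the point above: the DATA step already loops over the input observations on its own, so no explicit DO loop or OUTPUT statement is needed to get one random number per record. The output dataset name example_ran and the seed value 12345 are assumptions chosen for illustration; they do not appear in the original post.

```sas
/* Sketch: one uniform random number per input observation.            */
/* The DATA step iterates over EXAMPLE implicitly, once per record.    */
data example_ran;                 /* hypothetical output dataset name  */
    call streaminit(12345);       /* arbitrary seed, for repeatability */
    set example;
    uni_ran = rand("Uniform");    /* value in the interval (0, 1)      */
run;                              /* implicit OUTPUT: still 6 records  */
```

Because each input record is written once by the implicit output at the end of the step, the result has the same 6 observations as the input, each with its own value of uni_ran.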

 

 

The second issue is that I would like to be able to generate random numbers separately for 'Maria' and 'John'. In general, how would one go about doing this? I have tried including a -by- statement, but I cannot get it to work. I imagine I have to somehow confine the do loop to the span of observations in each category, e.g. 3 for Maria and 2 for John, perhaps using some function that identifies the number of observations in each category of a variable. In the actual dataset I am using, many categories have more than 3 observations.

 

In general, I am interested in finding out how to perform operations for each subgroup of a dataset without having to qualify the subgroup using e.g. -if- and -where-, i.e. just "do this for each level of var". If there are 1000 levels of a categorical variable, this would require many lines of code using -if-.

 

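You don't need separate generation of random variables for Maria and John. Generate one random number for every observation, then keep the record with the largest random number within each name.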

 

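/* HAVE stands for your input dataset (EXAMPLE in the question) */
data have1;
    set have;
    uni_ran=rand("Uniform");   /* one random number per observation */
run;
/* Sort so the max random number comes last within each name */
proc sort data=have1;
    by name uni_ran;
run;
/* Keep only the last record of each name, i.e. the one with the max random number */
data want;
    set have1;
    by name uni_ran;
    if last.name;
run;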
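
To make the BY-group mechanism in the step above easier to see, here is a small sketch that copies the automatic first.name and last.name flags into ordinary variables so they can be inspected. It assumes have1 has already been sorted by name as above; the dataset name flags and the variables is_first and is_last are hypothetical names used only for illustration.

```sas
/* Sketch: expose the automatic BY-group flags as regular variables.   */
/* FIRST.name and LAST.name exist only while the step runs, so they    */
/* are copied into permanent variables to make them visible.           */
data flags;                       /* hypothetical dataset name */
    set have1;                    /* must be sorted by name    */
    by name;
    is_first = first.name;        /* 1 on the first record of each name */
    is_last  = last.name;         /* 1 on the last record of each name  */
run;

proc print data=flags;
run;
```

A name with a single record, such as Peter, has both flags equal to 1 on that record. This is also the general way to "do this for each level of a variable" in a DATA step: a BY statement plus conditions on first. and last. replaces one -if- per level.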

If you want to exclude names with only a single record like Peter, then instead of the line if last.name; you can use

if last.name and not first.name;
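
As an alternative sketch, not part of the approach above: if SAS/STAT is available, PROC SURVEYSELECT can draw one record per group directly. This assumes the input is the example dataset from the question; the names example_sorted and want_ss and the seed value are hypothetical.

```sas
/* Sketch: stratified simple random sampling, one record per name.     */
/* Requires SAS/STAT. The STRATA statement expects sorted input.       */
proc sort data=example out=example_sorted;   /* hypothetical name */
    by name;
run;

proc surveyselect data=example_sorted
                  out=want_ss                /* hypothetical name */
                  method=srs                 /* simple random sampling */
                  sampsize=1                 /* one record per stratum */
                  seed=12345                 /* arbitrary seed         */
                  noprint;
    strata name;
run;
```

With sampsize=1, a name that has only one record (Peter) is simply kept as is.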
--
Paige Miller
