BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mgrasmussen
Quartz | Level 8

Dear SAS experts

 

I am a relatively new user, but have used Stata for some years.

 

I would like to be able to generate random numbers uniformily distributed. I want to do this seperately for certain subgroups in my dataset, although it might actually suffice to do it for the whole dataset.

 

As an example:

 

data example;
input name $ value;
datalines;
Maria 2
Maria 3
Maria 6
John 2
John 6
Peter 2
;
run;

 

I would like to randomly select a 'Maria' observation and a 'John' observation (only 1 Peter, no selection needed) such that I am left with only one of the values of 'value'. To the first issue: I thought it would be easy to generate random data for the whole dataset:

 

proc sql noprint;
select count(*) into :N from example;
quit;

 

data example (drop=i);
set example;
do i=1 to &N;
uni_ran=rand("Uniform");
output;
end;
run;

 

But the amount of observations in the dataset goes from 6 to 36 - why?

 

The second issue that I would like to be able to generate random numbers seperately for 'Maria' and 'John'. In general, how would one go about doing this? I have tried by including a -by- but I cannot get it to work. I imagine I have to somehow confine the do loop to the span of observations in each category, e.g. 3 for Maria and 2 for John, perhaps using some functions which identifies the amount of observations in a category of a certain variable. In the actual dataset I am using there are in many cases >3 observations within each category.

 

In general I have an interest in finding out how to perform different operations for each subgroup of a dataset without having to qualify the subgroup using fx -if- and -where', i.e. just "do this for each level of var". If there are 1000 levels of categorical variable this would require many lines of code using -if-.

 

Does someone have a suggestion on how I can resolve the issues mentioned above?

 

Thank you

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

@mgrasmussen wrote:

Dear SAS experts

 

I am a relatively new user, but have used Stata for some years.

 

I would like to be able to generate random numbers uniformily distributed. I want to do this seperately for certain subgroups in my dataset, although it might actually suffice to do it for the whole dataset.

 

As an example:

 

data example;
input name $ value;
datalines;
Maria 2
Maria 3
Maria 6
John 2
John 6
Peter 2
;
run;

 

I would like to randomly select a 'Maria' observation and a 'John' observation (only 1 Peter, no selection needed) such that I am left with only one of the values of 'value'. To the first issue: I thought it would be easy to generate random data for the whole dataset:

 

proc sql noprint;
select count(*) into :N from example;
quit;

 

data example (drop=i);
set example;
do i=1 to &N;
uni_ran=rand("Uniform");
output;
end;
run;

 

But the amount of observations in the dataset goes from 6 to 36 - why?

 


&n = 6 and you have a loop that goes 1 to &n and each time outputs the record.

 

 

The second issue that I would like to be able to generate random numbers seperately for 'Maria' and 'John'. In general, how would one go about doing this? I have tried by including a -by- but I cannot get it to work. I imagine I have to somehow confine the do loop to the span of observations in each category, e.g. 3 for Maria and 2 for John, perhaps using some functions which identifies the amount of observations in a category of a certain variable. In the actual dataset I am using there are in many cases >3 observations within each category.

 

In general I have an interest in finding out how to perform different operations for each subgroup of a dataset without having to qualify the subgroup using fx -if- and -where', i.e. just "do this for each level of var". If there are 1000 levels of categorical variable this would require many lines of code using -if-.

 

You don't need separate generation of random variables for Maria and John. 

 

data have1;
    set have;
    uni_ran=rand("Uniform");
run;
/* Find the max random number for each name */
proc sort data=have1;
    by name uni_ran;
run;
data want;
    set have1;
    by name uni_ran;
    if last.name;
run;

If you want to exclude names with only a single record like Peter, then instead of the line if last.name; you can use

if last.name and not first.name;
--
Paige Miller

View solution in original post

2 REPLIES 2
PaigeMiller
Diamond | Level 26

@mgrasmussen wrote:

Dear SAS experts

 

I am a relatively new user, but have used Stata for some years.

 

I would like to be able to generate random numbers uniformily distributed. I want to do this seperately for certain subgroups in my dataset, although it might actually suffice to do it for the whole dataset.

 

As an example:

 

data example;
input name $ value;
datalines;
Maria 2
Maria 3
Maria 6
John 2
John 6
Peter 2
;
run;

 

I would like to randomly select a 'Maria' observation and a 'John' observation (only 1 Peter, no selection needed) such that I am left with only one of the values of 'value'. To the first issue: I thought it would be easy to generate random data for the whole dataset:

 

proc sql noprint;
select count(*) into :N from example;
quit;

 

data example (drop=i);
set example;
do i=1 to &N;
uni_ran=rand("Uniform");
output;
end;
run;

 

But the amount of observations in the dataset goes from 6 to 36 - why?

 


&n = 6 and you have a loop that goes 1 to &n and each time outputs the record.

 

 

The second issue that I would like to be able to generate random numbers seperately for 'Maria' and 'John'. In general, how would one go about doing this? I have tried by including a -by- but I cannot get it to work. I imagine I have to somehow confine the do loop to the span of observations in each category, e.g. 3 for Maria and 2 for John, perhaps using some functions which identifies the amount of observations in a category of a certain variable. In the actual dataset I am using there are in many cases >3 observations within each category.

 

In general I have an interest in finding out how to perform different operations for each subgroup of a dataset without having to qualify the subgroup using fx -if- and -where', i.e. just "do this for each level of var". If there are 1000 levels of categorical variable this would require many lines of code using -if-.

 

You don't need separate generation of random variables for Maria and John. 

 

data have1;
    set have;
    uni_ran=rand("Uniform");
run;
/* Find the max random number for each name */
proc sort data=have1;
    by name uni_ran;
run;
data want;
    set have1;
    by name uni_ran;
    if last.name;
run;

If you want to exclude names with only a single record like Peter, then instead of the line if last.name; you can use

if last.name and not first.name;
--
Paige Miller

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 789 views
  • 0 likes
  • 2 in conversation