Dear SAS experts
I am a relatively new user, but have used Stata for some years.
I would like to be able to generate random numbers uniformily distributed. I want to do this seperately for certain subgroups in my dataset, although it might actually suffice to do it for the whole dataset.
As an example:
data example;
input name $ value;
datalines;
Maria 2
Maria 3
Maria 6
John 2
John 6
Peter 2
;
run;
I would like to randomly select a 'Maria' observation and a 'John' observation (only 1 Peter, no selection needed) such that I am left with only one of the values of 'value'. To the first issue: I thought it would be easy to generate random data for the whole dataset:
proc sql noprint;
select count(*) into :N from example;
quit;
data example (drop=i);
set example;
do i=1 to &N;
uni_ran=rand("Uniform");
output;
end;
run;
But the amount of observations in the dataset goes from 6 to 36 - why?
The second issue that I would like to be able to generate random numbers seperately for 'Maria' and 'John'. In general, how would one go about doing this? I have tried by including a -by- but I cannot get it to work. I imagine I have to somehow confine the do loop to the span of observations in each category, e.g. 3 for Maria and 2 for John, perhaps using some functions which identifies the amount of observations in a category of a certain variable. In the actual dataset I am using there are in many cases >3 observations within each category.
In general I have an interest in finding out how to perform different operations for each subgroup of a dataset without having to qualify the subgroup using fx -if- and -where', i.e. just "do this for each level of var". If there are 1000 levels of categorical variable this would require many lines of code using -if-.
Does someone have a suggestion on how I can resolve the issues mentioned above?
Thank you
@mgrasmussen wrote:
Dear SAS experts
I am a relatively new user, but have used Stata for some years.
I would like to be able to generate random numbers uniformily distributed. I want to do this seperately for certain subgroups in my dataset, although it might actually suffice to do it for the whole dataset.
As an example:
data example;
input name $ value;
datalines;
Maria 2
Maria 3
Maria 6
John 2
John 6
Peter 2
;
run;
I would like to randomly select a 'Maria' observation and a 'John' observation (only 1 Peter, no selection needed) such that I am left with only one of the values of 'value'. To the first issue: I thought it would be easy to generate random data for the whole dataset:
proc sql noprint;
select count(*) into :N from example;
quit;
data example (drop=i);
set example;
do i=1 to &N;
uni_ran=rand("Uniform");
output;
end;
run;
But the amount of observations in the dataset goes from 6 to 36 - why?
&n = 6 and you have a loop that goes 1 to &n and each time outputs the record.
The second issue that I would like to be able to generate random numbers seperately for 'Maria' and 'John'. In general, how would one go about doing this? I have tried by including a -by- but I cannot get it to work. I imagine I have to somehow confine the do loop to the span of observations in each category, e.g. 3 for Maria and 2 for John, perhaps using some functions which identifies the amount of observations in a category of a certain variable. In the actual dataset I am using there are in many cases >3 observations within each category.
In general I have an interest in finding out how to perform different operations for each subgroup of a dataset without having to qualify the subgroup using fx -if- and -where', i.e. just "do this for each level of var". If there are 1000 levels of categorical variable this would require many lines of code using -if-.
You don't need separate generation of random variables for Maria and John.
data have1;
set have;
uni_ran=rand("Uniform");
run;
/* Find the max random number for each name */
proc sort data=have1;
by name uni_ran;
run;
data want;
set have1;
by name uni_ran;
if last.name;
run;
If you want to exclude names with only a single record like Peter, then instead of the line if last.name; you can use
if last.name and not first.name;
@mgrasmussen wrote:
Dear SAS experts
I am a relatively new user, but have used Stata for some years.
I would like to be able to generate random numbers uniformily distributed. I want to do this seperately for certain subgroups in my dataset, although it might actually suffice to do it for the whole dataset.
As an example:
data example;
input name $ value;
datalines;
Maria 2
Maria 3
Maria 6
John 2
John 6
Peter 2
;
run;
I would like to randomly select a 'Maria' observation and a 'John' observation (only 1 Peter, no selection needed) such that I am left with only one of the values of 'value'. To the first issue: I thought it would be easy to generate random data for the whole dataset:
proc sql noprint;
select count(*) into :N from example;
quit;
data example (drop=i);
set example;
do i=1 to &N;
uni_ran=rand("Uniform");
output;
end;
run;
But the amount of observations in the dataset goes from 6 to 36 - why?
&n = 6 and you have a loop that goes 1 to &n and each time outputs the record.
The second issue that I would like to be able to generate random numbers seperately for 'Maria' and 'John'. In general, how would one go about doing this? I have tried by including a -by- but I cannot get it to work. I imagine I have to somehow confine the do loop to the span of observations in each category, e.g. 3 for Maria and 2 for John, perhaps using some functions which identifies the amount of observations in a category of a certain variable. In the actual dataset I am using there are in many cases >3 observations within each category.
In general I have an interest in finding out how to perform different operations for each subgroup of a dataset without having to qualify the subgroup using fx -if- and -where', i.e. just "do this for each level of var". If there are 1000 levels of categorical variable this would require many lines of code using -if-.
You don't need separate generation of random variables for Maria and John.
data have1;
set have;
uni_ran=rand("Uniform");
run;
/* Find the max random number for each name */
proc sort data=have1;
by name uni_ran;
run;
data want;
set have1;
by name uni_ran;
if last.name;
run;
If you want to exclude names with only a single record like Peter, then instead of the line if last.name; you can use
if last.name and not first.name;
Thank you. Appreciate it.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.