Dear SAS experts
I am a relatively new user, but have used Stata for some years.
I would like to be able to generate random numbers uniformily distributed. I want to do this seperately for certain subgroups in my dataset, although it might actually suffice to do it for the whole dataset.
As an example:
data example;
input name $ value;
datalines;
Maria 2
Maria 3
Maria 6
John 2
John 6
Peter 2
;
run;
I would like to randomly select a 'Maria' observation and a 'John' observation (only 1 Peter, no selection needed) such that I am left with only one of the values of 'value'. To the first issue: I thought it would be easy to generate random data for the whole dataset:
proc sql noprint;
select count(*) into :N from example;
quit;
data example (drop=i);
set example;
do i=1 to &N;
uni_ran=rand("Uniform");
output;
end;
run;
But the amount of observations in the dataset goes from 6 to 36 - why?
The second issue that I would like to be able to generate random numbers seperately for 'Maria' and 'John'. In general, how would one go about doing this? I have tried by including a -by- but I cannot get it to work. I imagine I have to somehow confine the do loop to the span of observations in each category, e.g. 3 for Maria and 2 for John, perhaps using some functions which identifies the amount of observations in a category of a certain variable. In the actual dataset I am using there are in many cases >3 observations within each category.
In general I have an interest in finding out how to perform different operations for each subgroup of a dataset without having to qualify the subgroup using fx -if- and -where', i.e. just "do this for each level of var". If there are 1000 levels of categorical variable this would require many lines of code using -if-.
Does someone have a suggestion on how I can resolve the issues mentioned above?
Thank you
@mgrasmussen wrote:
Dear SAS experts
I am a relatively new user, but have used Stata for some years.
I would like to be able to generate random numbers uniformily distributed. I want to do this seperately for certain subgroups in my dataset, although it might actually suffice to do it for the whole dataset.
As an example:
data example;
input name $ value;
datalines;
Maria 2
Maria 3
Maria 6
John 2
John 6
Peter 2
;
run;
I would like to randomly select a 'Maria' observation and a 'John' observation (only 1 Peter, no selection needed) such that I am left with only one of the values of 'value'. To the first issue: I thought it would be easy to generate random data for the whole dataset:
proc sql noprint;
select count(*) into :N from example;
quit;
data example (drop=i);
set example;
do i=1 to &N;
uni_ran=rand("Uniform");
output;
end;
run;
But the amount of observations in the dataset goes from 6 to 36 - why?
&n = 6 and you have a loop that goes 1 to &n and each time outputs the record.
The second issue that I would like to be able to generate random numbers seperately for 'Maria' and 'John'. In general, how would one go about doing this? I have tried by including a -by- but I cannot get it to work. I imagine I have to somehow confine the do loop to the span of observations in each category, e.g. 3 for Maria and 2 for John, perhaps using some functions which identifies the amount of observations in a category of a certain variable. In the actual dataset I am using there are in many cases >3 observations within each category.
In general I have an interest in finding out how to perform different operations for each subgroup of a dataset without having to qualify the subgroup using fx -if- and -where', i.e. just "do this for each level of var". If there are 1000 levels of categorical variable this would require many lines of code using -if-.
You don't need separate generation of random variables for Maria and John.
data have1;
set have;
uni_ran=rand("Uniform");
run;
/* Find the max random number for each name */
proc sort data=have1;
by name uni_ran;
run;
data want;
set have1;
by name uni_ran;
if last.name;
run;
If you want to exclude names with only a single record like Peter, then instead of the line if last.name; you can use
if last.name and not first.name;
@mgrasmussen wrote:
Dear SAS experts
I am a relatively new user, but have used Stata for some years.
I would like to be able to generate random numbers uniformily distributed. I want to do this seperately for certain subgroups in my dataset, although it might actually suffice to do it for the whole dataset.
As an example:
data example;
input name $ value;
datalines;
Maria 2
Maria 3
Maria 6
John 2
John 6
Peter 2
;
run;
I would like to randomly select a 'Maria' observation and a 'John' observation (only 1 Peter, no selection needed) such that I am left with only one of the values of 'value'. To the first issue: I thought it would be easy to generate random data for the whole dataset:
proc sql noprint;
select count(*) into :N from example;
quit;
data example (drop=i);
set example;
do i=1 to &N;
uni_ran=rand("Uniform");
output;
end;
run;
But the amount of observations in the dataset goes from 6 to 36 - why?
&n = 6 and you have a loop that goes 1 to &n and each time outputs the record.
The second issue that I would like to be able to generate random numbers seperately for 'Maria' and 'John'. In general, how would one go about doing this? I have tried by including a -by- but I cannot get it to work. I imagine I have to somehow confine the do loop to the span of observations in each category, e.g. 3 for Maria and 2 for John, perhaps using some functions which identifies the amount of observations in a category of a certain variable. In the actual dataset I am using there are in many cases >3 observations within each category.
In general I have an interest in finding out how to perform different operations for each subgroup of a dataset without having to qualify the subgroup using fx -if- and -where', i.e. just "do this for each level of var". If there are 1000 levels of categorical variable this would require many lines of code using -if-.
You don't need separate generation of random variables for Maria and John.
data have1;
set have;
uni_ran=rand("Uniform");
run;
/* Find the max random number for each name */
proc sort data=have1;
by name uni_ran;
run;
data want;
set have1;
by name uni_ran;
if last.name;
run;
If you want to exclude names with only a single record like Peter, then instead of the line if last.name; you can use
if last.name and not first.name;
Thank you. Appreciate it.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.