Dear SAS experts
I am a relatively new user, but have used Stata for some years.
I would like to be able to generate random numbers uniformily distributed. I want to do this seperately for certain subgroups in my dataset, although it might actually suffice to do it for the whole dataset.
As an example:
data example; input name $ value; datalines; Maria 2 Maria 3 Maria 6 John 2 John 6 Peter 2 ; run;
I would like to randomly select a 'Maria' observation and a 'John' observation (only 1 Peter, no selection needed) such that I am left with only one of the values of 'value'. To the first issue: I thought it would be easy to generate random data for the whole dataset:
proc sql noprint; select count(*) into :N from example; quit;
data example (drop=i); set example; do i=1 to &N; uni_ran=rand("Uniform"); output; end; run;
But the amount of observations in the dataset goes from 6 to 36 - why?
The second issue that I would like to be able to generate random numbers seperately for 'Maria' and 'John'. In general, how would one go about doing this? I have tried by including a -by- but I cannot get it to work. I imagine I have to somehow confine the do loop to the span of observations in each category, e.g. 3 for Maria and 2 for John, perhaps using some functions which identifies the amount of observations in a category of a certain variable. In the actual dataset I am using there are in many cases >3 observations within each category.
In general I have an interest in finding out how to perform different operations for each subgroup of a dataset without having to qualify the subgroup using fx -if- and -where', i.e. just "do this for each level of var". If there are 1000 levels of categorical variable this would require many lines of code using -if-.
Does someone have a suggestion on how I can resolve the issues mentioned above?
Thank you
... View more