Hi, I am a starter of SAS.
I want to do a form a sample based on a dataset contains a total of Y years observations, and I only want a random subset years to do some tasks.
Here is my Example:
data xxx;
input YEAR data;
datalines;
2001 123
2002 123
2001 123
2003 123
2001 123
2003 123
2004 123
2005 123
2001 123
2003 123
;
run;
For instance, it has 2001 to 2005, say if i want 2 random years data.
say in the first run; 2001 and 2002 year' observations will be selected in a new dataset, the second time is another 2001 and 2005. It is totaly random.
proc sql;
create table years as select distinct year from xxx;
create table rand as select year,rand('uniform') as r from years order by calculated r;
create table want as select year,data from xxx
where year in (select year from rand(obs=2));
quit;
Assign a random number to each row, then select the observation with the X lowest (or highest) random numbers.
data want;
set have;
r=rand('uniform');
run;
proc sort data=want;
by r;
run;
data want;
set want(obs=2); /* Or any other number of observations that you want */
run;
Alternative: use PROC SURVEYSELECT, see this example: https://documentation.sas.com/doc/en/pgmmvacdc/9.4/statug/statug_surveyselect_gettingstarted01.htm
proc sql;
create table years as select distinct year from xxx;
create table rand as select year,rand('uniform') as r from years order by calculated r;
create table want as select year,data from xxx
where year in (select year from rand(obs=2));
quit;
Do the random selection on just the set of years.
First get a dataset with just the set of years
proc sort data=HAVE (keep=year) out=years nodupkey; run;
Then make a subset of the random years.
Then merge back with the original and only keep the observations that match the years selected.
Minor suggestion. When providing example data it is a good idea that other than a key variable, such as your year, has different values so you can see in the result what happens.
Then actually show some example(s) of desired result.
Another way:
data xxx; input YEAR data; /* to get different values of data for later examination*/ data= data+ _n_; datalines; 2001 123 2002 123 2001 123 2003 123 2001 123 2003 123 2004 123 2005 123 2001 123 2003 123 ; proc sql; create table years as select distinct year from xxx order by year; run; proc surveyselect data=years out=selectedyears noprint sampsize=2; run; proc sql; create table onerandomsample as select b.* from selectedyears as a inner join xxx as b on a.year=b.year ; quit;
If you want to keep the generated random sample you would change the name of "onerandomsample" each time you do this process.
You would change the number of sampled years by changing SAMPSIZE in the Proc surveyselect. There is some additional information that could be generated by Surveyselect such as sampling weights if needed. There is also an option SELECTALL for Surveyselect in case you ask for more years of data than exist in your base data that will result in all the data selected.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.