BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
wudebao
Fluorite | Level 6

Hi, I am a starter of SAS.

I want to do a form a sample based on a dataset contains a total of Y years observations, and I only want a random subset years to do some tasks.

Here is my Example:

 

data xxx;
input YEAR data;
datalines;
2001 123
2002 123
2001 123
2003 123
2001 123
2003 123
2004 123
2005 123
2001 123
2003 123
;
run;

For instance, it has 2001 to 2005, say if i want 2 random years data. 

say in the first run; 2001 and 2002 year' observations will be selected in a new dataset, the second time is another 2001 and 2005. It is totaly random.

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
proc sql;
	create table years as select distinct year from xxx;
	create table rand as select year,rand('uniform') as r from years order by calculated r;
	create table want as select year,data from xxx 
	    where year in (select year from rand(obs=2)); 
quit;
--
Paige Miller

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Assign a random number to each row, then select the observation with the X lowest (or highest) random numbers.

 

data want;
    set have;
    r=rand('uniform');
run;
proc sort data=want;
    by r;
run;
data want;
    set want(obs=2); /* Or any other number of observations that you want */
run;
    

 

Alternative: use PROC SURVEYSELECT, see this example: https://documentation.sas.com/doc/en/pgmmvacdc/9.4/statug/statug_surveyselect_gettingstarted01.htm

 

 

--
Paige Miller
wudebao
Fluorite | Level 6
appreciate your support, but the thing is, I want to select random 2 years out of 5 years, within these random two years, all observations will be put togerther.
PaigeMiller
Diamond | Level 26
proc sql;
	create table years as select distinct year from xxx;
	create table rand as select year,rand('uniform') as r from years order by calculated r;
	create table want as select year,data from xxx 
	    where year in (select year from rand(obs=2)); 
quit;
--
Paige Miller
wudebao
Fluorite | Level 6
How can I aviod of selecting other 3 years then
Tom
Super User Tom
Super User

Do the random selection on just the set of years.

First get a dataset with just the set of years

proc sort data=HAVE (keep=year) out=years nodupkey; run;

Then make a subset of the random years.

Then merge back with the original and only keep the observations that match the years selected.

wudebao
Fluorite | Level 6
I considered to produce identical random integer number for observations in same years
then I can use
IF r<=2 to select 2 years observations,
But I failed, dont know how to produce identical random numbers.
ballardw
Super User

Minor suggestion. When providing example data it is a good idea that other than a key variable, such as your year, has different values so you can see in the result what happens.

 

Then actually show some example(s) of desired result.

 

Another way:

data xxx;
input YEAR data;
/* to get different values of data for later examination*/
data= data+ _n_;
datalines;
2001 123
2002 123
2001 123
2003 123
2001 123
2003 123
2004 123
2005 123
2001 123
2003 123
;

proc sql;
   create table years as
   select distinct year
   from xxx
   order by year;
run;

proc surveyselect data=years out=selectedyears noprint
   sampsize=2;
run;

proc sql;
   create table onerandomsample as
   select b.*
   from selectedyears as a
        inner join
        xxx as b
        on a.year=b.year
   ;
quit;

If you want to keep the generated random sample you would change the name of "onerandomsample" each time you do this process.

 

You would change the number of sampled years by changing SAMPSIZE in the Proc surveyselect. There is some additional information that could be generated by Surveyselect such as sampling weights if needed. There is also an option SELECTALL for Surveyselect in case you ask for more years of data than exist in your base data that will result in all the data selected.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1007 views
  • 2 likes
  • 4 in conversation