I have a dataset with 200K observations from surveys that were collected between 2002 and 2014. The data is in wide format and there are 4 date variables, date1-4. I need to first subset to include surveys after January 01, 2010. I tired this code but it gives me zero observations:
data date_subset;
set allparts;
if surveydate_w1 < '01JAN2010'd then delete;
if surveydate_w2 < '01JAN2010'd then delete;
if surveydate_w3 < '01JAN2010'd then delete;
if surveydate_w4 < '01JAN2010'd then delete;
run;
After that I need to select a random sample that has a mean age of 35 and is 25% female and 75% male. I think I can use the proc surveyselect or proc sql for this but have never done it. Any help with these procedure would be appreciated.
It appears that you have no records with ALL of the dates later than 01JAN2010.
Maybe what you want is:
if surveydate_w1 ge '01JAN2010'd OR surveydate_w2 ge '01JAN2010'd or surveydate_w3 ge '01JAN2010'd or surveydate_w4 ge '01JAN2010'd then OUTPUT;
Lots of luck trying to specify a MEAN age with a RANDOM selection.
You might be better off specifying an age range that includes 35 such as a dataset option (where = ( 25 le age le 45)).
With suveyselect you can specify STRATA level sample rates such as :
proc surveyselect data=have out=want
samprate=(.25 .75); /* to use this the input data should be sorted by the STRATA variable and assumes that the value for male comes before female when sorted. Reverse the order of the values if that is no the case*/
Strata Gender;
run;
First, narrow data is usually better suited for analysis and reporting. And easier to transform. So a general advice would be to transpose your data set.
Then I'm not sure what your subset should do. Now your logic keeps records where all dates are prior to 2010. Is that correct?
Are you sure that your dates are in SAS date format?
It appears that you have no records with ALL of the dates later than 01JAN2010.
Maybe what you want is:
if surveydate_w1 ge '01JAN2010'd OR surveydate_w2 ge '01JAN2010'd or surveydate_w3 ge '01JAN2010'd or surveydate_w4 ge '01JAN2010'd then OUTPUT;
Lots of luck trying to specify a MEAN age with a RANDOM selection.
You might be better off specifying an age range that includes 35 such as a dataset option (where = ( 25 le age le 45)).
With suveyselect you can specify STRATA level sample rates such as :
proc surveyselect data=have out=want
samprate=(.25 .75); /* to use this the input data should be sorted by the STRATA variable and assumes that the value for male comes before female when sorted. Reverse the order of the values if that is no the case*/
Strata Gender;
run;
Thank you, you are right I was specifying incorrectly and don't have anyone with all of the dates that meet that criteria. The modification you suggested worked.
I am doing a comparison between two datasets with different outcomes so I will just use the same age range and see what other restrictions might be best. Thank you for your help!
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.