Subet data based on date, gender, and age

Accepted Solution Solved
Reply
Regular Contributor
Posts: 150
Accepted Solution

Subet data based on date, gender, and age

I have a dataset with 200K observations from surveys that were collected between 2002 and 2014. The data is in wide format and there are 4 date variables, date1-4. I need to first subset to include surveys after January 01, 2010. I tired this code but it gives me zero observations:

data date_subset;

set allparts;

if surveydate_w1 < '01JAN2010'd then delete;

if surveydate_w2 < '01JAN2010'd then delete;

if surveydate_w3 < '01JAN2010'd then delete;

if surveydate_w4 < '01JAN2010'd then delete;

run;

After that I need to select a random sample that has a mean age of 35 and is 25% female and 75% male. I think I can use the proc surveyselect or proc sql for this but have never done it. Any help with these procedure would be appreciated.


Accepted Solutions
Solution
‎08-03-2015 03:34 PM
Super User
Posts: 11,124

Re: Subet data based on date, gender, and age

It appears that you have no records with ALL of the dates later than 01JAN2010.

Maybe what you want is:

if surveydate_w1 ge '01JAN2010'd OR surveydate_w2 ge '01JAN2010'd or surveydate_w3 ge '01JAN2010'd or surveydate_w4 ge '01JAN2010'd then OUTPUT;

Lots of luck trying to specify a MEAN age with a RANDOM selection.

You might be better off specifying an age range that includes 35 such as a dataset option (where = ( 25 le age le 45)).

With suveyselect you can specify STRATA level sample rates such as :

proc surveyselect data=have out=want

     samprate=(.25 .75); /* to use this the input data should be sorted by the STRATA variable and assumes that the value for male comes before female when sorted. Reverse the order of the values if that is no the case*/

Strata Gender;

run;

View solution in original post


All Replies
Super User
Posts: 5,386

Re: Subet data based on date, gender, and age

First, narrow data is usually better suited for analysis and reporting. And easier to transform. So a general advice would be to transpose your data set.

Then I'm not sure what your subset should do.  Now your logic keeps records where all dates are prior to 2010. Is that correct?

Are you sure that your dates are in SAS date format?

Data never sleeps
Solution
‎08-03-2015 03:34 PM
Super User
Posts: 11,124

Re: Subet data based on date, gender, and age

It appears that you have no records with ALL of the dates later than 01JAN2010.

Maybe what you want is:

if surveydate_w1 ge '01JAN2010'd OR surveydate_w2 ge '01JAN2010'd or surveydate_w3 ge '01JAN2010'd or surveydate_w4 ge '01JAN2010'd then OUTPUT;

Lots of luck trying to specify a MEAN age with a RANDOM selection.

You might be better off specifying an age range that includes 35 such as a dataset option (where = ( 25 le age le 45)).

With suveyselect you can specify STRATA level sample rates such as :

proc surveyselect data=have out=want

     samprate=(.25 .75); /* to use this the input data should be sorted by the STRATA variable and assumes that the value for male comes before female when sorted. Reverse the order of the values if that is no the case*/

Strata Gender;

run;

Regular Contributor
Posts: 150

Re: Subet data based on date, gender, and age

Thank you, you are right I was specifying incorrectly and don't have anyone with all of the dates that meet that criteria. The modification you suggested worked.

I am doing a comparison between two datasets with different outcomes so I will just use the same age range and see what other restrictions might be best. Thank you for your help!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 186 views
  • 0 likes
  • 3 in conversation