aalluru
Obsidian | Level 7

I have a dataset where the dependent variable (depvar) consists of 0s and 1s. I want to modify that dataset so that it contains 50% 0s and 50% 1s, i.e. an unbiased dataset.

 

How can I do that?

Reeza
Super User
Have you tried PROC SURVEYSELECT?
aalluru
Obsidian | Level 7

I had a look at it, but I'm not sure how I can use it to get what I need here.

ballardw
Super User

What other constraints might be involved? You don't mention how many records are involved, how many records should be in the resulting data set, or whether any other variables need to be considered.

 

PROC SURVEYSELECT, with your data stratified by the dependent variable, should select the desired subset:

 

/* PROC SURVEYSELECT needs the data sorted by the STRATA variable */
proc sort data=have;
   by dependentvar;
run;

proc surveyselect data=have out=selected
   sampsize=(1234 1234); /* number of records to select from each stratum, not a RATE */
   strata dependentvar;
run;

Replace 1234 with the number of records of each that you want.
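
A quick way to confirm that the selection came out balanced is a frequency table on the output. This is only a minimal sketch, assuming the output data set is named selected and the stratum variable is dependentvar, as in the code above:

```sas
/* Count the selected records in each stratum; both rows should show the same frequency */
proc freq data=selected;
   tables dependentvar / nocum;
run;
```

If the two counts differ, the SAMPSIZE= list and the sort order of the strata are the first things to check.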

 

My feeling, though, is that by specifying your "outcome" variable this way you are very likely creating a bias that did not exist in the original data.

Consider a case where your outcome is a result like "had an adverse reaction to medication", your independent variables are demographics, and the original data had maybe 25% of records with a reaction. Your subset of the data makes the overall "adverse rate" much higher (50% instead of 25%) and might obscure the common elements in the independent variables that were actually associated with the adverse reaction.
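
One way to see the shift is to compare the raw and weighted outcome rates in the sample. The sketch below assumes the stratified output data set selected from PROC SURVEYSELECT contains the SamplingWeight variable that the procedure normally writes for stratified designs; check the contents of your own output data set before relying on it.

```sas
/* Unweighted: the outcome rate in the balanced sample (50/50 by construction) */
proc freq data=selected;
   tables dependentvar;
run;

/* Weighted: each record counts for the number of original records it represents, */
/* so the percentages should be close to the rates in the original data           */
proc freq data=selected;
   tables dependentvar;
   weight SamplingWeight; /* assumed to be present in the OUT= data set */
run;
```

The gap between the two tables is the change in the outcome rate that the balancing introduced.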

 

What specific types of analysis are you planning for this data?

PGStats
Opal | Level 21

You are looking for a balanced dataset.

Start with:

 

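proc sql;
/* size of the smaller class: each comparison evaluates to 1 (true) or 0 (false) */
select min(sum(depvar=0), sum(depvar=1)) into : sampSize
from myData;
quit;

/* strata processing needs the data sorted by the strata variable */
proc sort data=myData; by depvar; run;

/* draw the same number of records from each class */
proc surveyselect data=myData out=mySamples method=srs sampSize=&sampSize.;
strata depvar;
run;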

 (untested)

PG
