aalluru
Obsidian | Level 7

I have a dataset where the dependent variable (depvar) consists of 0s and 1s. I want to modify that dataset so that it contains 50% 0s and 50% 1s, i.e. an unbiased dataset.

 

How can I do that?

4 REPLIES
Reeza
Super User
Have you tried PROC SURVEYSELECT?
aalluru
Obsidian | Level 7

I had a look at it, but I'm not sure how I can use it to get what I need here.

ballardw
Super User

@aalluru wrote:

I have a dataset where the dependent variable (depvar) consists of 0s and 1s. I want to modify that dataset so that it contains 50% 0s and 50% 1s, i.e. an unbiased dataset.

 

How can I do that?


What other constraints might be involved? You don't mention how many records are involved, how many records should be in the resulting data set or if any other variables are involved or need to be considered.

 

PROC SURVEYSELECT, with your data stratified by the outcome variable, should select the desired subset:

 

/* needed to use strata */
Proc sort data=have;
   by dependentvar;
run;

proc surveyselect data=have out=selected
   sampsize=(1234 1234); /* number of records to select from each stratum (a count, not a rate) */
   strata dependentvar;
run;

Replace 1234 with the number of records of each that you want.
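
Before settling on that number, it may help to see how many records each level of the outcome actually has, because a simple random sample without replacement cannot draw more records from a stratum than the stratum contains. A minimal sketch, reusing the `have` and `dependentvar` names from the code above (substitute your own data set and variable names):

```sas
/* Count the records at each level of the outcome variable */
proc freq data=have;
   tables dependentvar / nocum;
run;
```

For a 50/50 split, the count you request per stratum should be no larger than the smaller of the two frequencies shown.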

 

My feeling, though, is that by specifying your "outcome" variable this way you are very likely creating a bias that did not exist in the original data.

Consider an outcome such as "had an adverse reaction to medication", with demographics as the independent variables, where the original rate of reaction was maybe 25%. Your subset of the data makes the overall "adverse rate" much higher and might obscure the common elements in the independent variables that were actually associated with the adverse reaction.

 

What specific types of analysis are you planning for this data?

PGStats
Opal | Level 21

You are looking for a balanced dataset.

Start with:

 

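proc sql noprint;
/* size of the smaller class; each class is sampled down to this count */
select min(sum(depvar=0), sum(depvar=1)) into :sampSize trimmed
from myData;
quit;

/* strata requires the data to be sorted by the stratum variable */
proc sort data=myData; by depvar; run;

proc surveyselect data=myData out=mySamples method=srs sampSize=&sampSize.;
strata depvar;
run;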

 (untested)
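
A quick check of the result may be worthwhile. The sketch below assumes the output data set is named `mySamples`, as in the code above, and simply tabulates `depvar` in the sample; both levels should show the same frequency.

```sas
/* Verify that the sample holds equal numbers of 0s and 1s */
proc freq data=mySamples;
   tables depvar / nocum;
run;
```

If you need the same sample on every run, adding a `seed=` value to the PROC SURVEYSELECT statement makes the draw repeatable.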

PG
