I have a dataset where the dependent variable (depvar) consists of 0s and 1s. I want to modify the dataset so that it contains 50% 0s and 50% 1s, i.e. an unbiased dataset.
How can I do that?
I had a look at it but I'm not sure how I can use it to get what I need here
What other constraints might be involved? You don't mention how many records are involved, how many records should be in the resulting data set or if any other variables are involved or need to be considered.
PROC SURVEYSELECT, with your data stratified by the dependent variable, should select the desired subset:
/* needed to use strata */
proc sort data=have;
   by dependentvar;
run;
proc surveyselect data=have out=selected
   sampsize=(1234 1234); /* this is the number of each that you want, not a RATE */
   strata dependentvar;
run;
Replace 1234 with the number of records of each that you want.
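To choose that number, it may help to look at how many records each level has first. This is a minimal sketch, assuming the same data set name (have) and variable name (dependentvar) as the code above:

```sas
/* Count the records at each level of the outcome variable.      */
/* When sampling without replacement, the smaller of the two     */
/* counts is the most you can request for each stratum.          */
proc freq data=have;
   tables dependentvar / nocum;
run;
```

Asking for more records than a stratum contains would not give you the balanced subset you are after, so the smaller count is a natural upper limit.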
My feeling, though, is that by specifying your "outcome" variable this way you are very likely creating a bias that did not exist in the original data.
Consider if your outcome were a result like "had an adverse reaction to medication" and your independent variables were demographics, where the original outcome rate was maybe 25% with a reaction. Your subset of data makes the overall "adverse rate" much higher and might obscure the common elements in the independent variables that were actually associated with the adverse reaction.
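To put rough numbers on that, here is a small sketch of the arithmetic. The total of 10,000 records is a made-up figure for illustration only; the 25% rate is the one from the example above.

```sas
/* Hypothetical illustration: how balancing changes the outcome rate. */
data _null_;
   n_total  = 10000;                  /* assumed record count, not from real data */
   rate     = 0.25;                   /* original share of records with a reaction */
   n_ones   = n_total * rate;         /* records with a reaction                   */
   n_zeros  = n_total - n_ones;       /* records without a reaction                */
   n_keep   = min(n_ones, n_zeros);   /* records kept per level when balancing     */
   new_rate = n_keep / (2 * n_keep);  /* outcome rate in the balanced subset       */
   dropped  = n_zeros - n_keep;       /* majority-level records thrown away        */
   put n_ones= n_zeros= n_keep= new_rate= dropped=;
run;
```

Under these assumptions the balanced subset keeps all 2,500 reaction records but only 2,500 of the 7,500 others, so two thirds of the non-reaction records are discarded and the apparent reaction rate doubles from 25% to 50%.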
What specific types of analysis are you planning for this data?
You are looking for a balanced dataset.
Start with:
proc sql;
select min(sum(depvar=0), sum(depvar=1)) into :sampSize
from myData;
quit;
proc sort data=myData; by depvar; run;
proc surveyselect data=myData out=mySamples method=srs sampSize=&sampSize.;
strata depvar;
run;
(untested)
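Since the code above is untested, a quick check of the result seems worthwhile. This is a minimal sketch, assuming the output data set is named mySamples as above:

```sas
/* Confirm that both levels of depvar have the same number of records. */
proc freq data=mySamples;
   tables depvar;
run;
```

If the sample is balanced, the frequency table should show the same count for depvar=0 and depvar=1, each at 50 percent.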