Hello,
I am trying to create a new categorical variable and a new binary variable based on a series of binary variables that I constructed previously, but the observation counts are not adding up and I cannot find a solution.
An example of the code I am using looks like this:
Data newdata;
Set oldata;
If BV1 = 1 then CV = 1;
If BV2 = 1 then CV = 2;
If BV3 = 1 then CV = 3;
Run;
The number of observations where BV1 = 1 is 1165, the number where BV2 = 1 is 69, and the number where BV3 = 1 is 17. When I run the above code, the number of observations at each level of the categorical variable is lower than the original count for the corresponding binary variable. I do not understand why these numbers would change. Each of these binary variables was constructed from its own source variable, so there should be no overlap between the conditions that define them.
I noticed the same issue if I tried to create a new binary variable indicating positivity for any of the original binary variables I constructed.
I have tried code using ELSE IF statements and OR conditions, as follows:
Data newdata;
set oldata;
If BV1 = 1 then NBV = 1;
Else if BV2 = 1 then NBV = 1;
Else if BV3 = 1 then NBV = 1;
Run;
Data newdata;
set oldata;
If BV1 = 1 or if BV2 = 1 or BV3 = 1 then NBV = 1;
Run;
Accepted Solutions
proc freq data=newdata;
table (BV1 BV2 BV3)*CV;
run;
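Look for cases where an indicator is 1 but CV does not have the matching value. I bet BV3 = 1 in those cases: the three IF statements run independently, so when more than one indicator is 1 the last matching statement overwrites CV.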
data errors;
set oldata;
where bv2=1 and bv3=1;
run;
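Once the overlap is confirmed, one possible way to keep it from silently overwriting CV is to count the positive indicators first and give overlapping rows their own code. This is only a sketch: the variable n_pos and the overlap code 9 are illustrative choices, not something from the original data, and you may prefer a different rule (for example, an ELSE IF chain that gives BV1 precedence).

```sas
/* Sketch only: n_pos and the overlap code 9 are hypothetical choices */
data newdata;
  set oldata;
  /* each comparison evaluates to 1 (true) or 0 (false), so the sum counts positive indicators */
  n_pos = (BV1 = 1) + (BV2 = 1) + (BV3 = 1);
  if n_pos > 1 then CV = 9;          /* more than one indicator is 1 */
  else if BV1 = 1 then CV = 1;
  else if BV2 = 1 then CV = 2;
  else if BV3 = 1 then CV = 3;
  /* CV stays missing when no indicator is 1 */
run;

/* check how many rows overlap and how CV is distributed */
proc freq data=newdata;
  tables n_pos CV / missing;
run;
```

With this layout the counts for CV = 1, 2 and 3 cover only rows with exactly one positive indicator, and the n_pos table shows how many rows were set aside as overlaps.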
Thank you!
I hadn't originally checked for overlap between the variables because I did not think it was possible due to their definitions, but it appears I was incorrect.
Here is a similar diagnostic whose output may be a bit easier to follow:
proc freq data=newdata;
table BV1*BV2*BV3*CV / list missing;
run;
This shows one row for each combination of the BV variables and CV that occurs in the data, so it should be easy to see where your likely overlap occurs.
Data newdata;
set oldata;
If BV1 = 1 or if BV2 = 1 or BV3 = 1 then NBV = 1;
Run;
The ERRORS in the log should explain why the above doesn't work.
83   Data newdata;
84   set have;
85   If BV1 = 1 or if BV2 = 1 or BV3 = 1 then NBV = 1;
        ---
        22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, -, /, ;, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN, LE, LT, MAX, MIN, NE, NG, NL, NOTIN, OR, [, ^=, {, |, ||, ~=.
86   Run;
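The second IF is not what you want. IF is written once at the start of the statement, and the individual conditions are joined with OR.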
If whichn(1, of BV:)>0 then NBV=1;
This tests whether any of the variables has the value 1. WHICHN returns the position in the list of the first value that matches the first argument, or 0 if none match. So if any of the values is 1, the result is greater than 0.
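Putting the two points together, a minimal sketch of the "any positive" flag might look like the following. It assumes that rows with no positive indicator should get 0 rather than missing, and NBV_check is a hypothetical name used only to compare the two approaches. The numbered range BV1-BV3 is used instead of the BV: prefix so that no other variable whose name starts with BV is picked up by accident.

```sas
/* Sketch only: NBV_check is a hypothetical comparison variable */
data newdata;
  set oldata;
  /* IF appears once; the conditions are joined with OR */
  if BV1 = 1 or BV2 = 1 or BV3 = 1 then NBV = 1;
  else NBV = 0;   /* assumption: no positive indicator means 0, not missing */
  /* same test with WHICHN: the comparison evaluates to 1 or 0 */
  NBV_check = (whichn(1, of BV1-BV3) > 0);
run;

/* the two flags should agree on every row */
proc freq data=newdata;
  tables NBV*NBV_check / list missing;
run;
```

Because a row with several positive indicators still gets NBV = 1 only once, the count of NBV = 1 can be smaller than the sum of the individual BV counts when the indicators overlap.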
Hello,
I'm hoping you would be able to share how you resolved this? I also have overlapping cases as I try to create a categorical variable from other binary variables that seem to be overwriting each other and causing errors in the frequencies I should be seeing.
Thank you.