Help using Base SAS procedures

"not in" for multiple variables

Reply
Occasional Contributor
Posts: 8

"not in" for multiple variables

I'm trying to find a way to delete observations if particular variable labels are not included in more than one variable. For example, I want to delete observations that do not have the icd-9 code for hip fracture (8208) in 9 different diagnosis code variables. In other words, if one observation doesn't have the code 8208 listed in the first variable (diagnosis_code_1) but does in the second variable (diagnosis_code_2) then that observation is deleted. My current code seems to delete the observation if the relevant codes are not listed in the first variable (diagnosis_code_1). What am I doing wrong?


data medicare_proc;
set medicare_final;
if diagnosis_code_1 not in (82000 82001 82002 82003 82009 82020 82021 82022 8208 8209)or
diagnosis_code_2 in (82000 82001 82002 82003 82009 82020 82021 82022 8208 8209) or
diagnosis_code_3 in (82000 82001 82002 82003 82009 82020 82021 82022 8208 8209) or
diagnosis_code_4 in (82000 82001 82002 82003 82009 82020 82021 82022 8208 8209) or
diagnosis_code_5 in (82000 82001 82002 82003 82009 82020 82021 82022 8208 8209) or
diagnosis_code_6 in (82000 82001 82002 82003 82009 82020 82021 82022 8208 8209) or
diagnosis_code_8 in (82000 82001 82002 82003 82009 82020 82021 82022 8208 8209) or
diagnosis_code_9 in (82000 82001 82002 82003 82009 82020 82021 82022 8208 8209) or
drg_code not in (209 236)
then delete;
run;
Super Contributor
Super Contributor
Posts: 3,174

Re: "not in" for multiple variables

Posted in reply to sweetpeaindeed
First, suggest you use a WHERE instead of IF, for possible performance gain. Also, you may consider using a SAS VIEW as well.

Then, review your use of OR vs AND after your first test. And so with what you have explained, you will need to surround the remaining tests with parentheses, so that the first test is for not-in-your-list *AND* is-in-your-list (for the remaining SAS variables).

And, consider using a SAS macro variable (using a %LET statement) to list the value-string that you want to test, presuming they are all the same variable values.

Scott Barry
SBBWorks, Inc. Message was edited by: sbb
Super Contributor
Super Contributor
Posts: 365

Re: "not in" for multiple variables

Posted in reply to sweetpeaindeed
Hello SweetPeaIndeed,

I tried your porgram on the simplified version of your data and it works as expected:
[pre]
data medicare_final;
input diagnosis_code_1 diagnosis_code_2 drg_code;
datalines;
82085 8208 209
8208 8208 209
8208 82005 209
run;
data medicare_proc;
set medicare_final;
if diagnosis_code_1 not in (82000 82001 82002 82003 82009 82020 82021 82022 8208 8209) or
diagnosis_code_2 in (82000 82001 82002 82003 82009 82020 82021 82022 8208 8209) or
drg_code not in (209 236)
then delete;
run;
[/pre]
The output dataset contains only the last observation from medicare_final. First observation is excluded because it is not in code1 but is in code 2, second one is excluded because it is in code1. So it looks like the problem is not in your problem but may be in data?
Sincerely,
SPR
Ask a Question
Discussion stats
  • 2 replies
  • 146 views
  • 0 likes
  • 3 in conversation