SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
prjadhav00
Calcite | Level 5

I am executing a query which has multiple columns in where clause which has multiple values. I know that in PROC SQL you can use IN condition to satisfy and get the correct output.

 

FAKE DATA

Question: CHF is defined using the following diagnosis codes:
398.91, 402.11, 402.91, 404.11, 404.13, 404.91, 404.93 (Codes are in CHARATER form)

 

1.Query: (I merge two tables)

proc sql;
create table BENE_CHF as
select distinct a.BENE_ID, b.ICD9_DGNS_CD_1, ICD9_DGNS_CD_2, ICD9_DGNS_CD_3, ICD9_DGNS_CD_4, ICD9_DGNS_CD_5,ICD9_DGNS_CD_6, ICD9_DGNS_CD_7, ICD9_DGNS_CD_8,
LINE_ICD9_DGNS_CD_1,
LINE_ICD9_DGNS_CD_2,
LINE_ICD9_DGNS_CD_3,
LINE_ICD9_DGNS_CD_4,
LINE_ICD9_DGNS_CD_5,                                              THIS QUERY IS WORKING
LINE_ICD9_DGNS_CD_6,
LINE_ICD9_DGNS_CD_7,
LINE_ICD9_DGNS_CD_8,
LINE_ICD9_DGNS_CD_9,
LINE_ICD9_DGNS_CD_10,
LINE_ICD9_DGNS_CD_11,
LINE_ICD9_DGNS_CD_12,
LINE_ICD9_DGNS_CD_13
from work.beneficiary_summary_file_2008 as a,
work.carrier_2008 as b
where a.BENE_ID = b.BENE_ID;
quit;

 

/***********************************************************************************************************/

Next sept is to sort the data according to ICD codes methioned above.

I tried different quries such as;

1.proc sql;
select *
from bene_chf
where ICD9_DGNS_CD_1 and
ICD9_DGNS_CD_2 and
ICD9_DGNS_CD_3 and
ICD9_DGNS_CD_4 and
ICD9_DGNS_CD_5 and
ICD9_DGNS_CD_6 and
ICD9_DGNS_CD_7 and
ICD9_DGNS_CD_8 and
LINE_ICD9_DGNS_CD_1 and
LINE_ICD9_DGNS_CD_2 and
LINE_ICD9_DGNS_CD_3 and
LINE_ICD9_DGNS_CD_4 and
LINE_ICD9_DGNS_CD_5 and
LINE_ICD9_DGNS_CD_6 and
LINE_ICD9_DGNS_CD_7 and
LINE_ICD9_DGNS_CD_8 and
LINE_ICD9_DGNS_CD_9 and
LINE_ICD9_DGNS_CD_10 and
LINE_ICD9_DGNS_CD_11 and
LINE_ICD9_DGNS_CD_12 and
LINE_ICD9_DGNS_CD_13 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830');
quit;

 

2.proc sql;
select *
from bene_chf
where ICD9_DGNS_CD_1
 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or  
ICD9_DGNS_CD_2
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_3
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_4
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_5
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830')
group by bene_id;
quit;

 

Not able to get a right output.

any suggestion or comments are highly appriciated.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Provide a small example data set in the form of data step code and expected outcome using that example data for

proc sql;
select *
from bene_chf
where ICD9_DGNS_CD_1
 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or  
ICD9_DGNS_CD_2
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_3
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_4
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_5
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830')
group by bene_id;
quit;

 Likely you could also reduce the number of variables to 2 or 3 and the list of values to 2 or 3 as well as long as the data and code behave the same.

 

 

One might also ask why you say

CHF is defined using the following diagnosis codes:
398.91, 402.11, 402.91, 404.11, 404.13, 404.91, 404.93 (Codes are in CHARATER form)

and then go to reference  ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') without the decimal. I'm not intimate with all of the ICD codes but I do believe it is possible to remove decimals and create duplicate results from different starts.

 

Suppose you have one code of 39.891 as well as 398.91. Removing decimals would yield 39891 for both.

View solution in original post

3 REPLIES 3
ballardw
Super User

Provide a small example data set in the form of data step code and expected outcome using that example data for

proc sql;
select *
from bene_chf
where ICD9_DGNS_CD_1
 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or  
ICD9_DGNS_CD_2
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_3
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_4
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_5
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830')
group by bene_id;
quit;

 Likely you could also reduce the number of variables to 2 or 3 and the list of values to 2 or 3 as well as long as the data and code behave the same.

 

 

One might also ask why you say

CHF is defined using the following diagnosis codes:
398.91, 402.11, 402.91, 404.11, 404.13, 404.91, 404.93 (Codes are in CHARATER form)

and then go to reference  ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') without the decimal. I'm not intimate with all of the ICD codes but I do believe it is possible to remove decimals and create duplicate results from different starts.

 

Suppose you have one code of 39.891 as well as 398.91. Removing decimals would yield 39891 for both.

SuryaKiran
Meteorite | Level 14

Hello,

 

Your first query has no values defined to multiple variables, then they will take all the values (acts as not missing).

Check this example:

data have;
var='A';var1='A';output;
var='B';var1=' ';output;
var=' ';var1=' ';output;
var=' ';var1='A';output;
;
run;

proc sql ;
select * from have
where var and var1; /* This will filter no missing values */
quit;

proc sql ;
select * from have
where not var and not var1;
quit;

 

Your second query works only if the values for those variables are character and there are no leading and trailing blanks only. Use STRIP() function to remove leading and trailing blanks. 

where strip(ICD9_DGNS_CD_1)
 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or  
ICD9_DGNS_CD_2
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') .......so on

 Use numeric values if they are stored as numeric values in your source.

where ICD9_DGNS_CD_1
 in (39891, 40211, 40291, 40411, 40413, 40491, 40493, 42830) or  
ICD9_DGNS_CD_2
in (39891, 40211, 40291, 40411, 40413, 40491, 40493, 42830) ......so on

 

Thanks,
Suryakiran
prjadhav00
Calcite | Level 5

Hi SuryKiran,

 

Thanks for your suggestion. I you suggested I used 'STRIP()'  still I am not getting the right output. Variables are the character.

 

I am look for an output which can give the selected values  

('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830')

can you suggest me any other way.

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 18120 views
  • 0 likes
  • 3 in conversation