SAS Programming

prjadhav00 · Posted 01-15-2019 10:40 AM

I am executing a query which has multiple columns in where clause which has multiple values. I know that in PROC SQL you can use IN condition to satisfy and get the correct output.

FAKE DATA

Question: CHF is defined using the following diagnosis codes:
398.91, 402.11, 402.91, 404.11, 404.13, 404.91, 404.93 (Codes are in CHARATER form)

1.Query: (I merge two tables)

proc sql;
create table BENE_CHF as
select distinct a.BENE_ID, b.ICD9_DGNS_CD_1, ICD9_DGNS_CD_2, ICD9_DGNS_CD_3, ICD9_DGNS_CD_4, ICD9_DGNS_CD_5,ICD9_DGNS_CD_6, ICD9_DGNS_CD_7, ICD9_DGNS_CD_8,
LINE_ICD9_DGNS_CD_1,
LINE_ICD9_DGNS_CD_2,
LINE_ICD9_DGNS_CD_3,
LINE_ICD9_DGNS_CD_4,
LINE_ICD9_DGNS_CD_5, THIS QUERY IS WORKING
LINE_ICD9_DGNS_CD_6,
LINE_ICD9_DGNS_CD_7,
LINE_ICD9_DGNS_CD_8,
LINE_ICD9_DGNS_CD_9,
LINE_ICD9_DGNS_CD_10,
LINE_ICD9_DGNS_CD_11,
LINE_ICD9_DGNS_CD_12,
LINE_ICD9_DGNS_CD_13
from work.beneficiary_summary_file_2008 as a,
work.carrier_2008 as b
where a.BENE_ID = b.BENE_ID;
quit;

/***********************************************************************************************************/

Next sept is to sort the data according to ICD codes methioned above.

I tried different quries such as;

1.proc sql;
select *
from bene_chf
where ICD9_DGNS_CD_1 and
ICD9_DGNS_CD_2 and
ICD9_DGNS_CD_3 and
ICD9_DGNS_CD_4 and
ICD9_DGNS_CD_5 and
ICD9_DGNS_CD_6 and
ICD9_DGNS_CD_7 and
ICD9_DGNS_CD_8 and
LINE_ICD9_DGNS_CD_1 and
LINE_ICD9_DGNS_CD_2 and
LINE_ICD9_DGNS_CD_3 and
LINE_ICD9_DGNS_CD_4 and
LINE_ICD9_DGNS_CD_5 and
LINE_ICD9_DGNS_CD_6 and
LINE_ICD9_DGNS_CD_7 and
LINE_ICD9_DGNS_CD_8 and
LINE_ICD9_DGNS_CD_9 and
LINE_ICD9_DGNS_CD_10 and
LINE_ICD9_DGNS_CD_11 and
LINE_ICD9_DGNS_CD_12 and
LINE_ICD9_DGNS_CD_13 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830');
quit;

2.proc sql;
select *
from bene_chf
where ICD9_DGNS_CD_1
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_2
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_3
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_4
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_5
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830')
group by bene_id;
quit;

Not able to get a right output.

any suggestion or comments are highly appriciated.

ballardw · Posted 01-15-2019 11:16 AM

Provide a small example data set in the form of data step code and expected outcome using that example data for

proc sql;
select *
from bene_chf
where ICD9_DGNS_CD_1
 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or  
ICD9_DGNS_CD_2
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_3
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_4
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_5
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830')
group by bene_id;
quit;

Likely you could also reduce the number of variables to 2 or 3 and the list of values to 2 or 3 as well as long as the data and code behave the same.

One might also ask why you say

CHF is defined using the following diagnosis codes:
398.91, 402.11, 402.91, 404.11, 404.13, 404.91, 404.93 (Codes are in CHARATER form)

and then go to reference ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') without the decimal. I'm not intimate with all of the ICD codes but I do believe it is possible to remove decimals and create duplicate results from different starts.

Suppose you have one code of 39.891 as well as 398.91. Removing decimals would yield 39891 for both.

View solution in original post

ballardw · Posted 01-15-2019 11:16 AM

Provide a small example data set in the form of data step code and expected outcome using that example data for

proc sql;
select *
from bene_chf
where ICD9_DGNS_CD_1
 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or  
ICD9_DGNS_CD_2
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_3
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_4
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_5
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830')
group by bene_id;
quit;

Likely you could also reduce the number of variables to 2 or 3 and the list of values to 2 or 3 as well as long as the data and code behave the same.

One might also ask why you say

CHF is defined using the following diagnosis codes:
398.91, 402.11, 402.91, 404.11, 404.13, 404.91, 404.93 (Codes are in CHARATER form)

and then go to reference ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') without the decimal. I'm not intimate with all of the ICD codes but I do believe it is possible to remove decimals and create duplicate results from different starts.

Suppose you have one code of 39.891 as well as 398.91. Removing decimals would yield 39891 for both.

SuryaKiran · Posted 01-15-2019 11:31 AM

Hello,

Your first query has no values defined to multiple variables, then they will take all the values (acts as not missing).

Check this example:

data have;
var='A';var1='A';output;
var='B';var1=' ';output;
var=' ';var1=' ';output;
var=' ';var1='A';output;
;
run;

proc sql ;
select * from have
where var and var1; /* This will filter no missing values */
quit;

proc sql ;
select * from have
where not var and not var1;
quit;

Your second query works only if the values for those variables are character and there are no leading and trailing blanks only. Use STRIP() function to remove leading and trailing blanks.

where strip(ICD9_DGNS_CD_1)
 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or  
ICD9_DGNS_CD_2
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') .......so on

Use numeric values if they are stored as numeric values in your source.

where ICD9_DGNS_CD_1
 in (39891, 40211, 40291, 40411, 40413, 40491, 40493, 42830) or  
ICD9_DGNS_CD_2
in (39891, 40211, 40291, 40411, 40413, 40491, 40493, 42830) ......so on

Thanks,
Suryakiran

prjadhav00 · Posted 01-15-2019 12:36 PM

Hi SuryKiran,

Thanks for your suggestion. I you suggested I used 'STRIP()' still I am not getting the right output. Variables are the character.

I am look for an output which can give the selected values

('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830')

can you suggest me any other way.

SAS Programming

Where clause multiple column with multiple values

Re: Where clause multiple column with multiple values

Re: Where clause multiple column with multiple values

Re: Where clause multiple column with multiple values

Re: Where clause multiple column with multiple values

Follow Us

What is...

SAS Programming

Where clause multiple column with multiple values

Re: Where clause multiple column with multiple values

Re: Where clause multiple column with multiple values

Re: Where clause multiple column with multiple values

Re: Where clause multiple column with multiple values

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...