- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am executing a query which has multiple columns in where clause which has multiple values. I know that in PROC SQL you can use IN condition to satisfy and get the correct output.
FAKE DATA
Question: CHF is defined using the following diagnosis codes:
398.91, 402.11, 402.91, 404.11, 404.13, 404.91, 404.93 (Codes are in CHARATER form)
1.Query: (I merge two tables)
proc sql;
create table BENE_CHF as
select distinct a.BENE_ID, b.ICD9_DGNS_CD_1, ICD9_DGNS_CD_2, ICD9_DGNS_CD_3, ICD9_DGNS_CD_4, ICD9_DGNS_CD_5,ICD9_DGNS_CD_6, ICD9_DGNS_CD_7, ICD9_DGNS_CD_8,
LINE_ICD9_DGNS_CD_1,
LINE_ICD9_DGNS_CD_2,
LINE_ICD9_DGNS_CD_3,
LINE_ICD9_DGNS_CD_4,
LINE_ICD9_DGNS_CD_5, THIS QUERY IS WORKING
LINE_ICD9_DGNS_CD_6,
LINE_ICD9_DGNS_CD_7,
LINE_ICD9_DGNS_CD_8,
LINE_ICD9_DGNS_CD_9,
LINE_ICD9_DGNS_CD_10,
LINE_ICD9_DGNS_CD_11,
LINE_ICD9_DGNS_CD_12,
LINE_ICD9_DGNS_CD_13
from work.beneficiary_summary_file_2008 as a,
work.carrier_2008 as b
where a.BENE_ID = b.BENE_ID;
quit;
/***********************************************************************************************************/
Next sept is to sort the data according to ICD codes methioned above.
I tried different quries such as;
1.proc sql;
select *
from bene_chf
where ICD9_DGNS_CD_1 and
ICD9_DGNS_CD_2 and
ICD9_DGNS_CD_3 and
ICD9_DGNS_CD_4 and
ICD9_DGNS_CD_5 and
ICD9_DGNS_CD_6 and
ICD9_DGNS_CD_7 and
ICD9_DGNS_CD_8 and
LINE_ICD9_DGNS_CD_1 and
LINE_ICD9_DGNS_CD_2 and
LINE_ICD9_DGNS_CD_3 and
LINE_ICD9_DGNS_CD_4 and
LINE_ICD9_DGNS_CD_5 and
LINE_ICD9_DGNS_CD_6 and
LINE_ICD9_DGNS_CD_7 and
LINE_ICD9_DGNS_CD_8 and
LINE_ICD9_DGNS_CD_9 and
LINE_ICD9_DGNS_CD_10 and
LINE_ICD9_DGNS_CD_11 and
LINE_ICD9_DGNS_CD_12 and
LINE_ICD9_DGNS_CD_13 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830');
quit;
2.proc sql;
select *
from bene_chf
where ICD9_DGNS_CD_1
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_2
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_3
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_4
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or
ICD9_DGNS_CD_5
in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830')
group by bene_id;
quit;
Not able to get a right output.
any suggestion or comments are highly appriciated.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Provide a small example data set in the form of data step code and expected outcome using that example data for
proc sql; select * from bene_chf where ICD9_DGNS_CD_1 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or ICD9_DGNS_CD_2 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or ICD9_DGNS_CD_3 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or ICD9_DGNS_CD_4 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or ICD9_DGNS_CD_5 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') group by bene_id; quit;
Likely you could also reduce the number of variables to 2 or 3 and the list of values to 2 or 3 as well as long as the data and code behave the same.
One might also ask why you say
CHF is defined using the following diagnosis codes: 398.91, 402.11, 402.91, 404.11, 404.13, 404.91, 404.93 (Codes are in CHARATER form)
and then go to reference ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') without the decimal. I'm not intimate with all of the ICD codes but I do believe it is possible to remove decimals and create duplicate results from different starts.
Suppose you have one code of 39.891 as well as 398.91. Removing decimals would yield 39891 for both.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Provide a small example data set in the form of data step code and expected outcome using that example data for
proc sql; select * from bene_chf where ICD9_DGNS_CD_1 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or ICD9_DGNS_CD_2 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or ICD9_DGNS_CD_3 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or ICD9_DGNS_CD_4 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or ICD9_DGNS_CD_5 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') group by bene_id; quit;
Likely you could also reduce the number of variables to 2 or 3 and the list of values to 2 or 3 as well as long as the data and code behave the same.
One might also ask why you say
CHF is defined using the following diagnosis codes: 398.91, 402.11, 402.91, 404.11, 404.13, 404.91, 404.93 (Codes are in CHARATER form)
and then go to reference ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') without the decimal. I'm not intimate with all of the ICD codes but I do believe it is possible to remove decimals and create duplicate results from different starts.
Suppose you have one code of 39.891 as well as 398.91. Removing decimals would yield 39891 for both.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Your first query has no values defined to multiple variables, then they will take all the values (acts as not missing).
Check this example:
data have;
var='A';var1='A';output;
var='B';var1=' ';output;
var=' ';var1=' ';output;
var=' ';var1='A';output;
;
run;
proc sql ;
select * from have
where var and var1; /* This will filter no missing values */
quit;
proc sql ;
select * from have
where not var and not var1;
quit;
Your second query works only if the values for those variables are character and there are no leading and trailing blanks only. Use STRIP() function to remove leading and trailing blanks.
where strip(ICD9_DGNS_CD_1) in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') or ICD9_DGNS_CD_2 in ('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830') .......so on
Use numeric values if they are stored as numeric values in your source.
where ICD9_DGNS_CD_1 in (39891, 40211, 40291, 40411, 40413, 40491, 40493, 42830) or ICD9_DGNS_CD_2 in (39891, 40211, 40291, 40411, 40413, 40491, 40493, 42830) ......so on
Suryakiran
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi SuryKiran,
Thanks for your suggestion. I you suggested I used 'STRIP()' still I am not getting the right output. Variables are the character.
I am look for an output which can give the selected values
('39891', '40211', '40291', '40411', '40413', '40491', '40493', '42830')
can you suggest me any other way.