I'd like to look at the patients who did not take certain codes, in this example: 1,2,6 and create data WANT. So I will be able to assess the patients who did not take the codes.
data have;
input id code;
cards;
1 1
1 2
1 3
1 3
1 5
2 1
2 2
2 2
2 4
2 5
3 3
3 4
3 5
4 8
4 9
5 6
5 10
;
data want;
input id code;
cards;
3 3
3 4
3 5
4 8
4 9
;
I don't patients' info anymore as long as they were defined having 1,2 or 6. I tried the code below, but it doesn't exclude the patients who has codes of 1,2 and 6.
proc sql;
create table want as
select * from have
where id in
(select id from have where code not in (1,2,6));
quit;
Your logic is wrong.
ID=1 is kept since it has CODE=3.
This works:
proc sql;
create table WANT as
select * from HAVE
where ID not in
(select unique ID from HAVE where CODE in (1,2,6));
quit;
Boolean logic requires careful thinking. 🙂
both codes produced the same result with N=46,840,259 using my actual datasets.
proc sql;
create table want_Reeza as
select * from have
where id not in
(select id from have where code in (1,2,6));
quit;
proc sql;
create table WANT_cris as
select * from HAVE
where ID not in
(select unique ID from HAVE where CODE in (1,2,6));
quit;
I only return each ID once in the list. My reply is basically the same as @Reeza's, I didn't see there already was an answer.
I'm PUZZLED here. So I expect to end up with mutually exclusive datasets to defined vs not defined. And n of rows in final datasets suppose to add up to n of rows in the SOURCE_DATA. However, it doesn't and also there is no missing in CODE variable in the source data.
15,101+947,133 do NOT add up to 997,910
while
15,2014 + 608 do NOT add up to 17,068.
Here is something wrong.
Any hints, as to why? Any guess or pointers?
proc sql;
create table not_defined /*N=15,101*/ as
select * from SOURCE_DATA /*N=997,910*/
where display_id not in
(select display_id from med_dx where substr(code,1,3) in
('153','154','159','C18','C19','C78','D01','C20','C21','C7A','239',
'197','209','230','211','235','Z12','199','D12','D37','D49'));
quit;
proc sql;
create table defined /*N=947,133*/ as
select * from SOURCE_DATA /*N=997,910*/
where display_id in
(select display_id from med_dx_mmyy where substr(code,1,3) in ('153','154','159','C18','C19','C78','D01','C20','C21',
'C7A','239','197','209','230','211','235','Z12','199','D12','D37','D49'));
quit;
/*VALIDATING*/
proc sql;
select count(distinct display_id)
from SOURCE_DATA; /*N=17,068*/
quit;
proc sql;
select count(distinct display_id)
from defined; /*N=15,214 defined*/
quit;
proc sql;
select count(distinct display_id)
from not_defined; /*N=608 not defined*/
quit;
Are you querying SAS tables?
If not, the issue is probably null values, which satisfy neither condition.
Also note that a join would be much faster. Something like
proc sql;
create table UNDEFINED as
select *
from SOURCE_DATA source
left join
(select unique DISPLAY_ID
from med_dx_mmyy
where substr(code,1,3) in ('153','154','159','C18','C19','C78','D01','C20','C21',
'C7A','239','197','209','230','211','235','Z12','199','D12','D37','D49')) list
on source.DISPLAY_ID=list.DISPLAY_ID
where list.DISPLAY_ID is null;
quit;
This solved the problem
data; set;
code=substr(omm_dx_cd,1,3);
run;
proc sql;
create table undefined as
select * from source_data
where display_id not in
(select display_id from source_data where code in
('153','154','159','C18','C19','C78','D01','C20','C21','C7A','239', '197','209','230','211','235','Z12','199','D12','D37','D49'));
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.