I'd like to look at the patients who did not take certain codes, in this example: 1,2,6 and create data WANT. So I will be able to assess the patients who did not take the codes.
data have;
input id code;
cards;
1 1
1 2
1 3
1 3
1 5
2 1
2 2
2 2
2 4
2 5
3 3
3 4
3 5
4 8
4 9
5 6
5 10
;
data want;
input id code;
cards;
3 3
3 4
3 5
4 8
4 9
;
I don't patients' info anymore as long as they were defined having 1,2 or 6. I tried the code below, but it doesn't exclude the patients who has codes of 1,2 and 6.
proc sql;
create table want as
select * from have
where id in
(select id from have where code not in (1,2,6));
quit;
Your logic is wrong.
ID=1 is kept since it has CODE=3.
This works:
proc sql;
create table WANT as
select * from HAVE
where ID not in
(select unique ID from HAVE where CODE in (1,2,6));
quit;
Boolean logic requires careful thinking. 🙂
both codes produced the same result with N=46,840,259 using my actual datasets.
proc sql;
create table want_Reeza as
select * from have
where id not in
(select id from have where code in (1,2,6));
quit;
proc sql;
create table WANT_cris as
select * from HAVE
where ID not in
(select unique ID from HAVE where CODE in (1,2,6));
quit;
I only return each ID once in the list. My reply is basically the same as @Reeza's, I didn't see there already was an answer.
I'm PUZZLED here. So I expect to end up with mutually exclusive datasets to defined vs not defined. And n of rows in final datasets suppose to add up to n of rows in the SOURCE_DATA. However, it doesn't and also there is no missing in CODE variable in the source data.
15,101+947,133 do NOT add up to 997,910
while
15,2014 + 608 do NOT add up to 17,068.
Here is something wrong.
Any hints, as to why? Any guess or pointers?
proc sql;
create table not_defined /*N=15,101*/ as
select * from SOURCE_DATA /*N=997,910*/
where display_id not in
(select display_id from med_dx where substr(code,1,3) in
('153','154','159','C18','C19','C78','D01','C20','C21','C7A','239',
'197','209','230','211','235','Z12','199','D12','D37','D49'));
quit;
proc sql;
create table defined /*N=947,133*/ as
select * from SOURCE_DATA /*N=997,910*/
where display_id in
(select display_id from med_dx_mmyy where substr(code,1,3) in ('153','154','159','C18','C19','C78','D01','C20','C21',
'C7A','239','197','209','230','211','235','Z12','199','D12','D37','D49'));
quit;
/*VALIDATING*/
proc sql;
select count(distinct display_id)
from SOURCE_DATA; /*N=17,068*/
quit;
proc sql;
select count(distinct display_id)
from defined; /*N=15,214 defined*/
quit;
proc sql;
select count(distinct display_id)
from not_defined; /*N=608 not defined*/
quit;
Are you querying SAS tables?
If not, the issue is probably null values, which satisfy neither condition.
Also note that a join would be much faster. Something like
proc sql;
create table UNDEFINED as
select *
from SOURCE_DATA source
left join
(select unique DISPLAY_ID
from med_dx_mmyy
where substr(code,1,3) in ('153','154','159','C18','C19','C78','D01','C20','C21',
'C7A','239','197','209','230','211','235','Z12','199','D12','D37','D49')) list
on source.DISPLAY_ID=list.DISPLAY_ID
where list.DISPLAY_ID is null;
quit;
This solved the problem
data; set;
code=substr(omm_dx_cd,1,3);
run;
proc sql;
create table undefined as
select * from source_data
where display_id not in
(select display_id from source_data where code in
('153','154','159','C18','C19','C78','D01','C20','C21','C7A','239', '197','209','230','211','235','Z12','199','D12','D37','D49'));
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.