Hi guys,
suppose to have the following two datasets:
data DB;
input ID :$20. Admission :date09. Discharge :date09. Code;
format Admission date9. Discharge date9.;
cards;
0001 17MAR2016 23MAR2016 486
0001 26MAR2016 04APR2016 4660
0001 03AUG2017 11AUG2017 49121
0001 03AUG2017 11AUG2017 485
0001 08FEB2018 08FEB2018 49122
0001 08FEB2018 10FEB2018 49122
;
run;
data DB1;
input ID :$20. Admission :date09. Discharge :date09. Code;
format Admission date9. Discharge date9.;
cards;
0001 17MAR2016 23MAR2016 486
0001 26MAR2016 04APR2016 4660
0001 03AUG2017 11AUG2017 49121
0001 03AUG2017 11AUG2017 485
0001 08FEB2018 10FEB2018 49122
;
run;
I'm trying to make DB identical to DB1. For example the 5th row in DB should be removed because not present in DB1. I'm trying the following but without success:
proc sql;
create table DBx as
select * from DB
where ID in (select ID from DB1) and
Admission in (select Admission from DB1) and
Discharge in (select Discharge from DB1) and
Code in (select Code from DB1)
;
quit;
Can anyone help me please?
The original datasets have many more rows and columns. Only some rows are shown for simplicity and about other columns/variables they don't make a rule for the subset except: ID, Admission, Discharge and Code that I showed.
Thank you in advance!
proc sort data=db;
by id admission discharge code;
run;
proc sort data=db1;
by id admission discharge code;
run;
data dbx;
merge db(in=in1) db1(in=in2);
by id admission discharge code;
if in1 and in2;
run;
I don't think the explanation is complete. You say "I'm trying to make DB identical to DB1" but you have to tell us the logic to use to remove rows (and make any other changes that might be necessary) to make DB identical to DB1. We can't generalize from the very brief example you have shown us.
proc sort data=db;
by id admission discharge code;
run;
proc sort data=db1;
by id admission discharge code;
run;
data dbx;
merge db(in=in1) db1(in=in2);
by id admission discharge code;
if in1 and in2;
run;
I have to say that I am still not 100% sure I grasp what you are asking.
Here's what I think you want.
If a row is in DB and in DB1, keep it.
If a row is in DB and not in DB1, remove it.
If a row is not in DB and is in DB1 ... I don't know what you want.
So there are no rows "not in DB but in DB1". DB1 is a subset of DB. Thank you very much for your help!!
You appear to be missing a dataset in your description. If you only have the two datasets why not just take the second one directly and ignore the first?
Is the issue that there is some third dataset that want to use to convert the first into the second?
If you want to compare the whole observation then you might just want to use PROC SQL's INTERSECT operation.
proc sql;
create table want as
select * from table1
intersect
select * from table2
;
quit;
If the issue is just to compare on only the key variables and not worry about other variables then perhaps you want to do data step merge. So if you want to keep all of the observations from DB1 whose key variable values do not appear in DB2 then the merge step might look like this:
data want;
merge db1(in=in1) db2(in=in2 keep=id admission discharge code);
by id admission discharge code;
if in1 and in2;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.