- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi guys,
suppose to have the following two datasets:
data DB;
input ID :$20. Admission :date09. Discharge :date09. Code;
format Admission date9. Discharge date9.;
cards;
0001 17MAR2016 23MAR2016 486
0001 26MAR2016 04APR2016 4660
0001 03AUG2017 11AUG2017 49121
0001 03AUG2017 11AUG2017 485
0001 08FEB2018 08FEB2018 49122
0001 08FEB2018 10FEB2018 49122
;
run;
data DB1;
input ID :$20. Admission :date09. Discharge :date09. Code;
format Admission date9. Discharge date9.;
cards;
0001 17MAR2016 23MAR2016 486
0001 26MAR2016 04APR2016 4660
0001 03AUG2017 11AUG2017 49121
0001 03AUG2017 11AUG2017 485
0001 08FEB2018 10FEB2018 49122
;
run;
I'm trying to make DB identical to DB1. For example the 5th row in DB should be removed because not present in DB1. I'm trying the following but without success:
proc sql;
create table DBx as
select * from DB
where ID in (select ID from DB1) and
Admission in (select Admission from DB1) and
Discharge in (select Discharge from DB1) and
Code in (select Code from DB1)
;
quit;
Can anyone help me please?
The original datasets have many more rows and columns. Only some rows are shown for simplicity and about other columns/variables they don't make a rule for the subset except: ID, Admission, Discharge and Code that I showed.
Thank you in advance!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sort data=db;
by id admission discharge code;
run;
proc sort data=db1;
by id admission discharge code;
run;
data dbx;
merge db(in=in1) db1(in=in2);
by id admission discharge code;
if in1 and in2;
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't think the explanation is complete. You say "I'm trying to make DB identical to DB1" but you have to tell us the logic to use to remove rows (and make any other changes that might be necessary) to make DB identical to DB1. We can't generalize from the very brief example you have shown us.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sort data=db;
by id admission discharge code;
run;
proc sort data=db1;
by id admission discharge code;
run;
data dbx;
merge db(in=in1) db1(in=in2);
by id admission discharge code;
if in1 and in2;
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have to say that I am still not 100% sure I grasp what you are asking.
Here's what I think you want.
If a row is in DB and in DB1, keep it.
If a row is in DB and not in DB1, remove it.
If a row is not in DB and is in DB1 ... I don't know what you want.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So there are no rows "not in DB but in DB1". DB1 is a subset of DB. Thank you very much for your help!!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You appear to be missing a dataset in your description. If you only have the two datasets why not just take the second one directly and ignore the first?
Is the issue that there is some third dataset that want to use to convert the first into the second?
If you want to compare the whole observation then you might just want to use PROC SQL's INTERSECT operation.
proc sql;
create table want as
select * from table1
intersect
select * from table2
;
quit;
If the issue is just to compare on only the key variables and not worry about other variables then perhaps you want to do data step merge. So if you want to keep all of the observations from DB1 whose key variable values do not appear in DB2 then the merge step might look like this:
data want;
merge db1(in=in1) db2(in=in2 keep=id admission discharge code);
by id admission discharge code;
if in1 and in2;
run;