I have two data sets: one with lab test results and one with lab normal values (displayed below are very small subsets for one patient).
Lab Test Results:
PATNO | LAB_ID | TSTTYPCD | ORDSEQNO | TSTCD | COLL_DT | RESULT |
---|---|---|---|---|---|---|
3728 | 0007-001 | Chemistry | 1 | Sodium | 7-May-12 | 139 |
3728 | 0007-001 | Chemistry | 1 | Sodium | 21-May-12 | 139 |
3728 | 0007-001 | Chemistry | 1 | Sodium | 1-Jun-12 | 137 |
3728 | 0007-001 | Chemistry | 1 | Potassium | 7-May-12 | 3.5 |
3728 | 0007-001 | Chemistry | 1 | Potassium | 21-May-12 | 3.0 |
3728 | 0007-001 | Chemistry | 1 | Potassium | 1-Jun-12 | 3.3 |
Lab Normal Values:
LAB_ID | TSTTYPCD | ORDSEQNO | TSTCD | LABNAME | LNDAT | LOW_LIM | UP_LIM | UNITS |
---|---|---|---|---|---|---|---|---|
0007-001 | Chemistry | 1 | Sodium | University Medical Center | 18-Feb-09 | 135 | 145 | mMol/L |
0007-001 | Chemistry | 1 | Sodium | UAZCC-North | 15-May-12 | 130 | 150 | mMol/L |
0007-001 | Chemistry | 2 | Potassium | University Medical Center | 18-Feb-09 | 3.5 | 5.5 | mMol/L |
0007-001 | Chemistry | 2 | Potassium | UAZCC-North | 15-May-12 | 3.8 | 5.9 | mMol/L |
I am merging on LAB_ID, TSTTYPCD, ORDSEQNO, and TSTCD (in that order). However, the new data set needs to matches up the lab results with the lab normal values based on dates (COLL_DT for lab test results and LNDAT for lab normal values). The lab normal values line up with a lab test based on the last LNDAT before COLL_DT. I cannot simply add dates into the merging groups because they are not identical values. I hope this makes sense.
The new table should look like this:
PATNO | LAB_ID | TSTTYPCD | ORDSEQNO | TSTCD | COLL_DT | RESULT | LABNAME | LNDAT | LOW_LIM | UP_LIM | UNITS |
---|---|---|---|---|---|---|---|---|---|---|---|
3728 | 0007-001 | Chemistry | 1 | Sodium | 7-May-12 | 139 | University Medical Center | 18-Feb-09 | 135 | 145 | mMol/L |
3728 | 0007-001 | Chemistry | 1 | Sodium | 21-May-12 | 139 | UAZCC-North | 15-May-12 | 130 | 150 | mMol/L |
3728 | 0007-001 | Chemistry | 1 | Sodium | 1-Jun-12 | 137 | UAZCC-North | 15-May-12 | 130 | 150 | mMol/L |
3728 | 0007-001 | Chemistry | 2 | Potassium | 7-May-12 | 3.5 | University Medical Center | 18-Feb-09 | 3.5 | 5.5 | mMol/L |
3728 | 0007-001 | Chemistry | 2 | Potassium | 21-May-12 | 3.0 | UAZCC-North | 15-May-12 | 3.8 | 5.9 | mMol/L |
3728 | 0007-001 | Chemistry | 2 | Potassium | 1-Jun-12 | 3.3 | UAZCC-North | 15-May-12 | 3.8 | 5.9 | mMol/L |
Any suggestions on how to approach this?
Someone more proficient with proc sql, than I am, will have to show you how this REALLY ought to be done. However, I think that the following does what you want:
proc sql;
CREATE TABLE first AS
SELECT p.*, l.LABNAME, l.LNDAT, l.LOW_LIM,l.UP_LIM,l.UNITS
FROM patients p inner join lab l
on p.LAB_ID=l.LAB_ID and
p.TSTTYPCD=l.TSTTYPCD and
p.ORDSEQNO=l.ORDSEQNO and
p.TSTCD=l.TSTCD and
p.coll_dt gt l.lndat
;
create table want as
select *
from first
group by LAB_ID,TSTTYPCD,ORDSEQNO,TSTCD,COLL_DT
having lndat=max(lndat)
;
quit;
Since you are trying to accomplish a many-to-many merge, I would think that proc sql will be the easiest way. What are your criteria for including records based on date (i.e., a range of how many days before and after)?
There is no number-of-days requirement. I am just using the last lab normal values assessment before the lab test results assessment. I will play around with SQL to see what I can come up with.
Just a side note, I believe I sat by you at the MWSUG in Minneapolis during the Monday night dinner. We didn't get to speak though. I meant to see if you spoke Russian practice my русский on you.
Yes we did but, no, I don't speak any Russian. It was an excellent conference .. I presented four papers.
Back to your question, you might be able to get away with something as simple as:
proc sql;
CREATE TABLE BOTH AS
SELECT p.*, l.LABNAME, l.LNDAT, l.LOW_LIM,l.UP_LIM,l.UNITS
FROM patients p, lab l
WHERE p.LAB_ID=l.LAB_ID and
p.TSTTYPCD=l.TSTTYPCD and
p.ORDSEQNO=l.ORDSEQNO and
p.TSTCD=l.TSTCD
;
quit;
and, if you want to include a date range comparison, that is fairly simple and straight forward in sql.
This almost gets me there. It double/triple counts everything. If there is only one LNDAT for a given LAB_ID, then it works great. But if there are two LNDAT's, then it double counts; if there are three LNDAT's, then it triple counts, etc. I tried to simplify by deleting all records where LNDAT<COLL_DT.
That helped quite a bit, but for the example that I gave, LNDAT of 18-Feb-09 and 15-May-12 are both less than COLL_DT of 21-May-12 or 1-Jun-12. Therefore, it captured both sets of lab normal values and double counted.
Someone more proficient with proc sql, than I am, will have to show you how this REALLY ought to be done. However, I think that the following does what you want:
proc sql;
CREATE TABLE first AS
SELECT p.*, l.LABNAME, l.LNDAT, l.LOW_LIM,l.UP_LIM,l.UNITS
FROM patients p inner join lab l
on p.LAB_ID=l.LAB_ID and
p.TSTTYPCD=l.TSTTYPCD and
p.ORDSEQNO=l.ORDSEQNO and
p.TSTCD=l.TSTCD and
p.coll_dt gt l.lndat
;
create table want as
select *
from first
group by LAB_ID,TSTTYPCD,ORDSEQNO,TSTCD,COLL_DT
having lndat=max(lndat)
;
quit;
Art, I didn't test it, but I see no reason you can't combine those two steps together?
proc sql;
CREATE TABLE first AS
SELECT p.*, l.LABNAME, l.LNDAT, l.LOW_LIM,l.UP_LIM,l.UNITS
FROM patients p inner join lab l
on p.LAB_ID=l.LAB_ID and
p.TSTTYPCD=l.TSTTYPCD and
p.ORDSEQNO=l.ORDSEQNO and
p.TSTCD=l.TSTCD and
p.coll_dt gt l.lndat
group by p.LAB_ID,p.TSTTYPCD,p.ORDSEQNO,p.TSTCD,p.COLL_DT
having lndat=max(lndat)
;
quit;
Haikuo
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.