I have two data sets: one with lab test results and one with lab normal values (displayed below are very small subsets for one patient).
Lab Test Results:
PATNO | LAB_ID | TSTTYPCD | ORDSEQNO | TSTCD | COLL_DT | RESULT |
---|---|---|---|---|---|---|
3728 | 0007-001 | Chemistry | 1 | Sodium | 7-May-12 | 139 |
3728 | 0007-001 | Chemistry | 1 | Sodium | 21-May-12 | 139 |
3728 | 0007-001 | Chemistry | 1 | Sodium | 1-Jun-12 | 137 |
3728 | 0007-001 | Chemistry | 1 | Potassium | 7-May-12 | 3.5 |
3728 | 0007-001 | Chemistry | 1 | Potassium | 21-May-12 | 3.0 |
3728 | 0007-001 | Chemistry | 1 | Potassium | 1-Jun-12 | 3.3 |
Lab Normal Values:
LAB_ID | TSTTYPCD | ORDSEQNO | TSTCD | LABNAME | LNDAT | LOW_LIM | UP_LIM | UNITS |
---|---|---|---|---|---|---|---|---|
0007-001 | Chemistry | 1 | Sodium | University Medical Center | 18-Feb-09 | 135 | 145 | mMol/L |
0007-001 | Chemistry | 1 | Sodium | UAZCC-North | 15-May-12 | 130 | 150 | mMol/L |
0007-001 | Chemistry | 2 | Potassium | University Medical Center | 18-Feb-09 | 3.5 | 5.5 | mMol/L |
0007-001 | Chemistry | 2 | Potassium | UAZCC-North | 15-May-12 | 3.8 | 5.9 | mMol/L |
I am merging on LAB_ID, TSTTYPCD, ORDSEQNO, and TSTCD (in that order). However, the new data set needs to matches up the lab results with the lab normal values based on dates (COLL_DT for lab test results and LNDAT for lab normal values). The lab normal values line up with a lab test based on the last LNDAT before COLL_DT. I cannot simply add dates into the merging groups because they are not identical values. I hope this makes sense.
The new table should look like this:
PATNO | LAB_ID | TSTTYPCD | ORDSEQNO | TSTCD | COLL_DT | RESULT | LABNAME | LNDAT | LOW_LIM | UP_LIM | UNITS |
---|---|---|---|---|---|---|---|---|---|---|---|
3728 | 0007-001 | Chemistry | 1 | Sodium | 7-May-12 | 139 | University Medical Center | 18-Feb-09 | 135 | 145 | mMol/L |
3728 | 0007-001 | Chemistry | 1 | Sodium | 21-May-12 | 139 | UAZCC-North | 15-May-12 | 130 | 150 | mMol/L |
3728 | 0007-001 | Chemistry | 1 | Sodium | 1-Jun-12 | 137 | UAZCC-North | 15-May-12 | 130 | 150 | mMol/L |
3728 | 0007-001 | Chemistry | 2 | Potassium | 7-May-12 | 3.5 | University Medical Center | 18-Feb-09 | 3.5 | 5.5 | mMol/L |
3728 | 0007-001 | Chemistry | 2 | Potassium | 21-May-12 | 3.0 | UAZCC-North | 15-May-12 | 3.8 | 5.9 | mMol/L |
3728 | 0007-001 | Chemistry | 2 | Potassium | 1-Jun-12 | 3.3 | UAZCC-North | 15-May-12 | 3.8 | 5.9 | mMol/L |
Any suggestions on how to approach this?
Someone more proficient with proc sql, than I am, will have to show you how this REALLY ought to be done. However, I think that the following does what you want:
proc sql;
CREATE TABLE first AS
SELECT p.*, l.LABNAME, l.LNDAT, l.LOW_LIM,l.UP_LIM,l.UNITS
FROM patients p inner join lab l
on p.LAB_ID=l.LAB_ID and
p.TSTTYPCD=l.TSTTYPCD and
p.ORDSEQNO=l.ORDSEQNO and
p.TSTCD=l.TSTCD and
p.coll_dt gt l.lndat
;
create table want as
select *
from first
group by LAB_ID,TSTTYPCD,ORDSEQNO,TSTCD,COLL_DT
having lndat=max(lndat)
;
quit;
Since you are trying to accomplish a many-to-many merge, I would think that proc sql will be the easiest way. What are your criteria for including records based on date (i.e., a range of how many days before and after)?
There is no number-of-days requirement. I am just using the last lab normal values assessment before the lab test results assessment. I will play around with SQL to see what I can come up with.
Just a side note, I believe I sat by you at the MWSUG in Minneapolis during the Monday night dinner. We didn't get to speak though. I meant to see if you spoke Russian practice my русский on you.
Yes we did but, no, I don't speak any Russian. It was an excellent conference .. I presented four papers.
Back to your question, you might be able to get away with something as simple as:
proc sql;
CREATE TABLE BOTH AS
SELECT p.*, l.LABNAME, l.LNDAT, l.LOW_LIM,l.UP_LIM,l.UNITS
FROM patients p, lab l
WHERE p.LAB_ID=l.LAB_ID and
p.TSTTYPCD=l.TSTTYPCD and
p.ORDSEQNO=l.ORDSEQNO and
p.TSTCD=l.TSTCD
;
quit;
and, if you want to include a date range comparison, that is fairly simple and straight forward in sql.
This almost gets me there. It double/triple counts everything. If there is only one LNDAT for a given LAB_ID, then it works great. But if there are two LNDAT's, then it double counts; if there are three LNDAT's, then it triple counts, etc. I tried to simplify by deleting all records where LNDAT<COLL_DT.
That helped quite a bit, but for the example that I gave, LNDAT of 18-Feb-09 and 15-May-12 are both less than COLL_DT of 21-May-12 or 1-Jun-12. Therefore, it captured both sets of lab normal values and double counted.
Someone more proficient with proc sql, than I am, will have to show you how this REALLY ought to be done. However, I think that the following does what you want:
proc sql;
CREATE TABLE first AS
SELECT p.*, l.LABNAME, l.LNDAT, l.LOW_LIM,l.UP_LIM,l.UNITS
FROM patients p inner join lab l
on p.LAB_ID=l.LAB_ID and
p.TSTTYPCD=l.TSTTYPCD and
p.ORDSEQNO=l.ORDSEQNO and
p.TSTCD=l.TSTCD and
p.coll_dt gt l.lndat
;
create table want as
select *
from first
group by LAB_ID,TSTTYPCD,ORDSEQNO,TSTCD,COLL_DT
having lndat=max(lndat)
;
quit;
Art, I didn't test it, but I see no reason you can't combine those two steps together?
proc sql;
CREATE TABLE first AS
SELECT p.*, l.LABNAME, l.LNDAT, l.LOW_LIM,l.UP_LIM,l.UNITS
FROM patients p inner join lab l
on p.LAB_ID=l.LAB_ID and
p.TSTTYPCD=l.TSTTYPCD and
p.ORDSEQNO=l.ORDSEQNO and
p.TSTCD=l.TSTCD and
p.coll_dt gt l.lndat
group by p.LAB_ID,p.TSTTYPCD,p.ORDSEQNO,p.TSTCD,p.COLL_DT
having lndat=max(lndat)
;
quit;
Haikuo
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.