DATA Step, Macro, Functions and more

Merging on Non-Identical Values

Accepted Solution Solved
Reply
Regular Contributor
Posts: 220
Accepted Solution

Merging on Non-Identical Values

I have two data sets: one with lab test results and one with lab normal values (displayed below are very small subsets for one patient).

Lab Test Results:

PATNOLAB_IDTSTTYPCDORDSEQNOTSTCDCOLL_DTRESULT
37280007-001Chemistry1Sodium7-May-12139
37280007-001Chemistry1Sodium21-May-12139
37280007-001Chemistry1Sodium1-Jun-12137
37280007-001Chemistry1Potassium7-May-123.5
37280007-001Chemistry1Potassium21-May-123.0
37280007-001Chemistry1Potassium1-Jun-123.3

Lab Normal Values:

LAB_IDTSTTYPCDORDSEQNOTSTCDLABNAMELNDATLOW_LIMUP_LIMUNITS
0007-001Chemistry1SodiumUniversity Medical Center18-Feb-09135145mMol/L
0007-001Chemistry1SodiumUAZCC-North15-May-12130150mMol/L
0007-001Chemistry2PotassiumUniversity Medical Center18-Feb-093.55.5mMol/L
0007-001Chemistry2PotassiumUAZCC-North15-May-123.85.9mMol/L

I am merging on LAB_ID, TSTTYPCD, ORDSEQNO, and TSTCD (in that order).  However, the new data set needs to matches up the lab results with the lab normal values based on dates (COLL_DT for lab test results and LNDAT for lab normal values).  The lab normal values line up with a lab test based on the last LNDAT before COLL_DT.  I cannot simply add dates into the merging groups because they are not identical values.  I hope this makes sense.

The new table should look like this:

PATNOLAB_IDTSTTYPCDORDSEQNOTSTCDCOLL_DTRESULTLABNAMELNDATLOW_LIMUP_LIMUNITS
37280007-001Chemistry1Sodium7-May-12139University Medical Center18-Feb-09135145mMol/L
37280007-001Chemistry1Sodium21-May-12139UAZCC-North15-May-12130150mMol/L
37280007-001Chemistry1Sodium1-Jun-12137UAZCC-North15-May-12130150mMol/L
37280007-001Chemistry2Potassium7-May-123.5University Medical Center18-Feb-093.55.5mMol/L
37280007-001Chemistry2Potassium21-May-123.0UAZCC-North15-May-123.85.9mMol/L
37280007-001Chemistry2Potassium1-Jun-123.3UAZCC-North15-May-123.85.9mMol/L

Any suggestions on how to approach this?


Accepted Solutions
Solution
‎10-24-2012 03:19 PM
PROC Star
Posts: 7,366

Re: Merging on Non-Identical Values

Someone more proficient with proc sql, than I am, will have to show you how this REALLY ought to be done.  However, I think that the following does what you want:

proc sql;

  CREATE TABLE first AS

    SELECT p.*, l.LABNAME, l.LNDAT, l.LOW_LIM,l.UP_LIM,l.UNITS

      FROM patients p inner join lab l

         on p.LAB_ID=l.LAB_ID and

               p.TSTTYPCD=l.TSTTYPCD and

               p.ORDSEQNO=l.ORDSEQNO and

               p.TSTCD=l.TSTCD and

            p.coll_dt gt l.lndat

  ;

  create table want as

    select *

      from first

        group by LAB_ID,TSTTYPCD,ORDSEQNO,TSTCD,COLL_DT

          having lndat=max(lndat)

  ;

quit;

View solution in original post


All Replies
PROC Star
Posts: 7,366

Re: Merging on Non-Identical Values

Since you are trying to accomplish a many-to-many merge, I would think that proc sql will be the easiest way.  What are your criteria for including records based on date (i.e., a range of how many days before and after)?

Regular Contributor
Posts: 220

Re: Merging on Non-Identical Values

There is no number-of-days requirement.  I am just using the last lab normal values assessment before the lab test results assessment.  I will play around with SQL to see what I can come up with.

Just a side note, I believe I sat by you at the MWSUG in Minneapolis during the Monday night dinner.  We didn't get to speak though.  I meant to see if you spoke Russian practice my русский on you.

PROC Star
Posts: 7,366

Re: Merging on Non-Identical Values

Yes we did but, no, I don't speak any Russian.  It was an excellent conference .. I presented four papers.

Back to your question, you might be able to get away with something as simple as:

proc sql;

  CREATE TABLE BOTH AS

    SELECT p.*, l.LABNAME, l.LNDAT, l.LOW_LIM,l.UP_LIM,l.UNITS

      FROM patients p, lab l

        WHERE  p.LAB_ID=l.LAB_ID and

               p.TSTTYPCD=l.TSTTYPCD and

               p.ORDSEQNO=l.ORDSEQNO and

               p.TSTCD=l.TSTCD

;

quit;

and, if you want to include a date range comparison, that is fairly simple and straight forward in sql.

Regular Contributor
Posts: 220

Re: Merging on Non-Identical Values

This almost gets me there.  It double/triple counts everything.  If there is only one LNDAT for a given LAB_ID, then it works great.  But if there are two LNDAT's, then it double counts; if there are three LNDAT's, then it triple counts, etc.  I tried to simplify by deleting all records where LNDAT<COLL_DT.

That helped quite a bit, but for the example that I gave, LNDAT of 18-Feb-09 and 15-May-12 are both less than COLL_DT of 21-May-12 or 1-Jun-12.  Therefore, it captured both sets of lab normal values and double counted.

Solution
‎10-24-2012 03:19 PM
PROC Star
Posts: 7,366

Re: Merging on Non-Identical Values

Someone more proficient with proc sql, than I am, will have to show you how this REALLY ought to be done.  However, I think that the following does what you want:

proc sql;

  CREATE TABLE first AS

    SELECT p.*, l.LABNAME, l.LNDAT, l.LOW_LIM,l.UP_LIM,l.UNITS

      FROM patients p inner join lab l

         on p.LAB_ID=l.LAB_ID and

               p.TSTTYPCD=l.TSTTYPCD and

               p.ORDSEQNO=l.ORDSEQNO and

               p.TSTCD=l.TSTCD and

            p.coll_dt gt l.lndat

  ;

  create table want as

    select *

      from first

        group by LAB_ID,TSTTYPCD,ORDSEQNO,TSTCD,COLL_DT

          having lndat=max(lndat)

  ;

quit;

Respected Advisor
Posts: 3,124

Re: Merging on Non-Identical Values

Art, I didn't test it, but I see no reason you can't combine those two steps together?

proc sql;

  CREATE TABLE first AS

    SELECT p.*, l.LABNAME, l.LNDAT, l.LOW_LIM,l.UP_LIM,l.UNITS

      FROM patients p inner join lab l

         on p.LAB_ID=l.LAB_ID and

               p.TSTTYPCD=l.TSTTYPCD and

               p.ORDSEQNO=l.ORDSEQNO and

               p.TSTCD=l.TSTCD and

            p.coll_dt gt l.lndat

        group by p.LAB_ID,p.TSTTYPCD,p.ORDSEQNO,p.TSTCD,p.COLL_DT

          having lndat=max(lndat)

  ;

quit;

Haikuo

PROC Star
Posts: 7,366

Re: Merging on Non-Identical Values

  Absolutely no reason and that is precisely why I suggested it.  Nice job!

  I changed your last 3 patient records to reflect an ORDSEQNO of 2 in order to achieve the results you were expecting.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 302 views
  • 3 likes
  • 3 in conversation