Merge duplicate records in SAS

Accepted Solution Solved
Reply
Regular Contributor
Posts: 150
Accepted Solution

Merge duplicate records in SAS

I have lab data for a study I am managing and when we draw blood sometime the blood draw is not complete so the participant has to return for a second visit and the blood is re-drawn. When this happens we have 2 records in out lab database, one for the first blood draw, and one for the second. how do I merge the two records together in SAS to get one complete record? I am providing an example of the the data below. Both of these records would be an example of data for the same participants. I would want to fill in the first record mostly for the variables "QFT1MLTBAG" and "QFT1MLMIT" because thos are the most important variables. I would also want to create new variables called "secdrawdate" that has the date from the "phlebdate" variable for for second visit(second record or "drawvisit=2"). After this is done I would like to delete the second record to only have one record for each participant.

Any help you can provide would be greatly appreciated because I have been trying to figure this out for a while. Thank you in advance!

datadate

dataentrd

fkey

globalrecordid

hxhcvtest

hxhivtest

hxtest

n10mledta

n5mlspp

phlebcomments

phlebdate

phlebredraw

phlebstorage

qftmitog

qftnil

qfttbag

recstatus

study_id

uniquekey

venue

visit

DRAWVISIT

QFT1MLNIL

QFT1MLTBAG

QFT1MLMIT

N5MLSSP

PHLEBSTORG

7/20/2012 12:00:00 AM

True

a30f6802-8098-4405-8a5e-09f366eeb6b8

1

0

0

6/14/2012 12:00:00 AM

1

1

1234

50

1

1

1

2

0

0

99

0

7/20/2012 12:00:00 AM

True

f783c269-89d0-48ce-8440-2b4df5b28969

0

0

0

6/22/2012 12:00:00 AM

0

1

1234

56

1

1

2

2

2

2

1


Accepted Solutions
Solution
‎10-15-2012 07:56 PM
Super Contributor
Posts: 644

Re: Merge duplicate records in SAS

In this case I suggest you MERGE the first participant record with selected columns from the last record, in a data step.  I am making the following assumptions: that the unique id for each participant is the column study_id and the sequence is DRAWVISIT; the records are not necessarily in that order; that on occasions more than two records might be encountered (Bond: “Never Say Never”); and only the first and last records per participant are of interest.

So the first step is to sort the data into the order you want, the OUT= option will ensure that the original data order is undisturbed:

Proc Sort

        Data    =   StudyData

        Out =   SortedData

        ;

    By      study_id

            DRAWVISIT

            ;

Run ;

Now split the data into two sets, Initial and Final, containing the first and last records for each participant, respectively.  In the case of one visit only, the records will go into Initial.

Data    Initial

        Final

        ;

    Set     SortedData ;

    By      study_id ;

    If      First.study_id

      Then  Output  Initial ;

      Else 

        If  Last.study_id

       Then  Output  Final ;

Run ;

For the Final dataset you need to have a KEEP list that will include study_id and all the variables you want to overwrite in the Initial dataset.  I suggest you include DRAWVISIT because that will be an indicator that the record you save is a merged record (DRAWVISIT > 1).  For example, using the information provided, your KEEP list would include QFT1MLTBAG and QFT1MLMIT.

Now merge Initial and Final and voila!

Data    MergedData ;

    Merge   Initial

            Final (Keep = study_id DRAWVISIT QFT1MLTBAG QFT1MLMIT /* more as required */)

            ;

    By      study_id ;

Run ;

This method will preserve the original column order.

The method can be extended if you want maximum, minimum or mean values from each draw visit;  let me know.

View solution in original post


All Replies
Solution
‎10-15-2012 07:56 PM
Super Contributor
Posts: 644

Re: Merge duplicate records in SAS

In this case I suggest you MERGE the first participant record with selected columns from the last record, in a data step.  I am making the following assumptions: that the unique id for each participant is the column study_id and the sequence is DRAWVISIT; the records are not necessarily in that order; that on occasions more than two records might be encountered (Bond: “Never Say Never”); and only the first and last records per participant are of interest.

So the first step is to sort the data into the order you want, the OUT= option will ensure that the original data order is undisturbed:

Proc Sort

        Data    =   StudyData

        Out =   SortedData

        ;

    By      study_id

            DRAWVISIT

            ;

Run ;

Now split the data into two sets, Initial and Final, containing the first and last records for each participant, respectively.  In the case of one visit only, the records will go into Initial.

Data    Initial

        Final

        ;

    Set     SortedData ;

    By      study_id ;

    If      First.study_id

      Then  Output  Initial ;

      Else 

        If  Last.study_id

       Then  Output  Final ;

Run ;

For the Final dataset you need to have a KEEP list that will include study_id and all the variables you want to overwrite in the Initial dataset.  I suggest you include DRAWVISIT because that will be an indicator that the record you save is a merged record (DRAWVISIT > 1).  For example, using the information provided, your KEEP list would include QFT1MLTBAG and QFT1MLMIT.

Now merge Initial and Final and voila!

Data    MergedData ;

    Merge   Initial

            Final (Keep = study_id DRAWVISIT QFT1MLTBAG QFT1MLMIT /* more as required */)

            ;

    By      study_id ;

Run ;

This method will preserve the original column order.

The method can be extended if you want maximum, minimum or mean values from each draw visit;  let me know.

Regular Contributor
Posts: 150

Re: Merge duplicate records in SAS

Thank you! This did exactly what I needed it to do! I had played around with similar code but couldn't get it to work properly! I appreciate your help!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 582 views
  • 1 like
  • 2 in conversation