I have lab data for a study I am managing and when we draw blood sometime the blood draw is not complete so the participant has to return for a second visit and the blood is re-drawn. When this happens we have 2 records in out lab database, one for the first blood draw, and one for the second. how do I merge the two records together in SAS to get one complete record? I am providing an example of the the data below. Both of these records would be an example of data for the same participants. I would want to fill in the first record mostly for the variables "QFT1MLTBAG" and "QFT1MLMIT" because thos are the most important variables. I would also want to create new variables called "secdrawdate" that has the date from the "phlebdate" variable for for second visit(second record or "drawvisit=2"). After this is done I would like to delete the second record to only have one record for each participant.
Any help you can provide would be greatly appreciated because I have been trying to figure this out for a while. Thank you in advance!
datadate | dataentrd | fkey | globalrecordid | hxhcvtest | hxhivtest | hxtest | n10mledta | n5mlspp | phlebcomments | phlebdate | phlebredraw | phlebstorage | qftmitog | qftnil | qfttbag | recstatus | study_id | uniquekey | venue | visit | DRAWVISIT | QFT1MLNIL | QFT1MLTBAG | QFT1MLMIT | N5MLSSP | PHLEBSTORG | |
7/20/2012 12:00:00 AM | True | a30f6802-8098-4405-8a5e-09f366eeb6b8 | 1 | 0 | 0 | 6/14/2012 12:00:00 AM | 1 | 1 | 1234 | 50 | 1 | 1 | 1 | 2 | 0 | 0 | 99 | 0 | |||||||||
7/20/2012 12:00:00 AM | True | f783c269-89d0-48ce-8440-2b4df5b28969 | 0 | 0 | 0 | 6/22/2012 12:00:00 AM | 0 | 1 | 1234 | 56 | 1 | 1 | 2 | 2 | 2 | 2 | 1 |
In this case I suggest you MERGE the first participant record with selected columns from the last record, in a data step. I am making the following assumptions: that the unique id for each participant is the column study_id and the sequence is DRAWVISIT; the records are not necessarily in that order; that on occasions more than two records might be encountered (Bond: “Never Say Never”); and only the first and last records per participant are of interest.
So the first step is to sort the data into the order you want, the OUT= option will ensure that the original data order is undisturbed:
Proc Sort
Data = StudyData
Out = SortedData
;
By study_id
DRAWVISIT
;
Run ;
Now split the data into two sets, Initial and Final, containing the first and last records for each participant, respectively. In the case of one visit only, the records will go into Initial.
Data Initial
Final
;
Set SortedData ;
By study_id ;
If First.study_id
Then Output Initial ;
Else
If Last.study_id
Then Output Final ;
Run ;
For the Final dataset you need to have a KEEP list that will include study_id and all the variables you want to overwrite in the Initial dataset. I suggest you include DRAWVISIT because that will be an indicator that the record you save is a merged record (DRAWVISIT > 1). For example, using the information provided, your KEEP list would include QFT1MLTBAG and QFT1MLMIT.
Now merge Initial and Final and voila!
Data MergedData ;
Merge Initial
Final (Keep = study_id DRAWVISIT QFT1MLTBAG QFT1MLMIT /* more as required */)
;
By study_id ;
Run ;
This method will preserve the original column order.
The method can be extended if you want maximum, minimum or mean values from each draw visit; let me know.
In this case I suggest you MERGE the first participant record with selected columns from the last record, in a data step. I am making the following assumptions: that the unique id for each participant is the column study_id and the sequence is DRAWVISIT; the records are not necessarily in that order; that on occasions more than two records might be encountered (Bond: “Never Say Never”); and only the first and last records per participant are of interest.
So the first step is to sort the data into the order you want, the OUT= option will ensure that the original data order is undisturbed:
Proc Sort
Data = StudyData
Out = SortedData
;
By study_id
DRAWVISIT
;
Run ;
Now split the data into two sets, Initial and Final, containing the first and last records for each participant, respectively. In the case of one visit only, the records will go into Initial.
Data Initial
Final
;
Set SortedData ;
By study_id ;
If First.study_id
Then Output Initial ;
Else
If Last.study_id
Then Output Final ;
Run ;
For the Final dataset you need to have a KEEP list that will include study_id and all the variables you want to overwrite in the Initial dataset. I suggest you include DRAWVISIT because that will be an indicator that the record you save is a merged record (DRAWVISIT > 1). For example, using the information provided, your KEEP list would include QFT1MLTBAG and QFT1MLMIT.
Now merge Initial and Final and voila!
Data MergedData ;
Merge Initial
Final (Keep = study_id DRAWVISIT QFT1MLTBAG QFT1MLMIT /* more as required */)
;
By study_id ;
Run ;
This method will preserve the original column order.
The method can be extended if you want maximum, minimum or mean values from each draw visit; let me know.
Thank you! This did exactly what I needed it to do! I had played around with similar code but couldn't get it to work properly! I appreciate your help!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.