- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am trying to merge two datesets, both containing the same subjects and dates but different visit event. One record, lb_sv, has a subject, date, and an event for that day such as Visit 1, Visit 2, Unscheduled Visit, and so on. The dataset lb_mrl is missing the visit event. That's what I want to get out of the merge but when I merge lb_mrl with lb_sv then I get a more than one error. I'm not too familiar with those errors although I imagine it means that there are multiple records with the same subject and date in both datasets. Does anyone know of a simple way I could solve this issue?
What makes this problem interesting is that previously the in=a was on the dataset containing all of the desired variables (lb_sv). In the by statement of the merge, I had "by usubjid svstdtc unsflag etflag" and visits events populated for lb_mrl records. But when I switched in=a to the dataset lacking visit event (lb_mrl) the visit events did not merge. I just had a blanks in lb_sv_mrl for all of the records from lb_mrl.
%getMappedStructure(lb,rawdata.MRL,lbd); data LB_MRL; length visdatc $200. svstdtc $19.; set lbd; USUBJID='MSP-2017-1158-'||MRLID; if VISITLBL='Unscheduled Visit' then do; unsflag='Y'; end; if VISITLBL='Early Termination Visit' then do; etflag='Y'; end; visdatc=put(visitdt,mmddyy10.); vistimc=put(visittm,tod5.); %dtc(visdatc,,svstdtc); drop visit visitnum; run; data lb_sv; set sv; if VISIT=:'Uns' then do; unsflag='Y'; end; if VISIT=:'Early' then do; etflag='Y'; end; drop epoch; run; proc sort data=lb_mrl;by USUBJID svstdtc;run; proc sort data=lb_sv;by USUBJID svstdtc;run; data lb_sv_mrl; merge lb_mrl(in=a) lb_sv; by USUBJID svstdtc; if a; run; proc freq data=lb_sv_mrl noprint; tables USUBJID*svstdtc / out=chk(where=(count>1)); run; proc print data=chk; var USUBJID svstdtc; run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can use the IN= dataset option to create flags to indicate which dataset the current observation came from. If you only want the data from the NO_VISIT table then you should use the KEEP= option on the other dataset so that you only pull in the variables you need. Otherwise any extra variables that dataset has will be added to the resulting dataset, but will be all missing if you only keep the records from the NO_VISIT datasets. If you do have other variables that you actually want to merge on then perhaps you can do that in another step after you have first figured out what visit id to use for these no_visit observations.
data want;
set with_visit(keep=id date visit) no_visit(in=in2);
by id date;
if first.id then prev_visit=visit;
if missing(visit) then visit=prev_visit;
else prev_visit=visit;
if in2;
run;
Note that the issue of how to assign visit identifiers to spontaneously reported data is a complex one and so in reality it might need more thought than this simple approach. For example in a clinical trial you might have visits scheduled for every 4 weeks. What visit do assign to an event that is 25 days after the second visit, 2 or 3? Now what about when visit 3 happened ahead of schedule and was at 24 days instead of 28 days? What about measurements that are taken early in the visit before medication is administered? Is that reading to analysed based on the dosing from visit 2 or the new dosing started after it was taken?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If the goal is to try to figure out how to assign a visit id to the records without that field populated then perhaps you want to interleave the observations instead of merging them.
data want;
set with_visit (in=in1) no_visit(in=in2);
by id date ;
if first.id the prev_visit = visit;
retain prev_visit;
if missing(visit) then visit=prev_visit;
else prev_visit=visit;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can use the IN= dataset option to create flags to indicate which dataset the current observation came from. If you only want the data from the NO_VISIT table then you should use the KEEP= option on the other dataset so that you only pull in the variables you need. Otherwise any extra variables that dataset has will be added to the resulting dataset, but will be all missing if you only keep the records from the NO_VISIT datasets. If you do have other variables that you actually want to merge on then perhaps you can do that in another step after you have first figured out what visit id to use for these no_visit observations.
data want;
set with_visit(keep=id date visit) no_visit(in=in2);
by id date;
if first.id then prev_visit=visit;
if missing(visit) then visit=prev_visit;
else prev_visit=visit;
if in2;
run;
Note that the issue of how to assign visit identifiers to spontaneously reported data is a complex one and so in reality it might need more thought than this simple approach. For example in a clinical trial you might have visits scheduled for every 4 weeks. What visit do assign to an event that is 25 days after the second visit, 2 or 3? Now what about when visit 3 happened ahead of schedule and was at 24 days instead of 28 days? What about measurements that are taken early in the visit before medication is administered? Is that reading to analysed based on the dosing from visit 2 or the new dosing started after it was taken?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I added "if in2;" I think it worked...
data getvisit;
set lb_sv (in=in1) lb_mrl(in=in2);
by usubjid svstdtc;
if first.usubjid then prev_visit=visit;
retain prev_visit;
if missing(visit)then visit=prev_visit;
else prev_visit=visit;
if in2;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content