07-24-2016 06:42 PM
While using data step merge I get the warning " "NOTE: MERGE statement has more than one data set with repeats of BY values." I used Proc Sql to get rid of this warning. This , however, gives all the comibinations of multiple by values. I want to keep only some observations from them. Is there a technique to get rid of undesired records without manually?
07-24-2016 08:42 PM
Many to Many Join can be done by Data Step much more conveniently than Proc SQL. Show your data sets through datalines.
07-24-2016 08:45 PM
There's a way to do anything. But you would have to show an example of the result you want ... an example of the many-to-many situation before the merge, and the matching example showing which observations you want after the merge.
07-25-2016 01:16 PM
Here is the program and datasets. The problem is I don't know how records in dat1 are releated to records in dat2.
data dat1; input patient_id $ code $ 10. code_startdate $ 12.; cards; A dermatitis 2015-10-05 A dermatitis 2015-10-09 ; run; data dat2; input patient_id $ code $ 10. treatment_startdate $ 12.; cards; A dermatitis 2014-10-08 A dermatitis 2015-10-07 ; run; proc sql noprint; create table t1 as select a.patient_id, a.code, a.code_startdate, b.treatment_startdate from dat1 as a full join dat2 as b on a.patient_id = b.patient_id and a.code = b.code; /* where b.treatment_startdate > a.code_startdate;*/ quit;
07-25-2016 01:24 PM
Now comes the key question. For these sample data sets you provided, what would you like the result to look like after they are combined?
07-25-2016 09:12 PM
@Astounding . The final data should look this - earlier code_startdate relates to earlier tretment_startdate.
Patient_id code code_startdate treatment_startdate
A dermatitis 2015-05-05 2014-10-08
A dermatitis 2015-05-09 2015-10-07
07-26-2016 01:55 AM
07-26-2016 07:31 AM
Based on everything to date, I would suggest modifying the join. Unfortunately, my SQL isn't strong enough to give you the exact syntax, but here's the idea.
group by patient_id code code_startdate
And embellish the WHERE clause:
where treatment_startdate < code_startdate and (code_startdate - treatment_startdate) = min(code_startdate - treatment_startdate)
I'm not sure if a full join is still appropriate in that case.
You also might need to experiment with a GROUP BY clause that uses treatment_startdate instead of code_startdate.
I can do the equivalent in a DATA step if that would be an acceptable approach. But it becomes cumbersome if there are additional variables involved in the "code" data set.