Hello,
While using data step merge I get the warning " "NOTE: MERGE statement has more than one data set with repeats of BY values." I used Proc Sql to get rid of this warning. This , however, gives all the comibinations of multiple by values. I want to keep only some observations from them. Is there a technique to get rid of undesired records without manually?
Thanks !
You need to restrict it somehow using a where clause.
Many to Many Join can be done by Data Step much more conveniently than Proc SQL. Show your data sets through datalines.
There's a way to do anything. But you would have to show an example of the result you want ... an example of the many-to-many situation before the merge, and the matching example showing which observations you want after the merge.
Here is the program and datasets. The problem is I don't know how records in dat1 are releated to records in dat2.
data dat1; input patient_id $ code $ 10. code_startdate $ 12.; cards; A dermatitis 2015-10-05 A dermatitis 2015-10-09 ; run; data dat2; input patient_id $ code $ 10. treatment_startdate $ 12.; cards; A dermatitis 2014-10-08 A dermatitis 2015-10-07 ; run; proc sql noprint; create table t1 as select a.patient_id, a.code, a.code_startdate, b.treatment_startdate from dat1 as a full join dat2 as b on a.patient_id = b.patient_id and a.code = b.code; /* where b.treatment_startdate > a.code_startdate;*/ quit;
Now comes the key question. For these sample data sets you provided, what would you like the result to look like after they are combined?
@Astounding . The final data should look this - earlier code_startdate relates to earlier tretment_startdate.
Patient_id code code_startdate treatment_startdate
A dermatitis 2015-05-05 2014-10-08
A dermatitis 2015-05-09 2015-10-07
Based on everything to date, I would suggest modifying the join. Unfortunately, my SQL isn't strong enough to give you the exact syntax, but here's the idea.
group by patient_id code code_startdate
And embellish the WHERE clause:
where treatment_startdate < code_startdate and (code_startdate - treatment_startdate) = min(code_startdate - treatment_startdate)
I'm not sure if a full join is still appropriate in that case.
You also might need to experiment with a GROUP BY clause that uses treatment_startdate instead of code_startdate.
I can do the equivalent in a DATA step if that would be an acceptable approach. But it becomes cumbersome if there are additional variables involved in the "code" data set.
@LinusH , @Astounding Thank you. I am trying to get more information about data pattern.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.