Hello,
While using data step merge I get the warning " "NOTE: MERGE statement has more than one data set with repeats of BY values." I used Proc Sql to get rid of this warning. This , however, gives all the comibinations of multiple by values. I want to keep only some observations from them. Is there a technique to get rid of undesired records without manually?
Thanks !
You need to restrict it somehow using a where clause.
Many to Many Join can be done by Data Step much more conveniently than Proc SQL. Show your data sets through datalines.
There's a way to do anything. But you would have to show an example of the result you want ... an example of the many-to-many situation before the merge, and the matching example showing which observations you want after the merge.
Here is the program and datasets. The problem is I don't know how records in dat1 are releated to records in dat2.
data dat1; input patient_id $ code $ 10. code_startdate $ 12.; cards; A dermatitis 2015-10-05 A dermatitis 2015-10-09 ; run; data dat2; input patient_id $ code $ 10. treatment_startdate $ 12.; cards; A dermatitis 2014-10-08 A dermatitis 2015-10-07 ; run; proc sql noprint; create table t1 as select a.patient_id, a.code, a.code_startdate, b.treatment_startdate from dat1 as a full join dat2 as b on a.patient_id = b.patient_id and a.code = b.code; /* where b.treatment_startdate > a.code_startdate;*/ quit;
Now comes the key question. For these sample data sets you provided, what would you like the result to look like after they are combined?
@Astounding . The final data should look this - earlier code_startdate relates to earlier tretment_startdate.
Patient_id code code_startdate treatment_startdate
A dermatitis 2015-05-05 2014-10-08
A dermatitis 2015-05-09 2015-10-07
Based on everything to date, I would suggest modifying the join. Unfortunately, my SQL isn't strong enough to give you the exact syntax, but here's the idea.
group by patient_id code code_startdate
And embellish the WHERE clause:
where treatment_startdate < code_startdate and (code_startdate - treatment_startdate) = min(code_startdate - treatment_startdate)
I'm not sure if a full join is still appropriate in that case.
You also might need to experiment with a GROUP BY clause that uses treatment_startdate instead of code_startdate.
I can do the equivalent in a DATA step if that would be an acceptable approach. But it becomes cumbersome if there are additional variables involved in the "code" data set.
@LinusH , @Astounding Thank you. I am trying to get more information about data pattern.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.