Re: getting desired result in many-to-many join Proc Sql

SAS_inquisitive · Posted 07-24-2016 06:42 PM

Hello,

While using data step merge I get the warning " "NOTE: MERGE statement has more than one data set with repeats of BY values." I used Proc Sql to get rid of this warning. This , however, gives all the comibinations of multiple by values. I want to keep only some observations from them. Is there a technique to get rid of undesired records without manually?

Thanks !

Reeza · Posted 07-24-2016 07:04 PM

You need to restrict it somehow using a where clause.

KachiM · Posted 07-24-2016 08:42 PM

Many to Many Join can be done by Data Step much more conveniently than Proc SQL. Show your data sets through datalines.

Astounding · Posted 07-24-2016 08:45 PM

There's a way to do anything. But you would have to show an example of the result you want ... an example of the many-to-many situation before the merge, and the matching example showing which observations you want after the merge.

SAS_inquisitive · Posted 07-25-2016 01:16 PM

@Astounding, @KachiM, @Reeza

Here is the program and datasets. The problem is I don't know how records in dat1 are releated to records in dat2.

data dat1;
  input patient_id $  code $ 10. code_startdate $ 12.;
  cards;
  A dermatitis 2015-10-05
  A dermatitis 2015-10-09
  ;
  run;

data dat2;
  input patient_id $  code $ 10. treatment_startdate $ 12.;
  cards;
  A dermatitis 2014-10-08
  A dermatitis 2015-10-07
  ;
  run;

 proc sql noprint;
    create table t1 as 
       select a.patient_id,
	          a.code,
			  a.code_startdate,
			  b.treatment_startdate
	  
		 from dat1 as a 
                 full join dat2 as b
		               on a.patient_id = b.patient_id and        
			              a.code       = b.code;
/*				where b.treatment_startdate > a.code_startdate;*/
quit;

Astounding · Posted 07-25-2016 01:24 PM

Now comes the key question. For these sample data sets you provided, what would you like the result to look like after they are combined?

SAS_inquisitive · Posted 07-25-2016 09:12 PM

@Astounding . The final data should look this - earlier code_startdate relates to earlier tretment_startdate.

Patient_id code code_startdate treatment_startdate

A dermatitis 2015-05-05 2014-10-08

A dermatitis 2015-05-09 2015-10-07

LinusH · Posted 07-26-2016 01:55 AM

Ok. To be able to direct you we also need to know if your real data have any deviations from this pattern.
Or does a code date always get followed by a treatment, and vice versa?

Data never sleeps

Astounding · Posted 07-26-2016 07:31 AM

Based on everything to date, I would suggest modifying the join. Unfortunately, my SQL isn't strong enough to give you the exact syntax, but here's the idea.

group by patient_id code code_startdate

And embellish the WHERE clause:

where treatment_startdate < code_startdate and (code_startdate - treatment_startdate) = min(code_startdate - treatment_startdate)

I'm not sure if a full join is still appropriate in that case.

You also might need to experiment with a GROUP BY clause that uses treatment_startdate instead of code_startdate.

I can do the equivalent in a DATA step if that would be an acceptable approach. But it becomes cumbersome if there are additional variables involved in the "code" data set.

SAS_inquisitive · Posted 07-28-2016 06:35 PM

@LinusH , @Astounding Thank you. I am trying to get more information about data pattern.

Classroom Training Available!