BookmarkSubscribeRSS Feed
SAS_inquisitive
Lapis Lazuli | Level 10

Hello,

 

While using data step merge I get the warning " "NOTE: MERGE statement has more than one data set with repeats of BY values."  I used Proc Sql to get rid of this warning. This , however, gives all the comibinations of multiple by values. I want to keep only some observations from them. Is there a technique to get rid of undesired records without manually?

 

Thanks !

9 REPLIES 9
Reeza
Super User

You need to restrict it somehow using a where clause. 

KachiM
Rhodochrosite | Level 12

Many to Many Join can be done by Data Step much more conveniently than Proc SQL. Show your data sets through datalines.

Astounding
PROC Star

There's a way to do anything.  But you would have to show an example of the result you want ... an example of the many-to-many situation before the merge, and the matching example showing which observations you want after the merge.

SAS_inquisitive
Lapis Lazuli | Level 10

@Astounding@KachiM@Reeza

 

Here is the program and datasets.  The problem is I don't know how records in dat1 are releated to records in dat2.

data dat1;
  input patient_id $  code $ 10. code_startdate $ 12.;
  cards;
  A dermatitis 2015-10-05
  A dermatitis 2015-10-09
  ;
  run;

data dat2;
  input patient_id $  code $ 10. treatment_startdate $ 12.;
  cards;
  A dermatitis 2014-10-08
  A dermatitis 2015-10-07
  ;
  run;

 proc sql noprint;
    create table t1 as 
       select a.patient_id,
	          a.code,
			  a.code_startdate,
			  b.treatment_startdate
	  
		 from dat1 as a 
                 full join dat2 as b
		               on a.patient_id = b.patient_id and        
			              a.code       = b.code;
/*				where b.treatment_startdate > a.code_startdate;*/
quit;

 

 

Astounding
PROC Star

Now comes the key question.  For these sample data sets you provided, what would you like the result to look like after they are combined?

SAS_inquisitive
Lapis Lazuli | Level 10

@Astounding . The  final data should look this - earlier code_startdate relates to earlier tretment_startdate. 

 

Patient_id    code            code_startdate    treatment_startdate

A                  dermatitis     2015-05-05          2014-10-08       

A                  dermatitis      2015-05-09         2015-10-07

LinusH
Tourmaline | Level 20
Ok. To be able to direct you we also need to know if your real data have any deviations from this pattern.
Or does a code date always get followed by a treatment, and vice versa?
Data never sleeps
Astounding
PROC Star

Based on everything to date, I would suggest modifying the join.  Unfortunately, my SQL isn't strong enough to give you the exact syntax, but here's the idea.

 

group by patient_id code code_startdate

 

And embellish the WHERE clause:

 

where treatment_startdate < code_startdate and (code_startdate - treatment_startdate) = min(code_startdate - treatment_startdate)

 

I'm not sure if a full join is still appropriate in that case.

 

You also might need to experiment with a GROUP BY clause that uses treatment_startdate instead of code_startdate.

 

I can do the equivalent in a DATA step if that would be an acceptable approach. But it becomes cumbersome if there are additional variables involved in the "code" data set.

SAS_inquisitive
Lapis Lazuli | Level 10

@LinusH , @Astounding Thank you. I am trying to get more information about data pattern.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2723 views
  • 3 likes
  • 5 in conversation