BookmarkSubscribeRSS Feed
SAS_inquisitive
Lapis Lazuli | Level 10

Hello,

 

While using data step merge I get the warning " "NOTE: MERGE statement has more than one data set with repeats of BY values."  I used Proc Sql to get rid of this warning. This , however, gives all the comibinations of multiple by values. I want to keep only some observations from them. Is there a technique to get rid of undesired records without manually?

 

Thanks !

9 REPLIES 9
Reeza
Super User

You need to restrict it somehow using a where clause. 

KachiM
Rhodochrosite | Level 12

Many to Many Join can be done by Data Step much more conveniently than Proc SQL. Show your data sets through datalines.

Astounding
PROC Star

There's a way to do anything.  But you would have to show an example of the result you want ... an example of the many-to-many situation before the merge, and the matching example showing which observations you want after the merge.

SAS_inquisitive
Lapis Lazuli | Level 10

@Astounding@KachiM@Reeza

 

Here is the program and datasets.  The problem is I don't know how records in dat1 are releated to records in dat2.

data dat1;
  input patient_id $  code $ 10. code_startdate $ 12.;
  cards;
  A dermatitis 2015-10-05
  A dermatitis 2015-10-09
  ;
  run;

data dat2;
  input patient_id $  code $ 10. treatment_startdate $ 12.;
  cards;
  A dermatitis 2014-10-08
  A dermatitis 2015-10-07
  ;
  run;

 proc sql noprint;
    create table t1 as 
       select a.patient_id,
	          a.code,
			  a.code_startdate,
			  b.treatment_startdate
	  
		 from dat1 as a 
                 full join dat2 as b
		               on a.patient_id = b.patient_id and        
			              a.code       = b.code;
/*				where b.treatment_startdate > a.code_startdate;*/
quit;

 

 

Astounding
PROC Star

Now comes the key question.  For these sample data sets you provided, what would you like the result to look like after they are combined?

SAS_inquisitive
Lapis Lazuli | Level 10

@Astounding . The  final data should look this - earlier code_startdate relates to earlier tretment_startdate. 

 

Patient_id    code            code_startdate    treatment_startdate

A                  dermatitis     2015-05-05          2014-10-08       

A                  dermatitis      2015-05-09         2015-10-07

LinusH
Tourmaline | Level 20
Ok. To be able to direct you we also need to know if your real data have any deviations from this pattern.
Or does a code date always get followed by a treatment, and vice versa?
Data never sleeps
Astounding
PROC Star

Based on everything to date, I would suggest modifying the join.  Unfortunately, my SQL isn't strong enough to give you the exact syntax, but here's the idea.

 

group by patient_id code code_startdate

 

And embellish the WHERE clause:

 

where treatment_startdate < code_startdate and (code_startdate - treatment_startdate) = min(code_startdate - treatment_startdate)

 

I'm not sure if a full join is still appropriate in that case.

 

You also might need to experiment with a GROUP BY clause that uses treatment_startdate instead of code_startdate.

 

I can do the equivalent in a DATA step if that would be an acceptable approach. But it becomes cumbersome if there are additional variables involved in the "code" data set.

SAS_inquisitive
Lapis Lazuli | Level 10

@LinusH , @Astounding Thank you. I am trying to get more information about data pattern.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2182 views
  • 3 likes
  • 5 in conversation