BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
twildone
Pyrite | Level 9

Hi....I wonder if what I am trying to do is possible using merge. I would like to merge Tables 1 & 2. The StartDate is only in Table2 and the TermStartDate and TermEndDate are only in Table1. When I run this code I get an error message. Thanks...

 

data Table3;
     merge Table1(in=T1) Table2(in=T2);
     by ID Year Course;
     where T1.TermStartDate le T2.StartDate le T1.TermEndDate;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Are you attempting to do a many-to-many join?  That is: Is it possible for both TABLE1 and TABLE2 to have multiple observations for the same combination of ID YEAR and COURSE values?  If so then you do NOT want to use a data step merge, use an SQL join instead.

 

You cannot have periods in variable names.

IN= option sets the name of the VARIABLE that indicates it that dataset contributed to the observation.

You cannot use variables in a WHERE statement that are not on ALL of the input datasets.

 

Use a subsetting IF instead.

data Table3;
     merge Table1(in=T1) Table2(in=T2);
     by ID Year Course;
     if t1 and t2;
     if TermStartDate le StartDate le TermEndDate;
run;

 

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

From now on, when you get an error message, please show us the ENTIRE log for this DATA step or PROC.

 

where T1.TermStartDate le T2.StartDate le T1.TermEndDate;

 

This is not valid data step syntax. There is no syntax in a data step like T1.variablename, variable names cannot have any additional text like T1. pre-pended to it.

 

I think what you want is this:

 

where TermStartDate le StartDate le TermEndDate;

 

 

--
Paige Miller
Reeza
Super User

This is better accomplished via SQL not a data step. 

 

 

PaigeMiller
Diamond | Level 26

@Reeza wrote:

This is better accomplished via SQL not a data step. 


I guess I'm not in agreement. It seems to me that either a DATA step or SQL will get the job done, I see no reason to prefer one over the other. Can you explain why you think this is better in SQL?

--
Paige Miller
Reeza
Super User
Data step will merge all the data together and then filter out the records not needed via an IF statement. The WHERE statement will not work.

SQL only writes the required data to a dataset in the first place from my understanding.

This probably only becomes important as your data sets get larger.
Tom
Super User Tom
Super User

Are you attempting to do a many-to-many join?  That is: Is it possible for both TABLE1 and TABLE2 to have multiple observations for the same combination of ID YEAR and COURSE values?  If so then you do NOT want to use a data step merge, use an SQL join instead.

 

You cannot have periods in variable names.

IN= option sets the name of the VARIABLE that indicates it that dataset contributed to the observation.

You cannot use variables in a WHERE statement that are not on ALL of the input datasets.

 

Use a subsetting IF instead.

data Table3;
     merge Table1(in=T1) Table2(in=T2);
     by ID Year Course;
     if t1 and t2;
     if TermStartDate le StartDate le TermEndDate;
run;

 

twildone
Pyrite | Level 9
Hi Tom.....Yes the merging of the tables would be a many-to-many join. I did use SQL to join the tables and it did work but the processing time was long. I tried to use a Data Step with Merge (with and without T1. & T2.) and both times it wouldn't work so from your response it must be because TermStartDate, StartDate and TermEndDate are not in both tables. Thanks for the explanation.
Tom
Super User Tom
Super User

@twildone wrote:
Hi Tom.....Yes the merging of the tables would be a many-to-many join. I did use SQL to join the tables and it did work but the processing time was long. I tried to use a Data Step with Merge (with and without T1. & T2.) and both times it wouldn't work so from your response it must be because TermStartDate, StartDate and TermEndDate are not in both tables. Thanks for the explanation.

It might be possible to combine the data via a interleaving SET statement instead of MERGE, but it really depends on how many variables are you trying to carry over.

 

So perhaps something like this to attach VAR1 from table2 to observations from table1 where the date falls into the interval in table2 and rename it as VAR1_COPY.

data want;
   set table1(in=in1 rename=(startdate=date)) table2(in=in2 rename=(termstartdate=date termenddate=date2));
   by id year course date;
   if first.course then call misisng(termstartdate,termenddate,var1_copy);
   retain termstartdate termenddate var1_copy;
   format termstartdate termenddate date9.;
   if in2 then do;
      termstartdate=date;
      termenddate=date2;
      var1_copy = var1;
   end;
   if in1 and (termstartdate <= date <= termenddate);
   rename date=startdate;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 623 views
  • 0 likes
  • 4 in conversation