Hi....I wonder if what I am trying to do is possible using merge. I would like to merge Tables 1 & 2. The StartDate is only in Table2 and the TermStartDate and TermEndDate are only in Table1. When I run this code I get an error message. Thanks...
data Table3; merge Table1(in=T1) Table2(in=T2); by ID Year Course; where T1.TermStartDate le T2.StartDate le T1.TermEndDate; run;
Are you attempting to do a many-to-many join? That is: Is it possible for both TABLE1 and TABLE2 to have multiple observations for the same combination of ID YEAR and COURSE values? If so then you do NOT want to use a data step merge, use an SQL join instead.
You cannot have periods in variable names.
IN= option sets the name of the VARIABLE that indicates it that dataset contributed to the observation.
You cannot use variables in a WHERE statement that are not on ALL of the input datasets.
Use a subsetting IF instead.
data Table3;
merge Table1(in=T1) Table2(in=T2);
by ID Year Course;
if t1 and t2;
if TermStartDate le StartDate le TermEndDate;
run;
From now on, when you get an error message, please show us the ENTIRE log for this DATA step or PROC.
where T1.TermStartDate le T2.StartDate le T1.TermEndDate;
This is not valid data step syntax. There is no syntax in a data step like T1.variablename, variable names cannot have any additional text like T1. pre-pended to it.
I think what you want is this:
where TermStartDate le StartDate le TermEndDate;
This is better accomplished via SQL not a data step.
@Reeza wrote:
This is better accomplished via SQL not a data step.
I guess I'm not in agreement. It seems to me that either a DATA step or SQL will get the job done, I see no reason to prefer one over the other. Can you explain why you think this is better in SQL?
Are you attempting to do a many-to-many join? That is: Is it possible for both TABLE1 and TABLE2 to have multiple observations for the same combination of ID YEAR and COURSE values? If so then you do NOT want to use a data step merge, use an SQL join instead.
You cannot have periods in variable names.
IN= option sets the name of the VARIABLE that indicates it that dataset contributed to the observation.
You cannot use variables in a WHERE statement that are not on ALL of the input datasets.
Use a subsetting IF instead.
data Table3;
merge Table1(in=T1) Table2(in=T2);
by ID Year Course;
if t1 and t2;
if TermStartDate le StartDate le TermEndDate;
run;
@twildone wrote:
Hi Tom.....Yes the merging of the tables would be a many-to-many join. I did use SQL to join the tables and it did work but the processing time was long. I tried to use a Data Step with Merge (with and without T1. & T2.) and both times it wouldn't work so from your response it must be because TermStartDate, StartDate and TermEndDate are not in both tables. Thanks for the explanation.
It might be possible to combine the data via a interleaving SET statement instead of MERGE, but it really depends on how many variables are you trying to carry over.
So perhaps something like this to attach VAR1 from table2 to observations from table1 where the date falls into the interval in table2 and rename it as VAR1_COPY.
data want;
set table1(in=in1 rename=(startdate=date)) table2(in=in2 rename=(termstartdate=date termenddate=date2));
by id year course date;
if first.course then call misisng(termstartdate,termenddate,var1_copy);
retain termstartdate termenddate var1_copy;
format termstartdate termenddate date9.;
if in2 then do;
termstartdate=date;
termenddate=date2;
var1_copy = var1;
end;
if in1 and (termstartdate <= date <= termenddate);
rename date=startdate;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.