I'm trying to merge two files which have start and end dates on each record. I want an output file which has all combinations of date spells. Where one file does not have data for a given range, I want to return missing values for its variables.
I almost have this working, but my code doesn't return the desired missing values. (The last record in the desired file in the code). Do I need to make this a left join? (I'm not very familiar with SQL).
data data1;
input start end v1;
if _n_>1 and start^=lag(end)+1 then error;
datalines;
1 2 1.12
3 4 1.34
5 5 1.55
6 7 1.67
8 8 1.88
;
run;
data data2;
input start end v2;
if _n_>1 and start^=lag(end)+1 then error;
datalines;
1 2 2.12
3 3 2.33
4 4 2.44
5 7 2.57
;
run;
* Desired merged file;
data desired;
input start end v1 v2;
datalines;
1 2 1.12 2.12
3 3 1.34 2.23
4 4 1.34 2.44
5 5 1.55 2.57
6 7 1.67 2.57
8 8 1.88 .
;
run;
* get list of time spells;
data long1;
set data1 data2;
keep start end;
run;
proc means data=long1 (keep=end) noprint nway;
class end;
output out=long2 (drop=_:) ;
run;
data spells;
set long2;
start = lag(end)+1;
if start=. then start=1;
run;
proc sql;
CREATE TABLE merged AS
SELECT s.start, s.end, v1, v2
FROM spells as s, data1 as d1, data2 as d2
where d1.start <= s.start and d1.end >= s.end and
d2.start <= s.start and d2.end >= s.end ;
;
QUIT ;
I think I've worked this out. Pretty straightforward once I read up on left joins. Required code is below (also there is one error in the 'desired' file. 2.23 should be 2.33).
proc sql;
CREATE TABLE merged AS
SELECT s.start, s.end, v1, v2
FROM spells as s
LEFT JOIN data1 as d1
ON d1.start <= s.start and d1.end >= s.end
LEFT JOIN data2 as d2
ON d2.start <= s.start and d2.end >= s.end ;
QUIT ;
I think I've worked this out. Pretty straightforward once I read up on left joins. Required code is below (also there is one error in the 'desired' file. 2.23 should be 2.33).
proc sql;
CREATE TABLE merged AS
SELECT s.start, s.end, v1, v2
FROM spells as s
LEFT JOIN data1 as d1
ON d1.start <= s.start and d1.end >= s.end
LEFT JOIN data2 as d2
ON d2.start <= s.start and d2.end >= s.end ;
QUIT ;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.