I'm trying to merge two files which have start and end dates on each record. I want an output file which has all combinations of date spells. Where one file does not have data for a given range, I want to return missing values for its variables.
I almost have this working, but my code doesn't return the desired missing values. (The last record in the desired file in the code). Do I need to make this a left join? (I'm not very familiar with SQL).
data data1;
input start end v1;
if _n_>1 and start^=lag(end)+1 then error;
datalines;
1 2 1.12
3 4 1.34
5 5 1.55
6 7 1.67
8 8 1.88
;
run;
data data2;
input start end v2;
if _n_>1 and start^=lag(end)+1 then error;
datalines;
1 2 2.12
3 3 2.33
4 4 2.44
5 7 2.57
;
run;
* Desired merged file;
data desired;
input start end v1 v2;
datalines;
1 2 1.12 2.12
3 3 1.34 2.23
4 4 1.34 2.44
5 5 1.55 2.57
6 7 1.67 2.57
8 8 1.88 .
;
run;
* get list of time spells;
data long1;
set data1 data2;
keep start end;
run;
proc means data=long1 (keep=end) noprint nway;
class end;
output out=long2 (drop=_:) ;
run;
data spells;
set long2;
start = lag(end)+1;
if start=. then start=1;
run;
proc sql;
CREATE TABLE merged AS
SELECT s.start, s.end, v1, v2
FROM spells as s, data1 as d1, data2 as d2
where d1.start <= s.start and d1.end >= s.end and
d2.start <= s.start and d2.end >= s.end ;
;
QUIT ;
I think I've worked this out. Pretty straightforward once I read up on left joins. Required code is below (also there is one error in the 'desired' file. 2.23 should be 2.33).
proc sql;
CREATE TABLE merged AS
SELECT s.start, s.end, v1, v2
FROM spells as s
LEFT JOIN data1 as d1
ON d1.start <= s.start and d1.end >= s.end
LEFT JOIN data2 as d2
ON d2.start <= s.start and d2.end >= s.end ;
QUIT ;
I think I've worked this out. Pretty straightforward once I read up on left joins. Required code is below (also there is one error in the 'desired' file. 2.23 should be 2.33).
proc sql;
CREATE TABLE merged AS
SELECT s.start, s.end, v1, v2
FROM spells as s
LEFT JOIN data1 as d1
ON d1.start <= s.start and d1.end >= s.end
LEFT JOIN data2 as d2
ON d2.start <= s.start and d2.end >= s.end ;
QUIT ;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.