dataset 1:
ID | DATE |
1 | 10/13/2010 |
1 | 10/20/2010 |
1 | 10/27/2010 |
1 | 11/3/2010 |
2 | 1/31/2011 |
2 | 1/31/2011 |
2 | 2/2/2011 |
2 | 2/7/2011 |
2 | 2/7/2011 |
3 | 7/26/2011 |
3 | 7/26/2011 |
3 | 7/28/2011 |
3 | 8/1/2011 |
3 | 8/2/2011 |
3 | 8/2/2011 |
3 | 8/4/2011 |
3 | 8/8/2011 |
3 | 8/9/2011 |
dataset 2:
ID | Count | start | end |
1 | 1 | 10/18/2010 | 10/18/2010 |
2 | 1 | 2/1/2011 | 2/6/2011 |
3 | 1 | 7/27/2011 | 7/27/2011 |
3 | 3 | 8/2/2011 |
want:
ID | DATE | Count |
1 | 10/13/2010 | |
1 | 10/20/2010 | |
1 | 10/27/2010 | |
1 | 11/3/2010 | |
2 | 1/31/2011 | |
2 | 1/31/2011 | |
2 | 2/2/2011 | 1 |
2 | 2/7/2011 | |
2 | 2/7/2011 | |
3 | 7/26/2011 | |
3 | 7/26/2011 | |
3 | 7/28/2011 | |
3 | 8/1/2011 | |
3 | 8/2/2011 | 3 |
3 | 8/2/2011 | 3 |
3 | 8/4/2011 | 3 |
3 | 8/8/2011 | 3 |
3 | 8/9/2011 | 3 |
I want to merge the dataset1 and dataset2 by id and date, if the date in dataset1 falls in between start and end date of dataset2. in case if the end date is missing in the dataset2 then the count value should be retained across the rest of the observations in dataset1.
Any help to achieve this in sas?
Thanks
Do you have a field that's unique for each date range (in dataset2). I mean in the data below you gave has ID that looks like the key but is having same value (3) for two different date range ?
If I understand the question :
data d1;
input ID DATE :mmddyy.;
format date yymmdd10.;
datalines;
1 10/13/2010
1 10/20/2010
1 10/27/2010
1 11/3/2010
2 1/31/2011
2 1/31/2011
2 2/2/2011
2 2/7/2011
2 2/7/2011
3 7/26/2011
3 7/26/2011
3 7/28/2011
3 8/1/2011
3 8/2/2011
3 8/2/2011
3 8/4/2011
3 8/8/2011
3 8/9/2011
;
data d2;
input ID Count (start end) (:mmddyy10.);
format start end yymmdd10.;
datalines;
1 1 10/18/2010 10/18/2010
2 1 2/1/2011 2/6/2011
3 1 7/27/2011 7/27/2011
3 3 8/2/2011 .
;
data d3;
set d1;
n = _n_;
run;
proc sql;
create table want as
select d3.id, d3.date,
max(case when date >= start and date <= coalesce(end, '01jan2100'd)
then count
else . end) as count
from d3 left join d2 on d3.id=d2.id
group by n, d3.id, date;
select * from want;
quit;
You could use SUM instead of MAX if you want the sum of counts when multiple ranges match for a given id.
@PGStats: thanks. when i run it on the complete dataset the following error occurs.
WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the
SELECT clause nor the optional HAVING clause of the associated table-expression
referenced a summary function
I have checked the date format of all variables in both input datasets but not sure why this error is occuring. Any ideas?
This warning indicates that your query didn't include a max or sum summary function. Please post the code that you ran.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.