dataset 1:
ID | DATE |
1 | 10/13/2010 |
1 | 10/20/2010 |
1 | 10/27/2010 |
1 | 11/3/2010 |
2 | 1/31/2011 |
2 | 1/31/2011 |
2 | 2/2/2011 |
2 | 2/7/2011 |
2 | 2/7/2011 |
3 | 7/26/2011 |
3 | 7/26/2011 |
3 | 7/28/2011 |
3 | 8/1/2011 |
3 | 8/2/2011 |
3 | 8/2/2011 |
3 | 8/4/2011 |
3 | 8/8/2011 |
3 | 8/9/2011 |
dataset 2:
ID | Count | start | end |
1 | 1 | 10/18/2010 | 10/18/2010 |
2 | 1 | 2/1/2011 | 2/6/2011 |
3 | 1 | 7/27/2011 | 7/27/2011 |
3 | 3 | 8/2/2011 |
want:
ID | DATE | Count |
1 | 10/13/2010 | |
1 | 10/20/2010 | |
1 | 10/27/2010 | |
1 | 11/3/2010 | |
2 | 1/31/2011 | |
2 | 1/31/2011 | |
2 | 2/2/2011 | 1 |
2 | 2/7/2011 | |
2 | 2/7/2011 | |
3 | 7/26/2011 | |
3 | 7/26/2011 | |
3 | 7/28/2011 | |
3 | 8/1/2011 | |
3 | 8/2/2011 | 3 |
3 | 8/2/2011 | 3 |
3 | 8/4/2011 | 3 |
3 | 8/8/2011 | 3 |
3 | 8/9/2011 | 3 |
I want to merge the dataset1 and dataset2 by id and date, if the date in dataset1 falls in between start and end date of dataset2. in case if the end date is missing in the dataset2 then the count value should be retained across the rest of the observations in dataset1.
Any help to achieve this in sas?
Thanks
Do you have a field that's unique for each date range (in dataset2). I mean in the data below you gave has ID that looks like the key but is having same value (3) for two different date range ?
If I understand the question :
data d1;
input ID DATE :mmddyy.;
format date yymmdd10.;
datalines;
1 10/13/2010
1 10/20/2010
1 10/27/2010
1 11/3/2010
2 1/31/2011
2 1/31/2011
2 2/2/2011
2 2/7/2011
2 2/7/2011
3 7/26/2011
3 7/26/2011
3 7/28/2011
3 8/1/2011
3 8/2/2011
3 8/2/2011
3 8/4/2011
3 8/8/2011
3 8/9/2011
;
data d2;
input ID Count (start end) (:mmddyy10.);
format start end yymmdd10.;
datalines;
1 1 10/18/2010 10/18/2010
2 1 2/1/2011 2/6/2011
3 1 7/27/2011 7/27/2011
3 3 8/2/2011 .
;
data d3;
set d1;
n = _n_;
run;
proc sql;
create table want as
select d3.id, d3.date,
max(case when date >= start and date <= coalesce(end, '01jan2100'd)
then count
else . end) as count
from d3 left join d2 on d3.id=d2.id
group by n, d3.id, date;
select * from want;
quit;
You could use SUM instead of MAX if you want the sum of counts when multiple ranges match for a given id.
@PGStats: thanks. when i run it on the complete dataset the following error occurs.
WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the
SELECT clause nor the optional HAVING clause of the associated table-expression
referenced a summary function
I have checked the date format of all variables in both input datasets but not sure why this error is occuring. Any ideas?
This warning indicates that your query didn't include a max or sum summary function. Please post the code that you ran.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.