DATA Step, Macro, Functions and more

Merge two datasets by ID and date range

Reply
Frequent Contributor
Frequent Contributor
Posts: 103

Merge two datasets by ID and date range

dataset 1:

IDDATE
110/13/2010
110/20/2010
110/27/2010
111/3/2010
21/31/2011
21/31/2011
22/2/2011
22/7/2011
22/7/2011
37/26/2011
37/26/2011
37/28/2011
38/1/2011
38/2/2011
38/2/2011
38/4/2011
38/8/2011
38/9/2011

 

dataset 2:

IDCountstart end
1110/18/201010/18/2010
212/1/20112/6/2011
317/27/20117/27/2011
338/2/2011 

 

want:

IDDATECount
110/13/2010 
110/20/2010 
110/27/2010 
111/3/2010 
21/31/2011 
21/31/2011 
22/2/20111
22/7/2011 
22/7/2011 
37/26/2011 
37/26/2011 
37/28/2011 
38/1/2011 
38/2/20113
38/2/20113
38/4/20113
38/8/20113
38/9/20113

 

I want to merge the dataset1 and dataset2 by id and date, if the date in dataset1 falls in between start and end date of dataset2. in case if the end date is missing in the dataset2 then the count value should be retained across the rest of the observations in dataset1.

Any help to achieve this in sas?

Thanks

Contributor
Posts: 44

Re: Merge two datasets by ID and date range

Do you have a field that's unique for each date range (in dataset2). I mean in the data below you gave has ID that looks like the key but is having same value (3) for two different date range ?

Respected Advisor
Posts: 4,925

Re: Merge two datasets by ID and date range

[ Edited ]

If I understand the question :

 

data d1;
input ID	DATE :mmddyy.;
format date yymmdd10.;
datalines;
1	10/13/2010
1	10/20/2010
1	10/27/2010
1	11/3/2010
2	1/31/2011
2	1/31/2011
2	2/2/2011
2	2/7/2011
2	2/7/2011
3	7/26/2011
3	7/26/2011
3	7/28/2011
3	8/1/2011
3	8/2/2011
3	8/2/2011
3	8/4/2011
3	8/8/2011
3	8/9/2011
;

data d2;
input ID	Count	(start 	end) (:mmddyy10.);
format start end yymmdd10.;
datalines;
1	1	10/18/2010	10/18/2010
2	1	2/1/2011	2/6/2011
3	1	7/27/2011	7/27/2011
3	3	8/2/2011	 .
;

data d3;
set d1;
n = _n_;
run;

proc sql;
create table want as
select d3.id, d3.date, 
    max(case when date >= start and date <= coalesce(end, '01jan2100'd) 
        then count
        else . end) as count
from d3 left join d2 on d3.id=d2.id
group by n, d3.id, date;
select * from want;
quit;

You could use SUM instead of MAX if you want the sum of counts when multiple ranges match for a given id. 

PG
Frequent Contributor
Frequent Contributor
Posts: 103

Re: Merge two datasets by ID and date range

[ Edited ]

@PGStats: thanks. when i run it on the complete dataset the following error occurs.

WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the
SELECT clause nor the optional HAVING clause of the associated table-expression
referenced a summary function

 

I have checked the date format of all variables in both input datasets but  not sure why  this error is occuring. Any ideas?

Respected Advisor
Posts: 4,925

Re: Merge two datasets by ID and date range

This warning indicates that your query didn't include a max or sum summary function. Please post the code that you ran. 

PG
Ask a Question
Discussion stats
  • 4 replies
  • 340 views
  • 0 likes
  • 3 in conversation