- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am trying to select data based off of date and id from another table and summing a third column. Hopefully the below illustrates what I'm trying to do.
data reference_table;
infile datalines dlm='#';
input id & $1. date date7. product $ dollars;
format date ddmmyy.;
datalines;
1# 01JAN17# shoe# 200
1# 01Mar18# shoe# 200
1# 01Sep15# shoe# 200
2# 01Feb19# shoe# 200
2# 01Jul18# shoe# 200
2# 01Jun17# tops# 200
3# 01Aug14# tops# 300
3# 01Dec13# tape# 200
4# 01May17# tape# 200
4# 01Sep17# limo# 800
5# 01Apr16# hats# 500
5# 01Oct18# pens# 500
;
run;
data new_month_ids;
infile datalines dlm='#';
input id & $1. date date7.;
format date ddmmyy.;
datalines;
1# '01Jan17'd
2# '13Feb19'd
3# '28Dec13'd
4# '27Jun17'd
;
run;
From these two tables I am trying to do a query involving both which selects all the id's that appear in the "new_month_ids" table, and sum all of the dollars from the reference table that occurred within two years of the date in the "new_month_ids" table by both id and product. I believe the below table is what the final result should look like.
data want;
infile datalines dlm='#';
input id & $1. product $ dollars;
datalines;
1# shoe# 400
2# shoe# 200
2# tops# 200
3# tape# 200
4# tape# 200
4# limo# 800
;
run;
Notice for example, for id 1 we only sum two of the lines because those are the only two that appear within the two year time frame 01Jan2015-01Jan2017.
Below is my attempt at this, but when I run this code it returns no values. I have a feeling this might be related to my date format in the intnx function but I haven't figured it out yet.
proc sql;
create table test as
select a.id,
a.product,
sum(a.dollars) as dollars
from reference_table a left join new_month_ids b on
(a.id=b.id)
where a.id in (select distinct b.id from new_month_ids b)
And
intnx('year',b.date,-2,'same')<=a.date<=b.date
group by a.id,
a.product
;
quit;
Any help is appreciated.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If that is actually how you built the NEW_MONTH_IDS data set then that code is the major problem.
When I run
proc print data=new_month_ids; var date; format date date9.; run;
The result is
Obs date 1 01JAN2001 2 13FEB2001 3 28DEC2001 4 27JUN2001
Why did you have the date literal format?
Try
data new_month_ids; infile datalines ; input id : $1. date :date7.; format date ddmmyy.; datalines; 1 01Jan17 2 13Feb19 3 28Dec13 4 27Jun17 ; run;
After fixing that data set so the dates are correct I get 4 records and a message about invalid arguments to INTNX because the
NEW_MONTH_IDS data set does not have all the IDs that are in the reference set.
Your data does not support getting all of the desired output.
4# 01Sep17# limo# 800
does not occur in the interval for 4 in the months set : 4 27Jun17 as Sep17 comes AFTER Jun 17 so that can't pull the 4 limo 800 into the result
This gets closer to the output desired barring the date issue above
proc sql; create table test as select a.id, a.product, sum(a.dollars) as dollars from reference_table a left join new_month_ids b on (a.id=b.id) where intnx('year',b.date,-2,'same')<= a.date <= b.date group by a.id, a.product ; quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If that is actually how you built the NEW_MONTH_IDS data set then that code is the major problem.
When I run
proc print data=new_month_ids; var date; format date date9.; run;
The result is
Obs date 1 01JAN2001 2 13FEB2001 3 28DEC2001 4 27JUN2001
Why did you have the date literal format?
Try
data new_month_ids; infile datalines ; input id : $1. date :date7.; format date ddmmyy.; datalines; 1 01Jan17 2 13Feb19 3 28Dec13 4 27Jun17 ; run;
After fixing that data set so the dates are correct I get 4 records and a message about invalid arguments to INTNX because the
NEW_MONTH_IDS data set does not have all the IDs that are in the reference set.
Your data does not support getting all of the desired output.
4# 01Sep17# limo# 800
does not occur in the interval for 4 in the months set : 4 27Jun17 as Sep17 comes AFTER Jun 17 so that can't pull the 4 limo 800 into the result
This gets closer to the output desired barring the date issue above
proc sql; create table test as select a.id, a.product, sum(a.dollars) as dollars from reference_table a left join new_month_ids b on (a.id=b.id) where intnx('year',b.date,-2,'same')<= a.date <= b.date group by a.id, a.product ; quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content