I am trying to select data based off of date and id from another table and summing a third column. Hopefully the below illustrates what I'm trying to do.
data reference_table;
infile datalines dlm='#';
input id & $1. date date7. product $ dollars;
format date ddmmyy.;
datalines;
1# 01JAN17# shoe# 200
1# 01Mar18# shoe# 200
1# 01Sep15# shoe# 200
2# 01Feb19# shoe# 200
2# 01Jul18# shoe# 200
2# 01Jun17# tops# 200
3# 01Aug14# tops# 300
3# 01Dec13# tape# 200
4# 01May17# tape# 200
4# 01Sep17# limo# 800
5# 01Apr16# hats# 500
5# 01Oct18# pens# 500
;
run;
data new_month_ids;
infile datalines dlm='#';
input id & $1. date date7.;
format date ddmmyy.;
datalines;
1# '01Jan17'd
2# '13Feb19'd
3# '28Dec13'd
4# '27Jun17'd
;
run;
From these two tables I am trying to do a query involving both which selects all the id's that appear in the "new_month_ids" table, and sum all of the dollars from the reference table that occurred within two years of the date in the "new_month_ids" table by both id and product. I believe the below table is what the final result should look like.
data want;
infile datalines dlm='#';
input id & $1. product $ dollars;
datalines;
1# shoe# 400
2# shoe# 200
2# tops# 200
3# tape# 200
4# tape# 200
4# limo# 800
;
run;
Notice for example, for id 1 we only sum two of the lines because those are the only two that appear within the two year time frame 01Jan2015-01Jan2017.
Below is my attempt at this, but when I run this code it returns no values. I have a feeling this might be related to my date format in the intnx function but I haven't figured it out yet.
proc sql;
create table test as
select a.id,
a.product,
sum(a.dollars) as dollars
from reference_table a left join new_month_ids b on
(a.id=b.id)
where a.id in (select distinct b.id from new_month_ids b)
And
intnx('year',b.date,-2,'same')<=a.date<=b.date
group by a.id,
a.product
;
quit;
Any help is appreciated.
If that is actually how you built the NEW_MONTH_IDS data set then that code is the major problem.
When I run
proc print data=new_month_ids; var date; format date date9.; run;
The result is
Obs date 1 01JAN2001 2 13FEB2001 3 28DEC2001 4 27JUN2001
Why did you have the date literal format?
Try
data new_month_ids; infile datalines ; input id : $1. date :date7.; format date ddmmyy.; datalines; 1 01Jan17 2 13Feb19 3 28Dec13 4 27Jun17 ; run;
After fixing that data set so the dates are correct I get 4 records and a message about invalid arguments to INTNX because the
NEW_MONTH_IDS data set does not have all the IDs that are in the reference set.
Your data does not support getting all of the desired output.
4# 01Sep17# limo# 800
does not occur in the interval for 4 in the months set : 4 27Jun17 as Sep17 comes AFTER Jun 17 so that can't pull the 4 limo 800 into the result
This gets closer to the output desired barring the date issue above
proc sql; create table test as select a.id, a.product, sum(a.dollars) as dollars from reference_table a left join new_month_ids b on (a.id=b.id) where intnx('year',b.date,-2,'same')<= a.date <= b.date group by a.id, a.product ; quit;
If that is actually how you built the NEW_MONTH_IDS data set then that code is the major problem.
When I run
proc print data=new_month_ids; var date; format date date9.; run;
The result is
Obs date 1 01JAN2001 2 13FEB2001 3 28DEC2001 4 27JUN2001
Why did you have the date literal format?
Try
data new_month_ids; infile datalines ; input id : $1. date :date7.; format date ddmmyy.; datalines; 1 01Jan17 2 13Feb19 3 28Dec13 4 27Jun17 ; run;
After fixing that data set so the dates are correct I get 4 records and a message about invalid arguments to INTNX because the
NEW_MONTH_IDS data set does not have all the IDs that are in the reference set.
Your data does not support getting all of the desired output.
4# 01Sep17# limo# 800
does not occur in the interval for 4 in the months set : 4 27Jun17 as Sep17 comes AFTER Jun 17 so that can't pull the 4 limo 800 into the result
This gets closer to the output desired barring the date issue above
proc sql; create table test as select a.id, a.product, sum(a.dollars) as dollars from reference_table a left join new_month_ids b on (a.id=b.id) where intnx('year',b.date,-2,'same')<= a.date <= b.date group by a.id, a.product ; quit;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.