I have a table of the form:
Customer ID | Transaction Date for Buy |
1 | 2016-04-15 |
1 | 2016-04-20 |
3 | 2016-04-29 |
3 | 2016-04-21 |
3 | 2016-04-25 |
4 | 2017-12-17 |
From this data, how would I get a summary of the type:
Customer ID | Transaction Date for Buy | Number of buys in april 2016 | Number of buys in december 2017 |
1 | 2016-04-15 | 2 | 0 |
1 | 2016-04-20 | 2 | 0 |
3 | 2016-04-29 | 3 | 0 |
3 | 2016-04-21 | 3 | 0 |
3 | 2016-04-25 | 3 | 0 |
4 | 2017-12-17 | 0 | 1 |
So I would like to get the number of buys for each month uptill to today, starting at say january 2021. So there will be very many columns in my real example.
Any advice?
data have;
infile cards expandtabs;
input ID Date :yymmdd12.;
format Date yymmdd10.;
cards;
1 2016-04-15
1 2016-04-20
3 2016-04-29
3 2016-04-21
3 2016-04-25
4 2017-12-17
;
proc freq data=have noprint;
table id*date/out=want list;
format date yymmn6.;
run;
data have;
infile cards expandtabs;
input ID Date :yymmdd12.;
format Date yymmdd10.;
cards;
1 2016-04-15
1 2016-04-20
3 2016-04-29
3 2016-04-21
3 2016-04-25
4 2017-12-17
;
proc freq data=have noprint;
table id*date/out=want list;
format date yymmn6.;
run;
Hello @SasStatistics
The following code does what you need
data have;
infile cards expandtabs;
input ID Date :yymmdd12.;
format Date yymmdd10.;
cards;
1 2016-04-15
1 2016-04-20
3 2016-04-29
3 2016-04-21
3 2016-04-25
4 2017-12-17
;
proc sql;
create table have2 as
select id,date, year(date) as Year, count(id) as count
from have
group by year(date), id
order by id, date;
run;
proc transpose data=have2 out=want (drop= _NAME_) ;
by id date;
id year;
var count;
run;
data want;
set want;
array change _numeric_;
do over change;
if change=. then change=0;
end;
run ;
The out put will be as you wanted.
This can be done in a single data step, by reading each ID twice, once to get the counts, second time to output the original data with those counts.
data have;
infile cards expandtabs;
input ID Date :yymmdd12.;
format Date yymmdd10.;
cards;
1 2016-04-15
1 2016-04-20
3 2016-04-29
3 2016-04-21
3 2016-04-25
4 2017-12-17
;
data want (drop=_: nxt_:);
do until (id^=nxt_id);
do _ntrans=1 by 1 until (intck('month',date,nxt_date)^=0 or id^=nxt_id);
merge have
have (firstobs=2 keep=id date rename=(id=nxt_id date=nxt_date));
array mnths{4:12} april_trans _unwanted5-_unwanted11 dec_trans;
end;
mnths{month(date)}=_ntrans;
end;
do _m=4,12; if mnths{_m}=. then mnths{_m}=0; end;
do until (last.id);
set have;
by id notsorted;
output;
end;
run;
This program assumes that the data are grouped by id and within id, grouped by month. The ID's (and the months within ID's) need not be in ascending order. The technique in determining month and/or id boundaries is in the merge statement, using the "firstobs=2" parameter for one of the merged data sequences.
It uses the ARRAY statement to index the array MNTHS by month number. Since you only want April and December, the array has lower index of 4 and upper index of 12. And it has a bunch of unwanted variables as placeholders for May through November. If you want other months just make changes to the variable names and/or array index range. And modify the "do _m=4,12;" loop accordingly. It wouldn't be hard to use the array for months ranging over multiple years - just add a second dimension for year.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.