Hello!
I'm currently using the SAS Guide 8.1 for an analysis, and I want some help for creating a specific table.
My initial dataset is the sales of my business, which contains these main informations:
I want to make a table that counts how many customers have made a purchase on May/2020 and June/2020. Then on June/2020 and July/2020, and so on...
The objective is to see my level of fidelization, creating a retention KPI (how many customers that bought this month were here last month too?)
I think that the best way to create this calculation is programming a loop-like code, but I don't have the knowledge for that yet.
If anyone needs additional information, please, let me know.
Thanks!
I think to get started it would help if you posted a picture of what you want this table to look like in the end after you summarize it. It would also be good to clarify how you want to count the customers with gaps in their transactions. What if they purchased in May and July but not June. Does that count as a repeat customer to you?
Hi, Nicole.
Thanks for answering!
So, the table would look like this:
The date columns could be in date format.
The distinctActiveCustomer column is the number of customers that made at least 1 purchase on that month. The retainedCustomers are the same, but with the addition that they bought on last month too.
About your question, it has to be on two sequential months. If anyone "leaps" a month, they wouldn't be counted.
I'm here if you need anymore details.
Best regards,
Renan
If the data is sorted by YEARMONTH/ID, then you can read the data in groups. Each time a unique ID is encountered, add it to a hash table for the currentmonth and increment the number of distinct current id's. If the ID in hand is also present in the hash table for the preceding month, then add to the retained customer count.
At the last record for a given yearmonth, output the counts, clear the hash table representing the previous month, and reuse it for the upcoming current month (the hash table for the current month then become the object holding the newly prior-month data).
Here is untested code that does that. I repeat, this assumes that data set HAVE is sorted by YEARMONTH/ID:
data want (keep=yearmonth distinctactivecustomer retainedcustomers);
set have;
by yearmonth id;
if first.yearmonth then call missing(distinctactivecustomer,retainedcustomers);
if _n_=1 then do;
declare hash mntha ();
mntha.definekey('id');
mntha.definedone();
declare hash mnthb ();
mnthb.definekey('id');
mnthb.definedone();
end;
retain _curm 'MNTHA' ; /*Indicator of which hash is for current month*/
if last.id; /*Process the id in hand once */
if _curm='MNTHA' then do;
mntha.add();
if mnthb.chk()=0 then retainedcustomers+1;
end;
else do;
mnthb.add();
if mntha.chk()=0 then retainedcustomers+1;
end;
if last.yearmonth;
/* Alternate identity of the current month hash object */
if _curm='MNTHA' then _curm='MNTHB'; else
if _curm='MNTHB' then _curm='MNTHA';
/* Now clear the "current" month hash, in preparation for next month */
if _curm='A' then mntha.clear(); else
if _curm='B' then mnthb.clear();
run;
If the data are sorted by yearmonth (but not sorted by id within yearmonth), then this minor modified code should do. The "if last.id;" statement is deleted, and the 2 statements using the hash add methods (mntha.add() and mnthb.add()) are embedded in IF conditions (i.e. don't attempt to add an id to a hash if it is already there).
data want (keep=yearmonth distinctactivecustomer retainedcustomers);
set have;
by yearmonth id;
if first.yearmonth then call missing(distinctactivecustomer,retainedcustomers);
if _n_=1 then do;
declare hash mntha ();
mntha.definekey('id');
mntha.definedone();
declare hash mnthb ();
mnthb.definekey('id');
mnthb.definedone();
end;
retain _curm 'MNTHA' ; /*Indicator of which hash is for current month*/
if _curm='MNTHA' then do;
if mntha.check()^=0 then mntha.add(); /*Modified */
if mnthb.chk()=0 then retainedcustomers+1;
end;
else do;
if mnthb.check()=0 then mnthb.add(); /*Modified */
if mntha.chk()=0 then retainedcustomers+1;
end;
if last.yearmonth;
/* Alternate identity of the current month hash object */
if _curm='MNTHA' then _curm='MNTHB'; else
if _curm='MNTHB' then _curm='MNTHA';
/* Now clear the "current" month hash, in preparation for next month */
if _curm='A' then mntha.clear(); else
if _curm='B' then mnthb.clear();
run;
If you sort the data by id/yearmonth you don't need hash tables to check an id against the previous month to determine retention. Instead, you can check the current record against the prior record to see if they are exactly one month apart, demonstrating retention for that id and month:
Editted note: I've corrected the erroneous spelliing of the macrovar's beg_yyyymm and end_yyyymm. But this program is still not tested.
%*let beg_yyyyymm=201801; /* Erroneous initial code, commented out*/
%*let end_yyyyymm=202006; /* Erroneous initial code, commented out*/
%let beg_yyyymm=201801; /* Corrected replacement code*/
%let end_yyyymm=202006; /* Corrected replacement code*/
%let beg_yyyy=%substr(&beg_yyyymm,1,4);
%let end_yyyy=%substr(&end_yyyymm,1,4);
%let beg_mm=%substr(&beg_yyyymm,5,2);
%let end_mm=%substr(&end_yyyymm,5,2);
proc sort data=have out=need;
by id yearmonth;
run;
data want;
/* These arrays indexed by year (rows) and months (columns) */
array curcounts {&beg_yyyy:&end_yyyy,1:12} _temporary_ ();
array retained {&beg_yyyy:&end_yyyy,1:12} _temporary_ ();
set need end=end_of_data;
by id yearmonth;
if last.yearmonth; /* Let only one obs per id/yearmonth pass through */
/* First, get row and column array indexes for the current record */
_yyyy=floor(yearmonth/100);
_mm=mod(yearmonth,100);
curcounts{_yyyy,_mm}+1;
_curdate=mdy(_mm,1,_yyyy);
/* Now see about if this record is preceded by a record from the "retain" yearmonth */
if intck('month',lag(_curdate),_curdate)=1 and lag(id)=id then retained{_yyyy,_mm}+1;
if end_of_data;
do _yyyy=&beg_yyyy to &end_yyyy ;
do _mm=ifn(_yyyy=&beg_yyyy,&beg_mm,1) to ifn(_yyyy=&end_yyyy,&end_mm,12);
if curcounts{_yyyy,_mm}^=. then do;
yearmonth=100*_yyyy+_mm;
distinctactivecustomer =curcounts{_yyyy,_mm};
retainedcustomers=retained{_yyyy,_mm};
output;
end;
end;
end;
run;
The program above assumes that you know the range of range is from 201801 through 202006.
Hello, @mkeintz! Thanks for replying.
I've tried your last solution (by sorting the date by id/yearmonth), but it returned some errors.
I've attached the log file to make it easier to find out what's happening. I'm sorry to ask for your help again, but I'm very newbie to SAS Programming.
Could you help me? If you need any other information, please, let me know.
Regards!
I see an error in my code, which I will correct in the original response.
This is one of the problems when sample data is not provided - I have nothing to test the program against. In this case a typographical error got detected by SAS while undetected by me. It's the major reason that you will constantly see "please provide data in the form of a data step" notes from those who want to help you.
Also ..., while I'm in my lecture mode, it would be good in the future if you were to paste the log into the "insert code" popup (activated by the </> icon) as I've done below. Then every person interested can easily see what we are talking about.
Note the log has the warning:
WARNING: Apparent symbolic reference BEG_YYYYMM not resolved.
And prior to that I had coded a macrovariable assignment:
29 %let beg_yyyyymm=201901;
which mistakenly had 5 y's. It should have been 4 y's, since that's what I used later in the program. I'll make that correction in my original response.
It a good example to show that the sas log is your friend. It will be beneficial to understand the reason for every note, warning, and error message posted by sas to the log.
Log text for erroneous program:
1 The SAS System 10:25 Wednesday, August 12, 2020 1 ;*';*";*/;quit;run; 2 OPTIONS PAGENO=MIN; 3 %LET SYSLAST=WORK.COOPERADO_ANOMES; 4 %LET _CLIENTTASKLABEL='Program'; 5 %LET _CLIENTPROCESSFLOWNAME='Process Flow'; 6 %LET _CLIENTPROJECTPATH=''; 7 %LET _CLIENTPROJECTPATHHOST=''; 8 %LET _CLIENTPROJECTNAME=''; 9 %LET _SASPROGRAMFILE=''; 10 %LET _SASPROGRAMFILEHOST=''; 11 12 ODS _ALL_ CLOSE; 13 OPTIONS DEV=SVG; 14 GOPTIONS XPIXELS=0 YPIXELS=0; 15 %macro HTML5AccessibleGraphSupported; 16 %if %_SAS_VERCOMP(9, 4, 4) >= 0 %then ACCESSIBLE_GRAPH; 17 %mend; 18 FILENAME EGHTML TEMP; 19 ODS HTML5(ID=EGHTML) FILE=EGHTML 20 OPTIONS(BITMAP_MODE='INLINE') 21 %HTML5AccessibleGraphSupported 22 ENCODING='utf-8' 23 STYLE=HTMLBlue 24 NOGTITLE 25 NOGFOOTNOTE 26 GPATH=&sasworklocation 27 ; NOTE: Writing HTML5(EGHTML) Body file: EGHTML 28 29 %let beg_yyyyymm=201901; 30 %let end_yyyyymm=202007; 31 32 %let beg_yyyy=%substr(&beg_yyyymm,1,4); WARNING: Apparent symbolic reference BEG_YYYYMM not resolved. WARNING: Apparent symbolic reference BEG not resolved. 33 %let end_yyyy=%substr(&end_yyyymm,1,4); WARNING: Apparent symbolic reference END_YYYYMM not resolved. WARNING: Apparent symbolic reference END not resolved. 34 %let beg_mm=%substr(&beg_yyyymm,5,2); WARNING: Apparent symbolic reference BEG_YYYYMM not resolved. 35 %let end_mm=%substr(&end_yyyymm,5,2); WARNING: Apparent symbolic reference END_YYYYMM not resolved. 36 37 proc sort data=WORK.cooperado_anomes out=WORK.COOP_ANOMES_SORT; 38 by NUM_MAT NUM_ANO_MES; 39 run; NOTE: There were 9039470 observations read from the data set WORK.COOPERADO_ANOMES. NOTE: The data set WORK.COOP_ANOMES_SORT has 9039470 observations and 2 variables. NOTE: PROCEDURE SORT used (Total process time): real time 1.51 seconds cpu time 3.62 seconds 40 41 data WORK.COOP_RETIDOS_ANOMES; 42 /* These arrays indexed by year (rows) and months (columns) */ WARNING: Apparent symbolic reference BEG not resolved. 2 The SAS System 10:25 Wednesday, August 12, 2020 43 array curcounts {&beg_yyyy:&end_yyyy,1:12} _temporary_ (); NOTE: Line generated by the macro variable "BEG_YYYY". 43 &beg _ 22 200 ___ 22 76 WARNING: Apparent symbolic reference END not resolved. WARNING: Apparent symbolic reference BEG not resolved. ERROR 22-322: Syntax error, expecting one of the following: an integer constant, *. ERROR 200-322: The symbol is not recognized and will be ignored. ERROR 76-322: Syntax error, statement will be ignored. 44 array retained {&beg_yyyy:&end_yyyy,1:12} _temporary_ (); NOTE: Line generated by the macro variable "BEG_YYYY". 44 &beg _ 22 200 ___ 22 76 WARNING: Apparent symbolic reference END not resolved. ERROR 22-322: Syntax error, expecting one of the following: an integer constant, *. ERROR 200-322: The symbol is not recognized and will be ignored. ERROR 76-322: Syntax error, statement will be ignored. 45 set WORK.COOP_ANOMES_SORT end=end_of_data; 46 47 by NUM_MAT NUM_ANO_MES; 48 if last.NUM_ANO_MES; /* Let only one obs per id/yearmonth pass through */ 49 50 /* First, get row and column array indexes for the current record */ 51 _yyyy=floor(NUM_ANO_MES/100); 52 _mm=mod(NUM_ANO_MES,100); 53 54 curcounts{_yyyy,_mm}+1; _ 400 ERROR: Too many array subscripts specified for array curcounts. ERROR 400-185: The SUM statement requires numeric expression. 55 56 _curdate=mdy(_mm,1,_yyyy); 57 /* Now see about if this record is preceded by a record from the "retain" yearmonth */ 58 if intck('month',lag(_curdate),_curdate)=1 and lag(NUM_MAT)=NUM_MAT then retained{_yyyy,_mm}+1; _ 400 ERROR: Too many array subscripts specified for array retained. ERROR 400-185: The SUM statement requires numeric expression. 59 3 The SAS System 10:25 Wednesday, August 12, 2020 60 if end_of_data; 61 do _yyyy=&beg_yyyy to &end_yyyy ; NOTE: Line generated by the macro variable "BEG_YYYY". 61 &beg _ 22 WARNING: Apparent symbolic reference BEG not resolved. NOTE: Line generated by the macro variable "END_YYYY". 61 &end _ 22 WARNING: Apparent symbolic reference END not resolved. ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant, a missing value, INPUT, PUT. 62 do _mm=ifn(_yyyy=&beg_yyyy,&beg_mm,1) to ifn(_yyyy=&end_yyyy,&end_mm,12); NOTE: Line generated by the macro variable "BEG_YYYY". 62 &beg _ 22 WARNING: Apparent symbolic reference BEG not resolved. NOTE: Line generated by the macro variable "END_YYYY". 62 &end _ 22 WARNING: Apparent symbolic reference END not resolved. ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant, a missing value, bitstring, INPUT, PUT. 63 if curcounts{_yyyy,_mm}^=. then do; ERROR: Too many array subscripts specified for array curcounts. 64 NUM_ANO_MES=100*_yyyy+_mm; 65 distinctactivecustomer =curcounts{_yyyy,_mm}; ERROR: Too many array subscripts specified for array curcounts. 66 retainedcustomers=retained{_yyyy,_mm}; ERROR: Too many array subscripts specified for array retained. 67 output; 68 end; 69 end; 70 end; 71 run; NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column). 63:7 NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.COOP_RETIDOS_ANOMES may be incomplete. When this step was stopped there were 0 observations and 10 variables. WARNING: Data set WORK.COOP_RETIDOS_ANOMES was not replaced because this step was stopped. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.01 seconds 72 73 %LET _CLIENTTASKLABEL=; 74 %LET _CLIENTPROCESSFLOWNAME=; 75 %LET _CLIENTPROJECTPATH=; 76 %LET _CLIENTPROJECTPATHHOST=; 4 The SAS System 10:25 Wednesday, August 12, 2020 77 %LET _CLIENTPROJECTNAME=; 78 %LET _SASPROGRAMFILE=; 79 %LET _SASPROGRAMFILEHOST=; 80 81 ;*';*";*/;quit;run; 82 ODS _ALL_ CLOSE; 83 84 85 QUIT; RUN; 86
Hi, @mkeintz. Thanks for the tips! I'm kinda new to the programming world and these kind of advices are really useful.
I've made the corrections you pointed out, but it still giving an error. Now it's related to the temporary function according to the log below.
I've extracted a random sample from my data (before sorting) and attached it to this reply to make it easier.
Thanks again for helping!
1 The SAS System 10:25 Wednesday, August 12, 2020
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET SYSLAST=WORK.COOPERADO_ANOMES;
4 %LET _CLIENTTASKLABEL='Program';
5 %LET _CLIENTPROCESSFLOWNAME='Process Flow';
6 %LET
6 ! _CLIENTPROJECTPATH='\\10.200.12.26\gcl\ANÁLISES\2020\202008_Exploratória_Retenção_Cooperados\Retencao_Cooperados.egp';
7 %LET _CLIENTPROJECTPATHHOST='200SW12GTI140';
8 %LET _CLIENTPROJECTNAME='Retencao_Cooperados.egp';
9 %LET _SASPROGRAMFILE='';
10 %LET _SASPROGRAMFILEHOST='';
11
12 ODS _ALL_ CLOSE;
13 OPTIONS DEV=SVG;
14 GOPTIONS XPIXELS=0 YPIXELS=0;
15 %macro HTML5AccessibleGraphSupported;
16 %if %_SAS_VERCOMP(9, 4, 4) >= 0 %then ACCESSIBLE_GRAPH;
17 %mend;
18 FILENAME EGHTML TEMP;
19 ODS HTML5(ID=EGHTML) FILE=EGHTML
20 OPTIONS(BITMAP_MODE='INLINE')
21 %HTML5AccessibleGraphSupported
22 ENCODING='utf-8'
23 STYLE=HTMLBlue
24 NOGTITLE
25 NOGFOOTNOTE
26 GPATH=&sasworklocation
27 ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
28
29 %*let beg_yyyyymm=201801; /* Erroneous initial code, commented out*/
30 %*let end_yyyyymm=202006; /* Erroneous initial code, commented out*/
31
32 %let beg_yyyymm=201901; /* Corrected replacement code*/
33 %let end_yyyymm=202007; /* Corrected replacement code*/
34
35 %let beg_yyyy=%substr(&beg_yyyymm,1,4);
36 %let end_yyyy=%substr(&end_yyyymm,1,4);
37 %let beg_mm=%substr(&beg_yyyymm,5,2);
38 %let end_mm=%substr(&end_yyyymm,5,2);
39
40 proc sort data=WORK.COOPERADO_ANOMES out=WORK.COOP_ANOMES_SORT;
41 by NUM_MAT NUM_ANO_MES;
42 run;
NOTE: There were 9039470 observations read from the data set WORK.COOPERADO_ANOMES.
NOTE: The data set WORK.COOP_ANOMES_SORT has 9039470 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 1.33 seconds
cpu time 3.36 seconds
43
44 data WORK.COOP_ANOMES_RETENCAO;
45 /* These arrays indexed by year (rows) and months (columns) */
46 array curcounts {&beg_yyyy:&end_yyyy,1:12} _temporary_ ();
_
22
2 The SAS System 10:25 Wednesday, August 12, 2020
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,
a missing value.
47 array retained {&beg_yyyy:&end_yyyy,1:12} _temporary_ ();
WARNING: Partial value initialization of the array curcounts.
47 array retained {&beg_yyyy:&end_yyyy,1:12} _temporary_ ();
_
22
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,
a missing value.
48 set WORK.COOP_ANOMES_SORT end=end_of_data;
WARNING: Partial value initialization of the array retained.
49
50 by NUM_MAT NUM_ANO_MES;
51 if last.NUM_ANO_MES; /* Let only one obs per id/yearmonth pass through */
52
53 /* First, get row and column array indexes for the current record */
54 _yyyy=floor(NUM_ANO_MES/100);
55 _mm=mod(NUM_ANO_MES,100);
56
57 curcounts{_yyyy,_mm}+1;
58
59 _curdate=mdy(_mm,1,_yyyy);
60 /* Now see about if this record is preceded by a record from the "retain" yearmonth */
61 if intck('month',lag(_curdate),_curdate)=1 and lag(NUM_MAT)=NUM_MAT then retained{_yyyy,_mm}+1;
62
63 if end_of_data;
64 do _yyyy=&beg_yyyy to &end_yyyy ;
65 do _mm=ifn(_yyyy=&beg_yyyy,&beg_mm,1) to ifn(_yyyy=&end_yyyy,&end_mm,12);
66 if curcounts{_yyyy,_mm}^=. then do;
67 NUM_ANO_MES=100*_yyyy+_mm;
68 distinctactivecustomer =curcounts{_yyyy,_mm};
69 retainedcustomers=retained{_yyyy,_mm};
70 output;
71 end;
72 end;
73 end;
74 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.COOP_ANOMES_RETENCAO may be incomplete. When this step was stopped there were 0 observations and 7
variables.
WARNING: Data set WORK.COOP_ANOMES_RETENCAO was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
75
76 %LET _CLIENTTASKLABEL=;
77 %LET _CLIENTPROCESSFLOWNAME=;
78 %LET _CLIENTPROJECTPATH=;
79 %LET _CLIENTPROJECTPATHHOST=;
80 %LET _CLIENTPROJECTNAME=;
81 %LET _SASPROGRAMFILE=;
82 %LET _SASPROGRAMFILEHOST=;
83
3 The SAS System 10:25 Wednesday, August 12, 2020
84 ;*';*";*/;quit;run;
85 ODS _ALL_ CLOSE;
86
87
88 QUIT; RUN;
89
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.