BookmarkSubscribeRSS Feed
Renan_Crepaldi
Obsidian | Level 7

Hello!

 

I'm currently using the SAS Guide 8.1 for an analysis, and I want some help for creating a specific table.

 

My initial dataset is the sales of my business, which contains these main informations:

 

Renan_Crepaldi_1-1596045495969.png

 

I want to make a table that counts how many customers have made a purchase on May/2020 and June/2020. Then on June/2020 and July/2020, and so on...

 

The objective is to see my level of fidelization, creating a retention KPI (how many customers that bought this month were here last month too?)

 

I think that the best way to create this calculation is programming a loop-like code, but I don't have the knowledge for that yet.

 

If anyone needs additional information, please, let me know.

 

Thanks!

7 REPLIES 7
Nicole_Fox
Obsidian | Level 7

I think to get started it would help if you posted a picture of what you want this table to look like in the end after you summarize it.  It would also be good to clarify how you want to count the customers with gaps in their transactions. What if they purchased in May and July but not June.  Does that count as a repeat customer to you?

Renan_Crepaldi
Obsidian | Level 7

Hi, Nicole.

 

Thanks for answering!

 

So, the table would look like this:

 

Renan_Crepaldi_1-1596123624591.png

 

The date columns could be in date format.

 

The distinctActiveCustomer column is the number of customers that made at least 1 purchase on that month. The retainedCustomers are the same, but with the addition that they bought on last month too.

 

About your question, it has to be on two sequential months. If anyone "leaps" a month, they wouldn't be counted.

 

I'm here if you need anymore details.

 

Best regards,

Renan

mkeintz
PROC Star

If the data is sorted by YEARMONTH/ID, then you can read the data in groups.  Each time a unique ID is encountered, add it to a hash table for the currentmonth and increment the number of distinct current id's.   If the ID in hand is also present in the hash table for the preceding month, then add to the retained customer count. 

 

At the last record for a given yearmonth, output the counts, clear the hash table representing the previous month, and reuse it for the upcoming current month (the hash table for the current month then become the object holding the newly prior-month data). 

 

Here is untested code that does that.  I repeat, this assumes that data set HAVE is sorted by YEARMONTH/ID:

 

 

data want (keep=yearmonth distinctactivecustomer retainedcustomers);
  set have;
  by yearmonth id;
  if first.yearmonth then call missing(distinctactivecustomer,retainedcustomers);

  if _n_=1 then do;
    declare hash mntha ();
	  mntha.definekey('id');
	  mntha.definedone();
    declare hash mnthb ();
	  mnthb.definekey('id');
	  mnthb.definedone();
  end;

  retain _curm 'MNTHA' ;  /*Indicator of which hash is for current month*/

  if last.id;  /*Process the id in hand once */
  if _curm='MNTHA' then do;
    mntha.add();
if mnthb.chk()=0 then retainedcustomers+1; end; else do; mnthb.add(); if mntha.chk()=0 then retainedcustomers+1; end; if last.yearmonth; /* Alternate identity of the current month hash object */ if _curm='MNTHA' then _curm='MNTHB'; else if _curm='MNTHB' then _curm='MNTHA'; /* Now clear the "current" month hash, in preparation for next month */ if _curm='A' then mntha.clear(); else if _curm='B' then mnthb.clear(); run;

 

 

If the data are sorted by yearmonth (but not sorted by id within yearmonth), then this minor modified code should do.  The "if last.id;" statement is deleted, and the 2 statements using the hash add methods (mntha.add() and mnthb.add()) are embedded in IF conditions (i.e. don't attempt to add an id to a hash if it is already there).

 


data want (keep=yearmonth distinctactivecustomer retainedcustomers);
  set have;
  by yearmonth id;
  if first.yearmonth then call missing(distinctactivecustomer,retainedcustomers);

  if _n_=1 then do;
    declare hash mntha ();
	  mntha.definekey('id');
	  mntha.definedone();
    declare hash mnthb ();
	  mnthb.definekey('id');
	  mnthb.definedone();
  end;

  retain _curm 'MNTHA' ;  /*Indicator of which hash is for current month*/

  if _curm='MNTHA' then do;
    if mntha.check()^=0 then mntha.add();    /*Modified */
    if mnthb.chk()=0 then retainedcustomers+1;
  end;
  else do;
    if mnthb.check()=0 then mnthb.add();     /*Modified */
    if mntha.chk()=0 then retainedcustomers+1;
  end;

  if last.yearmonth; 
  /* Alternate identity of the current month hash object */
  if _curm='MNTHA' then _curm='MNTHB';  else
  if _curm='MNTHB' then _curm='MNTHA';

  /* Now clear the "current" month hash, in preparation for next month */
  if _curm='A' then mntha.clear(); else
  if _curm='B' then mnthb.clear();
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

If you sort the data by id/yearmonth you don't need hash tables to check an id against the previous month to determine retention.  Instead, you can check the current record against the prior record to see if they are exactly one month apart, demonstrating retention for that id and month: 

 

Editted note:  I've corrected the erroneous spelliing of the macrovar's beg_yyyymm and end_yyyymm.  But this program is still not tested. 

 

%*let beg_yyyyymm=201801;  /* Erroneous initial code, commented out*/
%*let end_yyyyymm=202006;  /* Erroneous initial code, commented out*/

%let beg_yyyymm=201801; /* Corrected replacement code*/
%let end_yyyymm=202006; /* Corrected replacement code*/ %let beg_yyyy=%substr(&beg_yyyymm,1,4); %let end_yyyy=%substr(&end_yyyymm,1,4); %let beg_mm=%substr(&beg_yyyymm,5,2); %let end_mm=%substr(&end_yyyymm,5,2); proc sort data=have out=need; by id yearmonth; run; data want; /* These arrays indexed by year (rows) and months (columns) */ array curcounts {&beg_yyyy:&end_yyyy,1:12} _temporary_ (); array retained {&beg_yyyy:&end_yyyy,1:12} _temporary_ (); set need end=end_of_data; by id yearmonth; if last.yearmonth; /* Let only one obs per id/yearmonth pass through */ /* First, get row and column array indexes for the current record */ _yyyy=floor(yearmonth/100); _mm=mod(yearmonth,100); curcounts{_yyyy,_mm}+1; _curdate=mdy(_mm,1,_yyyy); /* Now see about if this record is preceded by a record from the "retain" yearmonth */ if intck('month',lag(_curdate),_curdate)=1 and lag(id)=id then retained{_yyyy,_mm}+1; if end_of_data; do _yyyy=&beg_yyyy to &end_yyyy ; do _mm=ifn(_yyyy=&beg_yyyy,&beg_mm,1) to ifn(_yyyy=&end_yyyy,&end_mm,12); if curcounts{_yyyy,_mm}^=. then do; yearmonth=100*_yyyy+_mm; distinctactivecustomer =curcounts{_yyyy,_mm}; retainedcustomers=retained{_yyyy,_mm}; output; end; end; end; run;

The program above assumes that you know the range of range is from 201801 through 202006.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Renan_Crepaldi
Obsidian | Level 7

Hello, @mkeintz! Thanks for replying.

 

I've tried your last solution (by sorting the date by id/yearmonth), but it returned some errors.

 

I've attached the log file to make it easier to find out what's happening. I'm sorry to ask for your help again, but I'm very newbie to SAS Programming.

 

Could you help me? If you need any other information, please, let me know.

 

Regards!

mkeintz
PROC Star

I see an error in my code, which I will correct in the original response.

 

This is one of the problems when sample data is not provided - I have nothing to test the program against.  In this case a typographical error got detected by SAS while undetected by me.  It's the major reason that you will constantly see "please provide data in the form of a data step" notes from those who want to help you.

 

Also ..., while I'm in my lecture mode, it would be good in the future if you were to paste the log into the "insert code" popup (activated by the </> icon) as I've done below.  Then every person interested can easily see what we are talking about.

 

Note the log has the warning:

WARNING: Apparent symbolic reference BEG_YYYYMM not resolved.

And prior to that I had coded a macrovariable assignment:

29         %let beg_yyyyymm=201901;

which mistakenly had 5 y's.  It should have been 4 y's, since that's what I used later in the program.  I'll make that correction in my original response. 

 

It a good example to show that the sas log is your friend.  It will be beneficial to understand the reason for every note, warning, and error message posted by sas to the log.

 

 

Log text for erroneous program:

 

1                                                          The SAS System                           10:25 Wednesday, August 12, 2020

1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET SYSLAST=WORK.COOPERADO_ANOMES;
4          %LET _CLIENTTASKLABEL='Program';
5          %LET _CLIENTPROCESSFLOWNAME='Process Flow';
6          %LET _CLIENTPROJECTPATH='';
7          %LET _CLIENTPROJECTPATHHOST='';
8          %LET _CLIENTPROJECTNAME='';
9          %LET _SASPROGRAMFILE='';
10         %LET _SASPROGRAMFILEHOST='';
11         
12         ODS _ALL_ CLOSE;
13         OPTIONS DEV=SVG;
14         GOPTIONS XPIXELS=0 YPIXELS=0;
15         %macro HTML5AccessibleGraphSupported;
16             %if %_SAS_VERCOMP(9, 4, 4) >= 0 %then ACCESSIBLE_GRAPH;
17         %mend;
18         FILENAME EGHTML TEMP;
19         ODS HTML5(ID=EGHTML) FILE=EGHTML
20             OPTIONS(BITMAP_MODE='INLINE')
21             %HTML5AccessibleGraphSupported
22             ENCODING='utf-8'
23             STYLE=HTMLBlue
24             NOGTITLE
25             NOGFOOTNOTE
26             GPATH=&sasworklocation
27         ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
28         
29         %let beg_yyyyymm=201901;
30         %let end_yyyyymm=202007;
31         
32         %let beg_yyyy=%substr(&beg_yyyymm,1,4);
WARNING: Apparent symbolic reference BEG_YYYYMM not resolved.
WARNING: Apparent symbolic reference BEG not resolved.
33         %let end_yyyy=%substr(&end_yyyymm,1,4);
WARNING: Apparent symbolic reference END_YYYYMM not resolved.
WARNING: Apparent symbolic reference END not resolved.
34         %let beg_mm=%substr(&beg_yyyymm,5,2);
WARNING: Apparent symbolic reference BEG_YYYYMM not resolved.
35         %let end_mm=%substr(&end_yyyymm,5,2);
WARNING: Apparent symbolic reference END_YYYYMM not resolved.
36         
37         proc sort data=WORK.cooperado_anomes out=WORK.COOP_ANOMES_SORT;
38           by NUM_MAT NUM_ANO_MES;
39         run;

NOTE: There were 9039470 observations read from the data set WORK.COOPERADO_ANOMES.
NOTE: The data set WORK.COOP_ANOMES_SORT has 9039470 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           1.51 seconds
      cpu time            3.62 seconds
      

40         
41         data WORK.COOP_RETIDOS_ANOMES;
42           /* These arrays indexed by year (rows) and months (columns) */
WARNING: Apparent symbolic reference BEG not resolved.
2                                                          The SAS System                           10:25 Wednesday, August 12, 2020

43           array curcounts {&beg_yyyy:&end_yyyy,1:12} _temporary_ ();
NOTE: Line generated by the macro variable "BEG_YYYY".
43         &beg
           _
           22
           200
            ___
            22
            76
WARNING: Apparent symbolic reference END not resolved.
WARNING: Apparent symbolic reference BEG not resolved.
ERROR 22-322: Syntax error, expecting one of the following: an integer constant, *.  

ERROR 200-322: The symbol is not recognized and will be ignored.

ERROR 76-322: Syntax error, statement will be ignored.

44           array retained  {&beg_yyyy:&end_yyyy,1:12} _temporary_ ();
NOTE: Line generated by the macro variable "BEG_YYYY".
44         &beg
           _
           22
           200
            ___
            22
            76
WARNING: Apparent symbolic reference END not resolved.
ERROR 22-322: Syntax error, expecting one of the following: an integer constant, *.  

ERROR 200-322: The symbol is not recognized and will be ignored.

ERROR 76-322: Syntax error, statement will be ignored.

45           set WORK.COOP_ANOMES_SORT end=end_of_data;
46         
47           by NUM_MAT NUM_ANO_MES;
48           if last.NUM_ANO_MES;   /* Let only one obs per id/yearmonth pass through */
49         
50           /* First, get row and column array indexes for the current record */
51           _yyyy=floor(NUM_ANO_MES/100);
52           _mm=mod(NUM_ANO_MES,100);
53         
54           curcounts{_yyyy,_mm}+1;
             _
             400
ERROR: Too many array subscripts specified for array curcounts.
ERROR 400-185: The SUM statement requires numeric expression.

55         
56           _curdate=mdy(_mm,1,_yyyy);
57           /* Now see about if this record is preceded by a record from the "retain" yearmonth */
58           if intck('month',lag(_curdate),_curdate)=1 and lag(NUM_MAT)=NUM_MAT then retained{_yyyy,_mm}+1;
                                                                                      _
                                                                                      400
ERROR: Too many array subscripts specified for array retained.
ERROR 400-185: The SUM statement requires numeric expression.

59         
3                                                          The SAS System                           10:25 Wednesday, August 12, 2020

60           if end_of_data;
61           do _yyyy=&beg_yyyy to &end_yyyy ;
NOTE: Line generated by the macro variable "BEG_YYYY".
61         &beg
           _
           22
WARNING: Apparent symbolic reference BEG not resolved.
NOTE: Line generated by the macro variable "END_YYYY".
61            &end
              _
              22
WARNING: Apparent symbolic reference END not resolved.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant, 
              a missing value, INPUT, PUT.  

62             do _mm=ifn(_yyyy=&beg_yyyy,&beg_mm,1)   to ifn(_yyyy=&end_yyyy,&end_mm,12);
NOTE: Line generated by the macro variable "BEG_YYYY".
62         &beg
           _
           22
WARNING: Apparent symbolic reference BEG not resolved.
NOTE: Line generated by the macro variable "END_YYYY".
62         &end
           _
           22
WARNING: Apparent symbolic reference END not resolved.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant, 
              a missing value, bitstring, INPUT, PUT.  

63         	  if curcounts{_yyyy,_mm}^=. then do;
ERROR: Too many array subscripts specified for array curcounts.
64         	    NUM_ANO_MES=100*_yyyy+_mm;
65                 distinctactivecustomer =curcounts{_yyyy,_mm};
ERROR: Too many array subscripts specified for array curcounts.
66                 retainedcustomers=retained{_yyyy,_mm};
ERROR: Too many array subscripts specified for array retained.
67         		output;
68         	  end;
69         	end;
70           end;
71         run;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      63:7   
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.COOP_RETIDOS_ANOMES may be incomplete.  When this step was stopped there were 0 observations and 10 
         variables.
WARNING: Data set WORK.COOP_RETIDOS_ANOMES was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
      

72         
73         %LET _CLIENTTASKLABEL=;
74         %LET _CLIENTPROCESSFLOWNAME=;
75         %LET _CLIENTPROJECTPATH=;
76         %LET _CLIENTPROJECTPATHHOST=;
4                                                          The SAS System                           10:25 Wednesday, August 12, 2020

77         %LET _CLIENTPROJECTNAME=;
78         %LET _SASPROGRAMFILE=;
79         %LET _SASPROGRAMFILEHOST=;
80         
81         ;*';*";*/;quit;run;
82         ODS _ALL_ CLOSE;
83         
84         
85         QUIT; RUN;
86         

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Renan_Crepaldi
Obsidian | Level 7

Hi, @mkeintz. Thanks for the tips! I'm kinda new to the programming world and these kind of advices are really useful.

 

I've made the corrections you pointed out, but it still giving an error. Now it's related to the temporary function according to the log below.

 

I've extracted a random sample from my data (before sorting) and attached it to this reply to make it easier.

 

Thanks again for helping!

 

1                                                          The SAS System                           10:25 Wednesday, August 12, 2020

1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET SYSLAST=WORK.COOPERADO_ANOMES;
4          %LET _CLIENTTASKLABEL='Program';
5          %LET _CLIENTPROCESSFLOWNAME='Process Flow';
6          %LET
6        ! _CLIENTPROJECTPATH='\\10.200.12.26\gcl\ANÁLISES\2020\202008_Exploratória_Retenção_Cooperados\Retencao_Cooperados.egp';
7          %LET _CLIENTPROJECTPATHHOST='200SW12GTI140';
8          %LET _CLIENTPROJECTNAME='Retencao_Cooperados.egp';
9          %LET _SASPROGRAMFILE='';
10         %LET _SASPROGRAMFILEHOST='';
11         
12         ODS _ALL_ CLOSE;
13         OPTIONS DEV=SVG;
14         GOPTIONS XPIXELS=0 YPIXELS=0;
15         %macro HTML5AccessibleGraphSupported;
16             %if %_SAS_VERCOMP(9, 4, 4) >= 0 %then ACCESSIBLE_GRAPH;
17         %mend;
18         FILENAME EGHTML TEMP;
19         ODS HTML5(ID=EGHTML) FILE=EGHTML
20             OPTIONS(BITMAP_MODE='INLINE')
21             %HTML5AccessibleGraphSupported
22             ENCODING='utf-8'
23             STYLE=HTMLBlue
24             NOGTITLE
25             NOGFOOTNOTE
26             GPATH=&sasworklocation
27         ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
28         
29         %*let beg_yyyyymm=201801;  /* Erroneous initial code, commented out*/
30         %*let end_yyyyymm=202006;  /* Erroneous initial code, commented out*/
31         
32         %let beg_yyyymm=201901;  /* Corrected replacement code*/
33         %let end_yyyymm=202007;  /* Corrected replacement code*/
34         
35         %let beg_yyyy=%substr(&beg_yyyymm,1,4);
36         %let end_yyyy=%substr(&end_yyyymm,1,4);
37         %let beg_mm=%substr(&beg_yyyymm,5,2);
38         %let end_mm=%substr(&end_yyyymm,5,2);
39         
40         proc sort data=WORK.COOPERADO_ANOMES out=WORK.COOP_ANOMES_SORT;
41           by NUM_MAT NUM_ANO_MES;
42         run;

NOTE: There were 9039470 observations read from the data set WORK.COOPERADO_ANOMES.
NOTE: The data set WORK.COOP_ANOMES_SORT has 9039470 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           1.33 seconds
      cpu time            3.36 seconds
      

43         
44         data WORK.COOP_ANOMES_RETENCAO;
45           /* These arrays indexed by year (rows) and months (columns) */
46           array curcounts {&beg_yyyy:&end_yyyy,1:12} _temporary_ ();
                                                                     _
                                                                     22
2                                                          The SAS System                           10:25 Wednesday, August 12, 2020

ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant, 
              a missing value.  

47           array retained  {&beg_yyyy:&end_yyyy,1:12} _temporary_ ();
WARNING: Partial value initialization of the array curcounts.
47           array retained  {&beg_yyyy:&end_yyyy,1:12} _temporary_ ();
                                                                     _
                                                                     22
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant, 
              a missing value.  

48           set WORK.COOP_ANOMES_SORT end=end_of_data;
WARNING: Partial value initialization of the array retained.
49         
50           by NUM_MAT NUM_ANO_MES;
51           if last.NUM_ANO_MES;   /* Let only one obs per id/yearmonth pass through */
52         
53           /* First, get row and column array indexes for the current record */
54           _yyyy=floor(NUM_ANO_MES/100);
55           _mm=mod(NUM_ANO_MES,100);
56         
57           curcounts{_yyyy,_mm}+1;
58         
59           _curdate=mdy(_mm,1,_yyyy);
60           /* Now see about if this record is preceded by a record from the "retain" yearmonth */
61           if intck('month',lag(_curdate),_curdate)=1 and lag(NUM_MAT)=NUM_MAT then retained{_yyyy,_mm}+1;
62         
63           if end_of_data;
64           do _yyyy=&beg_yyyy to &end_yyyy ;
65             do _mm=ifn(_yyyy=&beg_yyyy,&beg_mm,1)   to ifn(_yyyy=&end_yyyy,&end_mm,12);
66         	  if curcounts{_yyyy,_mm}^=. then do;
67         	    NUM_ANO_MES=100*_yyyy+_mm;
68                 distinctactivecustomer =curcounts{_yyyy,_mm};
69                 retainedcustomers=retained{_yyyy,_mm};
70         		output;
71         	  end;
72         	end;
73           end;
74         run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.COOP_ANOMES_RETENCAO may be incomplete.  When this step was stopped there were 0 observations and 7 
         variables.
WARNING: Data set WORK.COOP_ANOMES_RETENCAO was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      

75         
76         %LET _CLIENTTASKLABEL=;
77         %LET _CLIENTPROCESSFLOWNAME=;
78         %LET _CLIENTPROJECTPATH=;
79         %LET _CLIENTPROJECTPATHHOST=;
80         %LET _CLIENTPROJECTNAME=;
81         %LET _SASPROGRAMFILE=;
82         %LET _SASPROGRAMFILEHOST=;
83         
3                                                          The SAS System                           10:25 Wednesday, August 12, 2020

84         ;*';*";*/;quit;run;
85         ODS _ALL_ CLOSE;
86         
87         
88         QUIT; RUN;
89         

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1946 views
  • 1 like
  • 3 in conversation