BookmarkSubscribeRSS Feed
Maria8
Calcite | Level 5

I have an Excel sheet of 750,000 rows that span 8 months. I have to display all of the attributes, so Select * would be used. This is SAS Enterprise 9.4. 

 

This is an example of the attributes I'm interested in for the code: 

 

Attribute: Subscriber_Number |   Date_Email_Sent 

                      19202                     03Dec2020

                      12829                     03Dec2020 

                      13455                    05Dec2020

                      13455                    05Dec2020

                       13455                   02 Dec2020

                      13455                   04 Dec2020

An email cannot be sent to a subscriber more than 3 times a week. If that happens, that would become an issue. My goal is to find subscribers who have been called more than 3 times a week. In the example data I gave, the code output should show that Subscriber 13455 was called 4 times in a week. The code output should NOT include 12829 because it just occurs one time. 

 

These are 2 different example codes that I tried and the problems I found with both: 

 

data counts; 

      set have; 

       by Subscriber_Number Date_Email_Sent;

       counter = dif(Date_Email_Sent) <= 7; 

run; 

 

proc summary data=counts; var counter; class Subscriber_Number, output out=sum sum=; 

 

The summary just shows the count all the times a Subscriber was reached out as long as it was less than 7 times, which is not what I'm looking for. 

 

Another Code that I saw that looked promising is this: 

PROC SQL; Select week (Date_Email_Sent, 'u') as weeknum, COUNT(Subscriber_Number) as per_week FROM Data; Where SubscriberNumber >2 Group By week(Date_Email_Sent, 'U') quit;

 

I'm pretty new to SAS, so I'd love help! 

 

10 REPLIES 10
Kurt_Bremser
Super User

The first question: do you count per calendar week, or would you like a running count of mails sent in the last 7 days?

For calendar weeks, this will do it:

data have;
input Subscriber_Number $ Date_Email_Sent :date9.;
format Date_Email_Sent yymmdd10.;
datalines;
19202 03Dec2020
12829 03Dec2020 
13455 05Dec2020
13455 05Dec2020
13455 02Dec2020
13455 04Dec2020
;

proc sql;
create table want as
  select
    *,
    count(*) as mails_per_week
  from have
  group by Subscriber_Number, week(Date_Email_Sent)
  having mails_per_week > 3
;
quit;

Note how presenting example data in a data step with datalines clears any ambiguities regarding variable attributes or contents, and makes it easy for everyone else to recreate the dataset as is. Please do so yourself in the future, it speeds up the answering process.

Maria8
Calcite | Level 5
Is there a way to automatically set the calendar dates instead of having to type in all of the subscriber_numbers and the associated dates for datalines?

I ran the second code and didn't use dateline:

proc sql;
create table want as
select
*,
count(*) as mails_per_week
from have
group by Subscriber_Number, week(Date_Email_Sent)
having mails_per_week > 3
;
quit;

without using datelines, and it seems to be calculating the number of emails correctly, but it repeats the same values each time like this:

Subscriber_Number | Date_Email_Sent | mails_per_week

13455 05Dec2020 4

13455 05Dec2020 4

13455 02 Dec2020 4

13455 04 Dec2020 4
Is there a way to only have the subscriber number show up once so I can count the number of subscribers who are getting the emails more than 3x a week? Where would I put COUNT(Distinct) in the Select Statement?
Reeza
Super User
The datalines part is just to create example data. Reference your actual input data set instead of reading in the data. So you do everything after the data step and replace HAVE with your data set name.
Maria8
Calcite | Level 5
Thanks! I got an error saying that it needed a dataline or infile statement, so I didn't use that and only used the proc sql code - is there a way to only show distinct subscriber numbers?
Reeza
Super User
Which solution are you using? Post the exact code you're trying to run and the error please.
Maria8
Calcite | Level 5
data have;
input subscriber_number $ Date_Email_Sent :date9.;
format Date_Email_Sent yymmdd10.;
run;

Error: No DATALINES or INFILE statement.

When I ran the code as just:
proc sql;
create table want as
select
*,
count(*) as mails_per_week
from have
group by Subscriber_Number, week(Date_Email_Sent)
having mails_per_week > 3
;
quit;

I got an output that seems to be counted correctly. The only issue with that is that I want the distinct number of subscriber_numbers, and instead it just repeats.
Reeza
Super User
proc sql;
create table want as
select
*,
count(*) as mails_per_week
from have
group by Subscriber_Number, week(Date_Email_Sent)
having mails_per_week > 3
;
quit;

Change HAVE to be the name of the data set that already exists. 

ballardw
Super User

There are SAS functions that will return week information from a given date and formats that will group data into week intervals when used with analysis procedures.

You have found one of the week functions but Proc SQL may not group things as you would desire as Week returns a number within a year. So if a customer has records in multiple years you would counting all the visits across the years for that week number.

 

Here is one approach to getting a count using proc summary:

data have;
 input Subscriber_Number $ Date_Email_Sent date9.;
 format date_email_sent date9.;
datalines;
19202 03Dec2020
12829 03Dec2020 
13455 05Dec2020
13455 05Dec2020
13455 02Dec2020
13455 04Dec2020
;

proc summary data=have nway; 
   class Subscriber_Number Date_Email_Sent; 
   format date_email_sent weeku6.;
   output out=want;
run;

When you only have class variables the summary will only have _type_ and _freq_ added to the class variables with _freq_ the count of the particular combination. The NWAY option gives you only the combinations that involve all of the class variables.

The dates in this case, since they are so similar all appear as 20W48. 20 is the last two digits of the year and the W tells you the result is a week in the 48th week of the year. Since we used the Weeku format the week is determined using the U algorithm. These formats if using a width larger than 6 start adding more information such day of week which you wouldn't want for this purpose.

 

If you want a "prettier" result you could make your own custom format. Example:

proc format;
picture myweek (default=9)
low-high = '%Y W %0U' (datatype=date)
;
run;
proc summary data=have nway; 
   class Subscriber_Number Date_Email_Sent; 
   format date_email_sent myweek.;
   output out=want;
run;

The Picture statement directives, the bits preceded by % allow you to display differing date, time or datetime elements. the %0U says to use the U algorithm for calculating week and the zero that precedes the U displays a zero if the week number is a single digit. Other characters, such as the spaces and the W are used literally. So you could replace W with Week to get values of 2020 Week 48 if desired. The Default indicates how many spaces to display by default. Like other formats you can use a width value, such as Myweek12. for different displayed values.

Caution: If you use double quotes around the target string with directives you will nasty messages about macro variables that aren't defined, so this is one place you really want to use single quotes.

 

Formats used this way will work in the report procedures like Report and Tabulate to group values if want prettier output than Proc Summary.

proc tabulate data=have;
   class Subscriber_Number Date_Email_Sent ;
   format date_email_sent myweek.;

   table Subscriber_Number* Date_Email_Sent ,
         n
   ;
run;

 

Kurt_Bremser
Super User

A running count can be achieved by using a temporary array:

%let begin = %sysfunc(inputn(19000101,yymmdd8.));
%let end = %sysfunc(inputn(21001231,yymmdd8.));

proc sort data=have;
by subscriber_number date_email_sent;
run;

data running;
array dates {&begin.:&end.} _temporary_;
set have;
by subscriber_number date_email_sent;
if first.subscriber_number then do i = &begin. to &end.;
  dates{i} = .;
end;
count = 0;
dates{date_email_sent} + 1;
do i = date_email_sent - 7 to date_email_sent;
  count + dates{i};
end;
if count > 3;
drop i;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1493 views
  • 0 likes
  • 4 in conversation