BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
abak
Obsidian | Level 7

I have some training data that includes store_num, date, weekly_sale as the important columns for this. Something like

 

STORE_NUM         DATE               WEEKLY_SALES

1                              02/05/2010      13223.75

1                              02/06/2010       1947.05

 

2                              03/12/2010       13623.98

 

etc etc. The goal is to create a new data set with

 

"(a) 'Store_num' (b) 'firstdate', which is the newly created variable containing the first transaction date for each store. (c)'lastdate', which is the newly created variable containing the last transaction date for each store (d)'numdays', which saves the number of days between 'firstdate' and 'lastdate' (e)'avg_sales',"

 

Here is my code ----

 

data retail_sales_new;
set temp;
by Store_num date;


retain one salescum;

if first.Store_num then do;

one = date;
salescum = Weekly_Sales;

end;

 

else salescum = salescum + Weekly_Sales;


if last.Store_num then do;

lastdate = date;
numdays = date - one;
firstdate = one;
avg_sales = salescum/numdays;

end;

 

if last.Store_num;
keep Store_num firstdate lastdate numdays avg_sales;
run;

This returns some slightly bogus results. (I realize that it still needs some formatting but that's a small issue.) Why? It doesn't match the solution given in the homework. Attached is the dataset.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

I attached the actual data to the original post in csv format. The results do not error out, but I get the wrong number of days. They are ALL 994 with the same first date and last date. Here is the first some odd results of the original data:

First, you did not attach the actual data. You attached a data source. How you bring that data into SAS could well mean that your values would differ from mine if I took a different approach to reading the data.

 

Did you actually look at your data? Looking at the first few records every store has a first date of 2/5/2010 and a last date of 10/26/2012 (at least for stores 1 through 11 which is more than your example output shows). So of course when you subtract the first date from the last date you get the same number of days.

 

I appreciate this may be a training exercise but if my goal were to identify the first and last dates, the days between, the total sales and average sales I might do:

proc summary data=temp nway;
   class store_num;
   var date weekly_sales;
   output out=summary (drop=_type_ _freq_)
   min(date)=FirstDate max(date)=lastdate range(date)=numdays
   sum(weekly_sales)=Salescum mean(weekly_sales)=avg_sales
   ;
run;

But you haven't quite clearly stated which average you want? Do you want to accumulate the sales by the day reported and then average them sales? That is different than what you attempted and one way would be summarize the data twice. Once to get the total for the report day then do the overall summary by store. The statistic you might want for "numdays" (which you do not define just show a code you are not happy with) could well be the count (n) of individual dates appearing in the data.

 

proc summary data=temp nway;
   class store_num date ;
   var weekly_sales;
   output out=daysummary (drop=_type_ _freq_)
   sum= weekly_sales
   ;
run;

proc summary data=daysummary nway;
   class store_num  ;
   var date weekly_sales;
   output out=daysummary (drop=_type_ _freq_)
   min(date)=FirstDate max(date)=lastdate range(date)=daysinterval n(date)=numdays
   sum(weekly_sales)=Salescum mean(weekly_sales)=avg_sales
   ;
run;

One of the common things to clarify questions is to show the DESIRED result for the output as well.

 

And it wouldn't hurt to format your first date and last date variables.

View solution in original post

7 REPLIES 7
Astounding
PROC Star

I would imagine that NUMDAYS is incorrect and should be:

 

numdays = date - one + 1;

novinosrin
Tourmaline | Level 20

would changing 

 

else salescum = salescum + Weekly_Sales; 

 

to 

 

else  salescum + Weekly_Sales;

 

help ?   /* if you have missing values in your data*/

Quentin
Super User

What do you mean by "slightly bogus results"?  Can you update your question to show 5-10 records of sample data, and also show the results you get from your code for that data, and the desired results?

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
abak
Obsidian | Level 7

I attached the actual data to the original post in csv format. The results do not error out, but I get the wrong number of days. They are ALL 994 with the same first date and last date. Here is the first some odd results of the original data:

 

originaldata.png

 

Running the code with the data provides these results (ignore the 'one' column, it shouldn't be in the end result but it is the 'one' that I used in the code)

 

results.png

 

 

I greatly appreciate the attempts at help so far, but I don't think they will solve the issue that I am having. Thank you for some very good clarification questions.

ballardw
Super User

I attached the actual data to the original post in csv format. The results do not error out, but I get the wrong number of days. They are ALL 994 with the same first date and last date. Here is the first some odd results of the original data:

First, you did not attach the actual data. You attached a data source. How you bring that data into SAS could well mean that your values would differ from mine if I took a different approach to reading the data.

 

Did you actually look at your data? Looking at the first few records every store has a first date of 2/5/2010 and a last date of 10/26/2012 (at least for stores 1 through 11 which is more than your example output shows). So of course when you subtract the first date from the last date you get the same number of days.

 

I appreciate this may be a training exercise but if my goal were to identify the first and last dates, the days between, the total sales and average sales I might do:

proc summary data=temp nway;
   class store_num;
   var date weekly_sales;
   output out=summary (drop=_type_ _freq_)
   min(date)=FirstDate max(date)=lastdate range(date)=numdays
   sum(weekly_sales)=Salescum mean(weekly_sales)=avg_sales
   ;
run;

But you haven't quite clearly stated which average you want? Do you want to accumulate the sales by the day reported and then average them sales? That is different than what you attempted and one way would be summarize the data twice. Once to get the total for the report day then do the overall summary by store. The statistic you might want for "numdays" (which you do not define just show a code you are not happy with) could well be the count (n) of individual dates appearing in the data.

 

proc summary data=temp nway;
   class store_num date ;
   var weekly_sales;
   output out=daysummary (drop=_type_ _freq_)
   sum= weekly_sales
   ;
run;

proc summary data=daysummary nway;
   class store_num  ;
   var date weekly_sales;
   output out=daysummary (drop=_type_ _freq_)
   min(date)=FirstDate max(date)=lastdate range(date)=daysinterval n(date)=numdays
   sum(weekly_sales)=Salescum mean(weekly_sales)=avg_sales
   ;
run;

One of the common things to clarify questions is to show the DESIRED result for the output as well.

 

And it wouldn't hurt to format your first date and last date variables.

abak
Obsidian | Level 7

This anwers my question-the homework and I must have had a disagreement by what we meant about the number of days. 

 

The homework asked that I do it in a data step, but your summary example is still very instructive. Thank you!

ballardw
Super User

@abak wrote:

This anwers my question-the homework and I must have had a disagreement by what we meant about the number of days. 

 

The homework asked that I do it in a data step, but your summary example is still very instructive. Thank you!


Possibly the question about the interpretation of elements of the assignment is good training.

In my job I am often asked to provide a summary of some sort. The first thing I do with the requestor with is clarifying who gets counted because often the have a not-initially-specified filter needed. Such as "women" can turn out to be "women aged 18 to 25 receiving one or more of these services in first 6 months of state fiscal year xxxx".

 

So learning to identify the meaning of something as innocuous as "number of days" some times requires questions like "days with recorded data", "days that should have had recorded data as the first/last day of each week" or "interval between first and last" or something else depending on the project.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1310 views
  • 1 like
  • 5 in conversation