DATA Step, Macro, Functions and more

Why doesn't this code work? Retain exercise

Accepted Solution Solved
Reply
Contributor
Posts: 24
Accepted Solution

Why doesn't this code work? Retain exercise

I have some training data that includes store_num, date, weekly_sale as the important columns for this. Something like

 

STORE_NUM         DATE               WEEKLY_SALES

1                              02/05/2010      13223.75

1                              02/06/2010       1947.05

 

2                              03/12/2010       13623.98

 

etc etc. The goal is to create a new data set with

 

"(a) 'Store_num' (b) 'firstdate', which is the newly created variable containing the first transaction date for each store. (c)'lastdate', which is the newly created variable containing the last transaction date for each store (d)'numdays', which saves the number of days between 'firstdate' and 'lastdate' (e)'avg_sales',"

 

Here is my code ----

 

data retail_sales_new;
set temp;
by Store_num date;


retain one salescum;

if first.Store_num then do;

one = date;
salescum = Weekly_Sales;

end;

 

else salescum = salescum + Weekly_Sales;


if last.Store_num then do;

lastdate = date;
numdays = date - one;
firstdate = one;
avg_sales = salescum/numdays;

end;

 

if last.Store_num;
keep Store_num firstdate lastdate numdays avg_sales;
run;

This returns some slightly bogus results. (I realize that it still needs some formatting but that's a small issue.) Why? It doesn't match the solution given in the homework. Attached is the dataset.

 

 


Accepted Solutions
Solution
‎04-11-2018 12:27 PM
Super User
Posts: 13,517

Re: Why doesn't this code work? Retain exercise

I attached the actual data to the original post in csv format. The results do not error out, but I get the wrong number of days. They are ALL 994 with the same first date and last date. Here is the first some odd results of the original data:

First, you did not attach the actual data. You attached a data source. How you bring that data into SAS could well mean that your values would differ from mine if I took a different approach to reading the data.

 

Did you actually look at your data? Looking at the first few records every store has a first date of 2/5/2010 and a last date of 10/26/2012 (at least for stores 1 through 11 which is more than your example output shows). So of course when you subtract the first date from the last date you get the same number of days.

 

I appreciate this may be a training exercise but if my goal were to identify the first and last dates, the days between, the total sales and average sales I might do:

proc summary data=temp nway;
   class store_num;
   var date weekly_sales;
   output out=summary (drop=_type_ _freq_)
   min(date)=FirstDate max(date)=lastdate range(date)=numdays
   sum(weekly_sales)=Salescum mean(weekly_sales)=avg_sales
   ;
run;

But you haven't quite clearly stated which average you want? Do you want to accumulate the sales by the day reported and then average them sales? That is different than what you attempted and one way would be summarize the data twice. Once to get the total for the report day then do the overall summary by store. The statistic you might want for "numdays" (which you do not define just show a code you are not happy with) could well be the count (n) of individual dates appearing in the data.

 

proc summary data=temp nway;
   class store_num date ;
   var weekly_sales;
   output out=daysummary (drop=_type_ _freq_)
   sum= weekly_sales
   ;
run;

proc summary data=daysummary nway;
   class store_num  ;
   var date weekly_sales;
   output out=daysummary (drop=_type_ _freq_)
   min(date)=FirstDate max(date)=lastdate range(date)=daysinterval n(date)=numdays
   sum(weekly_sales)=Salescum mean(weekly_sales)=avg_sales
   ;
run;

One of the common things to clarify questions is to show the DESIRED result for the output as well.

 

And it wouldn't hurt to format your first date and last date variables.

View solution in original post


All Replies
Super User
Posts: 6,758

Re: Why doesn't this code work? Retain exercise

I would imagine that NUMDAYS is incorrect and should be:

 

numdays = date - one + 1;

PROC Star
Posts: 1,778

Re: Why doesn't this code work? Retain exercise

[ Edited ]

would changing 

 

else salescum = salescum + Weekly_Sales; 

 

to 

 

else  salescum + Weekly_Sales;

 

help ?   /* if you have missing values in your data*/

PROC Star
Posts: 1,457

Re: Why doesn't this code work? Retain exercise

What do you mean by "slightly bogus results"?  Can you update your question to show 5-10 records of sample data, and also show the results you get from your code for that data, and the desired results?

Contributor
Posts: 24

Re: Why doesn't this code work? Retain exercise

I attached the actual data to the original post in csv format. The results do not error out, but I get the wrong number of days. They are ALL 994 with the same first date and last date. Here is the first some odd results of the original data:

 

originaldata.png

 

Running the code with the data provides these results (ignore the 'one' column, it shouldn't be in the end result but it is the 'one' that I used in the code)

 

results.png

 

 

I greatly appreciate the attempts at help so far, but I don't think they will solve the issue that I am having. Thank you for some very good clarification questions.

Solution
‎04-11-2018 12:27 PM
Super User
Posts: 13,517

Re: Why doesn't this code work? Retain exercise

I attached the actual data to the original post in csv format. The results do not error out, but I get the wrong number of days. They are ALL 994 with the same first date and last date. Here is the first some odd results of the original data:

First, you did not attach the actual data. You attached a data source. How you bring that data into SAS could well mean that your values would differ from mine if I took a different approach to reading the data.

 

Did you actually look at your data? Looking at the first few records every store has a first date of 2/5/2010 and a last date of 10/26/2012 (at least for stores 1 through 11 which is more than your example output shows). So of course when you subtract the first date from the last date you get the same number of days.

 

I appreciate this may be a training exercise but if my goal were to identify the first and last dates, the days between, the total sales and average sales I might do:

proc summary data=temp nway;
   class store_num;
   var date weekly_sales;
   output out=summary (drop=_type_ _freq_)
   min(date)=FirstDate max(date)=lastdate range(date)=numdays
   sum(weekly_sales)=Salescum mean(weekly_sales)=avg_sales
   ;
run;

But you haven't quite clearly stated which average you want? Do you want to accumulate the sales by the day reported and then average them sales? That is different than what you attempted and one way would be summarize the data twice. Once to get the total for the report day then do the overall summary by store. The statistic you might want for "numdays" (which you do not define just show a code you are not happy with) could well be the count (n) of individual dates appearing in the data.

 

proc summary data=temp nway;
   class store_num date ;
   var weekly_sales;
   output out=daysummary (drop=_type_ _freq_)
   sum= weekly_sales
   ;
run;

proc summary data=daysummary nway;
   class store_num  ;
   var date weekly_sales;
   output out=daysummary (drop=_type_ _freq_)
   min(date)=FirstDate max(date)=lastdate range(date)=daysinterval n(date)=numdays
   sum(weekly_sales)=Salescum mean(weekly_sales)=avg_sales
   ;
run;

One of the common things to clarify questions is to show the DESIRED result for the output as well.

 

And it wouldn't hurt to format your first date and last date variables.

Contributor
Posts: 24

Re: Why doesn't this code work? Retain exercise

This anwers my question-the homework and I must have had a disagreement by what we meant about the number of days. 

 

The homework asked that I do it in a data step, but your summary example is still very instructive. Thank you!

Super User
Posts: 13,517

Re: Why doesn't this code work? Retain exercise


@abak wrote:

This anwers my question-the homework and I must have had a disagreement by what we meant about the number of days. 

 

The homework asked that I do it in a data step, but your summary example is still very instructive. Thank you!


Possibly the question about the interpretation of elements of the assignment is good training.

In my job I am often asked to provide a summary of some sort. The first thing I do with the requestor with is clarifying who gets counted because often the have a not-initially-specified filter needed. Such as "women" can turn out to be "women aged 18 to 25 receiving one or more of these services in first 6 months of state fiscal year xxxx".

 

So learning to identify the meaning of something as innocuous as "number of days" some times requires questions like "days with recorded data", "days that should have had recorded data as the first/last day of each week" or "interval between first and last" or something else depending on the project.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 146 views
  • 1 like
  • 5 in conversation