I have some training data that includes store_num, date, weekly_sale as the important columns for this. Something like
STORE_NUM DATE WEEKLY_SALES
1 02/05/2010 13223.75
1 02/06/2010 1947.05
2 03/12/2010 13623.98
etc etc. The goal is to create a new data set with
"(a) 'Store_num' (b) 'firstdate', which is the newly created variable containing the first transaction date for each store. (c)'lastdate', which is the newly created variable containing the last transaction date for each store (d)'numdays', which saves the number of days between 'firstdate' and 'lastdate' (e)'avg_sales',"
Here is my code ----
data retail_sales_new;
set temp;
by Store_num date;
retain one salescum;
if first.Store_num then do;
one = date;
salescum = Weekly_Sales;
end;
else salescum = salescum + Weekly_Sales;
if last.Store_num then do;
lastdate = date;
numdays = date - one;
firstdate = one;
avg_sales = salescum/numdays;
end;
if last.Store_num;
keep Store_num firstdate lastdate numdays avg_sales;
run;
This returns some slightly bogus results. (I realize that it still needs some formatting but that's a small issue.) Why? It doesn't match the solution given in the homework. Attached is the dataset.
I attached the actual data to the original post in csv format. The results do not error out, but I get the wrong number of days. They are ALL 994 with the same first date and last date. Here is the first some odd results of the original data:
First, you did not attach the actual data. You attached a data source. How you bring that data into SAS could well mean that your values would differ from mine if I took a different approach to reading the data.
Did you actually look at your data? Looking at the first few records every store has a first date of 2/5/2010 and a last date of 10/26/2012 (at least for stores 1 through 11 which is more than your example output shows). So of course when you subtract the first date from the last date you get the same number of days.
I appreciate this may be a training exercise but if my goal were to identify the first and last dates, the days between, the total sales and average sales I might do:
proc summary data=temp nway; class store_num; var date weekly_sales; output out=summary (drop=_type_ _freq_) min(date)=FirstDate max(date)=lastdate range(date)=numdays sum(weekly_sales)=Salescum mean(weekly_sales)=avg_sales ; run;
But you haven't quite clearly stated which average you want? Do you want to accumulate the sales by the day reported and then average them sales? That is different than what you attempted and one way would be summarize the data twice. Once to get the total for the report day then do the overall summary by store. The statistic you might want for "numdays" (which you do not define just show a code you are not happy with) could well be the count (n) of individual dates appearing in the data.
proc summary data=temp nway; class store_num date ; var weekly_sales; output out=daysummary (drop=_type_ _freq_) sum= weekly_sales ; run; proc summary data=daysummary nway; class store_num ; var date weekly_sales; output out=daysummary (drop=_type_ _freq_) min(date)=FirstDate max(date)=lastdate range(date)=daysinterval n(date)=numdays sum(weekly_sales)=Salescum mean(weekly_sales)=avg_sales ; run;
One of the common things to clarify questions is to show the DESIRED result for the output as well.
And it wouldn't hurt to format your first date and last date variables.
I would imagine that NUMDAYS is incorrect and should be:
numdays = date - one + 1;
would changing
else salescum = salescum + Weekly_Sales;
to
else salescum + Weekly_Sales;
help ? /* if you have missing values in your data*/
What do you mean by "slightly bogus results"? Can you update your question to show 5-10 records of sample data, and also show the results you get from your code for that data, and the desired results?
I attached the actual data to the original post in csv format. The results do not error out, but I get the wrong number of days. They are ALL 994 with the same first date and last date. Here is the first some odd results of the original data:
Running the code with the data provides these results (ignore the 'one' column, it shouldn't be in the end result but it is the 'one' that I used in the code)
I greatly appreciate the attempts at help so far, but I don't think they will solve the issue that I am having. Thank you for some very good clarification questions.
I attached the actual data to the original post in csv format. The results do not error out, but I get the wrong number of days. They are ALL 994 with the same first date and last date. Here is the first some odd results of the original data:
First, you did not attach the actual data. You attached a data source. How you bring that data into SAS could well mean that your values would differ from mine if I took a different approach to reading the data.
Did you actually look at your data? Looking at the first few records every store has a first date of 2/5/2010 and a last date of 10/26/2012 (at least for stores 1 through 11 which is more than your example output shows). So of course when you subtract the first date from the last date you get the same number of days.
I appreciate this may be a training exercise but if my goal were to identify the first and last dates, the days between, the total sales and average sales I might do:
proc summary data=temp nway; class store_num; var date weekly_sales; output out=summary (drop=_type_ _freq_) min(date)=FirstDate max(date)=lastdate range(date)=numdays sum(weekly_sales)=Salescum mean(weekly_sales)=avg_sales ; run;
But you haven't quite clearly stated which average you want? Do you want to accumulate the sales by the day reported and then average them sales? That is different than what you attempted and one way would be summarize the data twice. Once to get the total for the report day then do the overall summary by store. The statistic you might want for "numdays" (which you do not define just show a code you are not happy with) could well be the count (n) of individual dates appearing in the data.
proc summary data=temp nway; class store_num date ; var weekly_sales; output out=daysummary (drop=_type_ _freq_) sum= weekly_sales ; run; proc summary data=daysummary nway; class store_num ; var date weekly_sales; output out=daysummary (drop=_type_ _freq_) min(date)=FirstDate max(date)=lastdate range(date)=daysinterval n(date)=numdays sum(weekly_sales)=Salescum mean(weekly_sales)=avg_sales ; run;
One of the common things to clarify questions is to show the DESIRED result for the output as well.
And it wouldn't hurt to format your first date and last date variables.
This anwers my question-the homework and I must have had a disagreement by what we meant about the number of days.
The homework asked that I do it in a data step, but your summary example is still very instructive. Thank you!
@abak wrote:
This anwers my question-the homework and I must have had a disagreement by what we meant about the number of days.
The homework asked that I do it in a data step, but your summary example is still very instructive. Thank you!
Possibly the question about the interpretation of elements of the assignment is good training.
In my job I am often asked to provide a summary of some sort. The first thing I do with the requestor with is clarifying who gets counted because often the have a not-initially-specified filter needed. Such as "women" can turn out to be "women aged 18 to 25 receiving one or more of these services in first 6 months of state fiscal year xxxx".
So learning to identify the meaning of something as innocuous as "number of days" some times requires questions like "days with recorded data", "days that should have had recorded data as the first/last day of each week" or "interval between first and last" or something else depending on the project.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.