Greetings! Below are the data sets to compute.
"Mary Chan" 3/Jul/2023 45600
20/Aug/2023 37800
1/Dec/2023 21500
30/Aug/2023 42000
"John Tam" 12/Sep/2023 35000
The output should be like:
Name No. of visits Total Spent
Mary Chan 4 216550
John Tam 1 35000
Any idea? Thanks guys!!
@Mike_Chan wrote:
Greetings! Below are the data sets to compute.
"Mary Chan" 3/Jul/2023 45600
20/Aug/2023 37800
1/Dec/2023 21500
30/Aug/2023 42000
"John Tam" 12/Sep/2023 35000
Have you already created a SAS data set from the above raw data? If so, please share the code. If not, then is the question really how to read this data into SAS?
It seems like the dates are irrelevant to the creation of the desired output data set, is that correct?
Thanks for ur reply! This was the simplified output above. The full output should be like this:
(i think i can handle the case regarding to dates so i didnt include it on my question!) Thanks so much.
This does not seem to answer my other questions.
The dataset is relevant to the output.
The logic is firstly record the name of the customer, then count how many sales were made by the customer (can be by the number of date or number of sales made). Finally, compute the days between last sales and 31 Dec 2023.
Below are the code im trying but seems not work:
IF you have the NAME on every observation then a suggested update to @Tom's solution. If you data does not have the name on each observation that is the first step before summarizing.
Proc summary data=have nway; class name; var spent date; output out=need(drop=_:) sum(spent)=spentsum n(spent)=NumberSales max(date)=latestdate ; run;
Data want;
set need;
daysbetween = '31Dec2023'd - latestdate;
run;
I wouldn't call such a total a "running sum" as that would generally be interpreted as incremented per observation in a data set with more than one observation per name or similar grouping variable.
Really should make sure that your resultant date variables have a suitable display format assigned.
sorry for the misleading. My aim is to develop a data step to generate the output (provided in the question). I have worked part of it (Can see in the comment section) but I was totally struggled.
Non-tested code since you disn't provide raw data as a data step/datalines:
proc sql;
select name,
max(sales_date) as last_sales_date,
count(*) as no_of_sales,
sum(sales) as run_sales_tot,
'31Dec2023'd - calculated last_sales_date as no_of_days_last_to_31dec2023
from have
group by name
;
quit;
@Mike_Chan wrote:
sorry for the misleading. My aim is to develop a data step to generate the output (provided in the question). I have worked part of it (Can see in the comment section) but I was totally struggled.
So you actually want to rewrite the procedures that SAS has spent years developing so they are fast and relatively easy to use? Why?
And you changed the requirement from the initial question to the second output. What else may creep in that isn't really amenable to data step processing?
Provide example data in the form of a working data step and we can show how to write a data step but really, other procedures are the way to go. What works for one data set may require a great deal of additional logic/coding when you add one variable or condition. So all the requirements and conditions have be stated up front as well.
Assuming you have a dataset named HAVE with variables named NAME and SPENT then you should just use PROC SUMMARY to calculate your desired results.
proc summary data=have nway ;
class name;
var spent ;
output out=want n=COUNT sum=Total_Spent ;
run;
@Mike_Chan wrote:
Greetings! Below are the data sets to compute.
"Mary Chan" 3/Jul/2023 45600
20/Aug/2023 37800
1/Dec/2023 21500
30/Aug/2023 42000
"John Tam" 12/Sep/2023 35000
So it seems you have a dataset with three character variables; in the first row of each group, you have the first variable populated with a name, but in the following rows everything shifts to the left, so the third variable is empty.
If that is not what you have, please clear up any ambiguity by posting your "have" dataset in a working DATA step with DATALINES that recreates your dataset.
Thx for ur reply. It is an assignment from my school so I guess this is the challenging part.
In which form did you get that assignment? Particularly, how is the source data contained there? File attachment, text, picture, something else? Or do you have a dataset already stored on a server?
The more (clear) information you provide about your task, the faster you will get answers leading to a solution. By forcing us to make guesses, you waste time, yours and ours.
This is the question.
data Totalsales;
input name $quote20. @;
s1 = substr(name,1,1);
If ('A' <= s1 <= 'Z') then do;
datalines;
"David Wong" 1/Aug/2023 13200
"Brian Leung" 15/Sep/2023 23450
10/Sep/2023 33000
"Mary Chan" 3/Jul/2023 45600
20/Aug/2023 37800
1/Dec/2023 21500
30/Aug/2023 42000
"John Tam" 12/Sep/2023 35000
;
run;
I am going on this direction but im stuck.
@Mike_Chan I'm not sure what the creator of this exercise has been thinking. No SAS programmer with some experience would do this in a single data step. It just complicates and convolutes the process.
I'm normally not providing full code for an exercise but making an exception for this one given all the unnecessary complications. Please spend the time to understand what the code does (if you copy/paste the code into an AI like Copilot then you'll get already a lot of explanation - rest is in the SAS docu).
/* create text file with sample data */
filename src_data temp;
data _null_;
file src_data;
infile datalines;
input;
put _infile_;
datalines;
"David Wong" 1/Aug/2023 13200
"Brian Leung" 15/Sep/2023 23450
10/Sep/2023 33000
"Mary Chan" 3/Jul/2023 45600
20/Aug/2023 37800
1/Dec/2023 21500
30/Aug/2023 42000
"John Tam" 12/Sep/2023 35000
;
run;
/* single data step approach as demanded */
data demo;
attrib
name format=$20. label='Name'
last_sales_date format=date9. label='Last sales date'
no_of_sales format=best32. label='Number of sales'
sales_total format=best32. label='Running sales total'
days_diff format=best32. label='Days between last sales and 31 Dec 2023'
;
keep
name
last_sales_date
no_of_sales
sales_total
days_diff
;
array a_dts{20} 8 _temporary_;
array a_sal{20} 8 _temporary_;
retain name sales_total;
infile src_data dsd dlm=' ' truncover end=eof;
input _x $1. @;
if missing(_x) then
do;
no_of_sales+1;
_infile_=left(_infile_);
input @1 a_dts[no_of_sales]:date9. a_sal[no_of_sales]:best32.;
end;
else
do;
if _n_>1 then
do;
last_sales_date=max(of a_dts[*]);
days_diff='31dec2023'd-last_sales_date;
sales_total=sum(sales_total, of a_sal[*]);
output;
call missing(of a_dts[*], of a_sal[*]);
end;
no_of_sales=1;
input @1 name:$20. a_dts[no_of_sales]:date9. a_sal[no_of_sales]:best32.;
end;
if eof then
do;
last_sales_date=max(of a_dts[*]);
days_diff='31dec2023'd-last_sales_date;
sales_total=sum(sales_total, of a_sal[*]);
output;
end;
run;
proc print data=demo label;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.