BookmarkSubscribeRSS Feed
Mike_Chan
Calcite | Level 5

Greetings! Below are the data sets to compute.

"Mary Chan" 3/Jul/2023 45600
20/Aug/2023 37800
1/Dec/2023 21500
30/Aug/2023 42000
"John Tam" 12/Sep/2023 35000

 

The output should be like:

Name                 No. of visits               Total Spent

Mary Chan                  4                               216550

John Tam                     1                                 35000

 

Any idea? Thanks guys!!

17 REPLIES 17
PaigeMiller
Diamond | Level 26

@Mike_Chan wrote:

Greetings! Below are the data sets to compute.

"Mary Chan" 3/Jul/2023 45600
20/Aug/2023 37800
1/Dec/2023 21500
30/Aug/2023 42000
"John Tam" 12/Sep/2023 35000


Have you already created a SAS data set from the above raw data? If so, please share the code. If not, then is the question really how to read this data into SAS?

 

It seems like the dates are irrelevant to the creation of the desired output data set, is that correct?

--
Paige Miller
Mike_Chan
Calcite | Level 5

Thanks for ur reply! This was the simplified output above. The full output should be like this:

 

Mike_Chan_0-1730122005051.png

Mike_Chan_1-1730122070268.png

(i think i can handle the case regarding to dates so i didnt include it  on my question!) Thanks so much.

 

PaigeMiller
Diamond | Level 26

This does not seem to answer my other questions.

--
Paige Miller
Mike_Chan
Calcite | Level 5

The dataset is relevant to the output.

 

The logic is firstly record the name of the customer, then count how many sales were made by the customer (can be by the number of date or number of sales made). Finally, compute the days between last sales and 31 Dec 2023.

 

Below are the code im trying but seems not work:

data Totalsales;
input name $quote20. @;
format var1 date11.;
 
s1 = substr(name,1,1);
 
If ('A' <= s1 <= 'Z') then do;
if _n_ ^=1 then output;
input var1  sales;
end;
 
if endfile=1 then output;
 
 
datalines;
"David Wong" 1/Aug/2023 13200
"Brian Leung" 15/Sep/2023 23450
10/Sep/2023 33000
"Mary Chan" 3/Jul/2023 45600
20/Aug/2023 37800
1/Dec/2023 21500
30/Aug/2023 42000
"John Tam" 12/Sep/2023 35000
;
run;
ballardw
Super User

IF you have the NAME on every observation then a suggested update to @Tom's solution. If you data does not have the name on each observation that is the first step before summarizing.

 

Proc summary data=have nway;
   class name;
   var spent date;
   output out=need(drop=_:) sum(spent)=spentsum n(spent)=NumberSales max(date)=latestdate
;
run;
Data want;
set need;
daysbetween = '31Dec2023'd - latestdate;
run;

I wouldn't call such a total a "running sum" as that would generally be interpreted as incremented per observation in a data set with more than one observation per name or similar grouping variable.

 

Really should make sure that your resultant date variables have a suitable display format assigned.

Mike_Chan
Calcite | Level 5

sorry for the misleading. My aim is to develop a data step to generate the output (provided in the question). I have worked part of it (Can see in the comment section) but I was totally struggled.

LinusH
Tourmaline | Level 20

Non-tested code since you disn't provide raw data as a data step/datalines:

proc sql;
  select name, 
	     max(sales_date) as last_sales_date,
	     count(*) as no_of_sales,
	     sum(sales) as run_sales_tot,
	     '31Dec2023'd - calculated last_sales_date as no_of_days_last_to_31dec2023
	from have
	group by name
	;
quit;
Data never sleeps
ballardw
Super User

@Mike_Chan wrote:

sorry for the misleading. My aim is to develop a data step to generate the output (provided in the question). I have worked part of it (Can see in the comment section) but I was totally struggled.


So you actually want to rewrite the procedures that SAS has spent years developing so they are fast and relatively easy to use? Why?

 

And you changed the requirement from the initial question to the second output. What else may creep in that isn't really amenable to data step processing?

 

Provide example data in the form of a working data step and we can show how to write a data step but really, other procedures are the way to go. What works for one data set may require a great deal of additional logic/coding when you add one variable or condition. So all the requirements and conditions have be stated up front as well.

 

 

Tom
Super User Tom
Super User

Assuming you have a dataset named HAVE with variables named NAME and SPENT then you should just use PROC SUMMARY to calculate your desired results.

proc summary data=have nway ;
   class name;
   var spent ;
   output out=want n=COUNT sum=Total_Spent ;
run;

 

Kurt_Bremser
Super User

@Mike_Chan wrote:

Greetings! Below are the data sets to compute.

"Mary Chan" 3/Jul/2023 45600
20/Aug/2023 37800
1/Dec/2023 21500
30/Aug/2023 42000
"John Tam" 12/Sep/2023 35000

So it seems you have a dataset with three character variables; in the first row of each group, you have the first variable populated with a name, but in the following rows everything shifts to the left, so the third variable is empty.

If that is not what you have, please clear up any ambiguity by posting your "have" dataset in a working DATA step with DATALINES that recreates your dataset.

Mike_Chan
Calcite | Level 5

Thx for ur reply. It is an assignment from my school so I guess this is the challenging part.

Kurt_Bremser
Super User

In which form did you get that assignment? Particularly, how is the source data contained there? File attachment, text, picture, something else? Or do you have a dataset already stored on a server?

The more (clear) information you provide about your task, the faster you will get answers leading to a solution. By forcing us to make guesses, you waste time, yours and ours.

 

Mike_Chan
Calcite | Level 5

Mike_Chan_0-1730192752332.png

Mike_Chan_1-1730192761593.png

This is the question. 

 

data Totalsales;
input name $quote20. @;

s1 = substr(name,1,1);

If ('A' <= s1 <= 'Z') then do;

datalines;
"David Wong" 1/Aug/2023 13200
"Brian Leung" 15/Sep/2023 23450
  10/Sep/2023 33000
"Mary Chan" 3/Jul/2023 45600
20/Aug/2023 37800
    1/Dec/2023 21500
 30/Aug/2023 42000
"John Tam" 12/Sep/2023 35000
;
run;

I am going on this direction but im stuck.

Patrick
Opal | Level 21

@Mike_Chan I'm not sure what the creator of this exercise has been thinking. No SAS programmer with some experience would do this in a single data step. It just complicates and convolutes the process.

I'm normally not providing full code for an exercise but making an exception for this one given all the unnecessary complications. Please spend the time to understand what the code does (if you copy/paste the code into an AI like Copilot then you'll get already a lot of explanation - rest is in the SAS docu).

/* create text file with sample data */
filename src_data temp;
data _null_;
  file src_data;
  infile datalines;
  input;
  put _infile_;
  datalines;
"David Wong" 1/Aug/2023 13200
"Brian Leung" 15/Sep/2023 23450
  10/Sep/2023 33000
"Mary Chan" 3/Jul/2023 45600
 20/Aug/2023 37800
     1/Dec/2023 21500
   30/Aug/2023 42000
"John Tam" 12/Sep/2023 35000
;
run;

/* single data step approach as demanded */
data demo;
  attrib
    name            format=$20.    label='Name'
    last_sales_date format=date9.  label='Last sales date'
    no_of_sales     format=best32. label='Number of sales'
    sales_total     format=best32. label='Running sales total'
    days_diff       format=best32. label='Days between last sales and 31 Dec 2023'
    ;
  keep 
    name
    last_sales_date
    no_of_sales    
    sales_total    
    days_diff      
    ;

  array a_dts{20} 8 _temporary_;
  array a_sal{20} 8 _temporary_;
  retain name sales_total;

  infile src_data dsd dlm=' ' truncover end=eof;
  input _x $1. @;

  if missing(_x) then 
    do;
      no_of_sales+1;
      _infile_=left(_infile_);
      input @1 a_dts[no_of_sales]:date9. a_sal[no_of_sales]:best32.;
    end;

  else
    do;
      if _n_>1 then
        do;
          last_sales_date=max(of a_dts[*]);
          days_diff='31dec2023'd-last_sales_date;
          sales_total=sum(sales_total, of a_sal[*]);
          output;
          call missing(of a_dts[*], of a_sal[*]); 
        end;

      no_of_sales=1;
      input @1 name:$20. a_dts[no_of_sales]:date9. a_sal[no_of_sales]:best32.;
    end;

  if eof then
    do;
      last_sales_date=max(of a_dts[*]);
      days_diff='31dec2023'd-last_sales_date;
      sales_total=sum(sales_total, of a_sal[*]);
      output;
    end;
run;

proc print data=demo label;
run;

Patrick_0-1730199079031.png

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 2731 views
  • 0 likes
  • 8 in conversation