BookmarkSubscribeRSS Feed
SM3
Calcite | Level 5 SM3
Calcite | Level 5
Hi.I have data as below. I want to calculate sum & count of transactions across days based on condition "transaction amount between 0 to 4000" by cust and act.I have data from sep1st. 2011 to sep1st 2012

Cust id actid trasdate transamt
100 1 9/30/2011 1500
100 1 11/4/2011 2000
100 1 11/4/2011 5000
100 1 11/4/2011 1000
100 1 11/5/2011 4000
100 1 11/5/2011 2500

I want result as

custd actid transdate sumtransamt cnt
100 1 9/30/2011 1500 1
100 1 11/4/2011 3000 3
100 1 11/5/2011 2500 1

Appreciate ur help.

Thanks
11 REPLIES 11
LaurieF
Barite | Level 11

This is pretty close to what you want, I think:

data trancount;
infile cards;
attrib custid actid length=4
attrib transdate length=4 informat=mmddyy10. format=yymmdd10.;
attrib transamt length=6 format=comma14.;
input custid
      actid
      transdate
      transamt;
cards;
100 1 9/30/2011 1500
100 1 11/4/2011 2000
100 1 11/4/2011 5000
100 1 11/4/2011 1000
100 1 11/5/2011 4000
100 1 11/5/2011 2500
;
run;

proc sql;
create table result as
   select custid,
          actid,
          transdate,
          sum(transamt) as sumtransamt format=comma14.,
          count(*) as cnt length=4
     from trancount
    where 0 le transamt lt 4000 /* It appears you don't actually want 4,000 inclusive */
    group by custid,
             actid,
             transdate;
quit;

There's so many ways to skin a cat with SAS; the second step could easily be replaced with a sort and a data step. But as long as it remains as simple as this, it should scale well.

SM3
Calcite | Level 5 SM3
Calcite | Level 5
Thanks Laurie.But I want to do this by using arrays. Can u help?
SM3
Calcite | Level 5 SM3
Calcite | Level 5
Array because the sum of transactions and count should be done by taking 5 days recurring window i,e sep1-5 then sept2-6 like that it goes on.
Reeza
Super User

You didn't mention moving statistics in your initial post. Given your question, @LaurieF solution is correct. 

 

Here's a SAS note regarding using an array to calculating moving statistics. 

Note that you cannot be missing dates in your series for an array method to work correctly, ie you need a record for every day. 

 

http://support.sas.com/kb/41/380.html

SM3
Calcite | Level 5 SM3
Calcite | Level 5
Yes Reeza, I agree LaurieF's solution is corrrct if we don't have moving statistics thing. Thanks
mkeintz
PROC Star

Oops, ignore everything except my first paragraph.  You apparently want discrete (not rolling) 5-day windows, starting with 9/1/2011.

 

Your WANT dataset has 3 transactions for 11/4, but only 1 for 11/5.  Is that consistent?  The resulting CNT variable represents only qualifying transactions (<4000) for 9/30 and 11/5, but counts ALL transactions for 11/4.

 

You asked for an array solution, which I take means you want a data step approach.  However, while the array solution might be good for rolling windows of fixed size, or for scanning over several amounts within each transaction, I don't think it makes sense here.  Here's a data step program the generates the same SUMTRANSAMT value as you have, and generates a corresponding count of eligible transaction

 

data want;

  set trancount;

  by custid actid transdate ;

  if first.transdate then do;

    cnt=0; sumtransamt=.;

  end;

  if 0<=transamt<4000 then do;

    cnt+1;

    sumtransamt+transamt;

  end;

  if last.transdate;

  drop transamt;

run;

 

I'm not saying it's impossible to use arrays, but at the least, you'd have to know in advance the maximum number of transactions that might occur on a given date, so that you'd declare an array that is big enough to hold every transaction for the day. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

Here's a solution that applies to 5-day date periods, using 01sep2011 as the reference date, but using arrays still seems sub-optimal.  It uses the same basic logic, and assumes your data are sorted by custid actid transdate:

 

data need/view=need;

  set trancount;

   date_group=floor((transdate-'01sep2011'd)/5);

run;

data want;

  set need;

  by custid actid date_group ;

  if first.date_group then do;

     cnt=0; sumtransamt=.;

  end;

  if 0<=transamt<4000 then do;

    cnt+1;

    sumtransamt+transamt;

  end;

  if last.date_group;

  drop date_group transamt;

run;

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

data trancount;
infile cards;
attrib custid actid length=4;
attrib transdate length=4 informat=mmddyy10. format=yymmdd10.;
attrib transamt length=6 format=comma14.;
input custid
      actid
      transdate
      transamt;
cards;
100 1 9/30/2011 1500
100 1 11/4/2011 2000
100 1 11/4/2011 5000
100 1 11/4/2011 1000
100 1 11/5/2011 4000
100 1 11/5/2011 2500
;
run;
data want;
n=0; sumtransamt =0;
 do until(last.transdate);
  set trancount;
  by custid actid transdate;
  if 0<4000 then do;
   n+1; sumtransamt+transamt;
  end;
 end;
 drop transamt;
run;

SM3
Calcite | Level 5 SM3
Calcite | Level 5

Guys I really appreciate all of ur help & i ll use ur suggestions and apply on my data. Want to clarify one thing regarding the data. The data i gave is just for one cust id and i have several such custid in my data with transaction date from 1sep 2011-30sep2012. Also there are instances where customers don't have transactions everyday. If there are multiple transactions in a  day i need to sum it up.First i need to identify transaction between 0-4k by using a rolling 5 day window ex: 1st spt to 5thsep, then 2nd sept to 6sep and so on and then i need to sum the transactions by cust id & acct id. Its little complex to assemble and thanks all for helping me out.

Reeza
Super User

@SM3 wrote:

Guys I really appreciate all of ur help & i ll use ur suggestions and apply on my data. Want to clarify one thing regarding the data. The data i gave is just for one cust id and i have several such custid in my data with transaction date from 1sep 2011-30sep2012. Also there are instances where customers don't have transactions everyday. If there are multiple transactions in a  day i need to sum it up.First i need to identify transaction between 0-4k by using a rolling 5 day window ex: 1st spt to 5thsep, then 2nd sept to 6sep and so on and then i need to sum the transactions by cust id & acct id. Its little complex to assemble and thanks all for helping me out.


 

This shouldn't be difficult and is within the first SAS programming course, break it down into smaller steps and you'll be fine. BY group processing will handle multiple ID and accounts without any difficulty. Please note, that I'm not trying to trivialize what you're trying to do, only trying to say there isn't anything overly complex with what you've specified. 

 

Your steps appear out of order, but perhaps that's from what you're posting here. You're going to need to summarize per day BEFORE you calculate a rolling 5 day window, otherwise your filter won't be what you're expecting. You also need to define your 5 day definition carefully. Is it always 5 days, or 5 business days or 5 transaction days? September is fine, but imagine any 5 day period that includes Dec 24th and 26th.

 

Make sure that the data you provide shows the complexities of your data, but not overcomplicating it. Including a few ID/accounts is a good start. 

 

Patrick
Opal | Level 21

@SM3

I know it's sometimes harder than if looks like from outside and I get it wrong more often than not when asking a question BUT: It's really worth if you take the time and provide fully representative sample data (a data step creating such data) and then show us how the expected result should look like. This avoids a lot of missunderstandings plus people giving their time to you don't waste their effort for the wrong problem.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2279 views
  • 1 like
  • 6 in conversation