BookmarkSubscribeRSS Feed
Raghs_Newbee
Calcite | Level 5

Hi Team,

I have 3 million obs with account numbers of different persons with N number of transactions at different dates for the last three years.

I need to collect the Distinct(Account_Number) and the Amount which is actually the last transaction made by the customer.

Thanks

11 REPLIES 11
art297
Opal | Level 21

If there isn't a chance that customers had multiple transactions on the last date, the task would be easy with proc sql, simply using distinct and where date=max(date).

However, since customers probably could have multiple transactions, and if the data are already in date order, then I would recommend sorting the file by account_Number and then using last.account_Number in a datastep.

Raghs_Newbee
Calcite | Level 5

Thanks Art, But unfortunately yes.. few customer have more transactions in the last date.. So i am unable to do so using Proc SQL. The report I am pulling also consits of a lot of other variables that needs either to be summed up or transposed to find the count. I will post some Sample data and the output I need for a Better understanding.

Haikuo
Onyx | Level 15

Do your date value have a time portion to identify different transactions on the same date or do your transaction number serve the same purpose?

Raghs_Newbee
Calcite | Level 5

Yes, The Date Column is in DateTime format. Attached is a Sample Data. I am not worried about the blanks in the Date Column, Just need the Last Date Time Transaction. I did try the Querry of HIMA. I get repeated Account ID when the transaction Amount is Different.

Hima
Obsidian | Level 7

data a;
input account_number $ tran_am $ tran_dt $;
cards;
12345678 12.00 02/16/2012
12345678 14.00 02/15/2012
23456789 15.00 02/14/2012
23456789 13.00 02/13/2012
;
run;

proc sql;
select distinct account_number, tran_am, tran_dt from a
group by account_number having tran_dt = max(tran_dt);
quit;

Output:

                                account_
                                number        tran_am   tran_dt
                                ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
                                12345678      12.00     02/16/20
                                23456789      15.00     02/14/20


Hima
Obsidian | Level 7

The above query works good even if there are multiple transactions with in the max date.

art297
Opal | Level 21

Hima: your suggested code will produce multiple records when there are multiple transactions on the last day and they are for different amounts.  e.g.:

data a;

input account_number $ tran_am $ tran_dt $;

cards;

12345678 14.00 02/15/2012

12345678 12.00 02/16/2012

12345678  9.00 02/16/2012

23456789 13.00 02/13/2012

23456789 15.00 02/14/2012

23456789  9.00 02/14/2012

;

run;

proc sql;

select distinct account_number, tran_am, tran_dt from a

group by account_number having tran_dt = max(tran_dt);

quit;

Hima
Obsidian | Level 7

Art - I see what you are saying Smiley Happy. If the date in the table has time stamp may be that will work?

Suppose that the format of the tran_date is something like this 12MAY2009:00:00:00.00000 (datetime25.6)

data a;

input account_number $ tran_am $ tran_dt $;

cards;

12345678 14.00 12JAN2012:00:00:00.00000

12345678 12.00 12JAN2012:04:50:18.254602

12345678  9.00 12JAN2012:09:23:00.00000

23456789 13.00 11JAN2012:00:00:00.00000

23456789 15.00 11JAN2012:04:50:18.254602

23456789  9.00 11JAN2012:09:23:00.00000

;

run;

proc sql;

select distinct account_number, tran_am, tran_dt from a

group by account_number having tran_dt = max(tran_dt);

quit;

Raghs_Newbee - Can you please post sample data?
Ksharp
Super User

Yes. the value of datetime is actually the number of second from 01jan1960:00:00:00

Raghs_Newbee
Calcite | Level 5

Hi Hima I have attached some sample data in the post. Thanks to all for the wonderfull help.

Ksharp
Super User

How about:

proc import datafile='c:\Test_Data.xls' out=test dbms=excel replace;
 usedate=no;scantime=no;
run;
data temp(drop=_date);
 set test(rename=(date=_date));
 date=input(_date,datetime20.);
 format date datetime.;
 run;
proc sql;
 create table want as 
  select * from temp
   group by customer_id
    having date=max(date);
quit;


Ksharp

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 11610 views
  • 0 likes
  • 5 in conversation