Hi Team,
I have 3 million obs with account numbers of different persons with N number of transactions at different dates for the last three years.
I need to collect the Distinct(Account_Number) and the Amount which is actually the last transaction made by the customer.
Thanks
If there isn't a chance that customers had multiple transactions on the last date, the task would be easy with proc sql, simply using distinct and where date=max(date).
However, since customers probably could have multiple transactions, and if the data are already in date order, then I would recommend sorting the file by account_Number and then using last.account_Number in a datastep.
Thanks Art, But unfortunately yes.. few customer have more transactions in the last date.. So i am unable to do so using Proc SQL. The report I am pulling also consits of a lot of other variables that needs either to be summed up or transposed to find the count. I will post some Sample data and the output I need for a Better understanding.
Do your date value have a time portion to identify different transactions on the same date or do your transaction number serve the same purpose?
Yes, The Date Column is in DateTime format. Attached is a Sample Data. I am not worried about the blanks in the Date Column, Just need the Last Date Time Transaction. I did try the Querry of HIMA. I get repeated Account ID when the transaction Amount is Different.
data a;
input account_number $ tran_am $ tran_dt $;
cards;
12345678 12.00 02/16/2012
12345678 14.00 02/15/2012
23456789 15.00 02/14/2012
23456789 13.00 02/13/2012
;
run;
proc sql;
select distinct account_number, tran_am, tran_dt from a
group by account_number having tran_dt = max(tran_dt);
quit;
Output:
account_
number tran_am tran_dt
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
12345678 12.00 02/16/20
23456789 15.00 02/14/20
The above query works good even if there are multiple transactions with in the max date.
Hima: your suggested code will produce multiple records when there are multiple transactions on the last day and they are for different amounts. e.g.:
data a;
input account_number $ tran_am $ tran_dt $;
cards;
12345678 14.00 02/15/2012
12345678 12.00 02/16/2012
12345678 9.00 02/16/2012
23456789 13.00 02/13/2012
23456789 15.00 02/14/2012
23456789 9.00 02/14/2012
;
run;
proc sql;
select distinct account_number, tran_am, tran_dt from a
group by account_number having tran_dt = max(tran_dt);
quit;
Art - I see what you are saying . If the date in the table has time stamp may be that will work?
Suppose that the format of the tran_date is something like this 12MAY2009:00:00:00.00000 (datetime25.6)
data a;
input account_number $ tran_am $ tran_dt $;
cards;
12345678 14.00 12JAN2012:00:00:00.00000
12345678 12.00 12JAN2012:04:50:18.254602
12345678 9.00 12JAN2012:09:23:00.00000
23456789 13.00 11JAN2012:00:00:00.00000
23456789 15.00 11JAN2012:04:50:18.254602
23456789 9.00 11JAN2012:09:23:00.00000
;
run;
proc sql;
select distinct account_number, tran_am, tran_dt from a
group by account_number having tran_dt = max(tran_dt);
quit;
Yes. the value of datetime is actually the number of second from 01jan1960:00:00:00
Hi Hima I have attached some sample data in the post. Thanks to all for the wonderfull help.
How about:
proc import datafile='c:\Test_Data.xls' out=test dbms=excel replace; usedate=no;scantime=no; run; data temp(drop=_date); set test(rename=(date=_date)); date=input(_date,datetime20.); format date datetime.; run; proc sql; create table want as select * from temp group by customer_id having date=max(date); quit;
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.