- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Team,
I have 3 million obs with account numbers of different persons with N number of transactions at different dates for the last three years.
I need to collect the Distinct(Account_Number) and the Amount which is actually the last transaction made by the customer.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If there isn't a chance that customers had multiple transactions on the last date, the task would be easy with proc sql, simply using distinct and where date=max(date).
However, since customers probably could have multiple transactions, and if the data are already in date order, then I would recommend sorting the file by account_Number and then using last.account_Number in a datastep.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Art, But unfortunately yes.. few customer have more transactions in the last date.. So i am unable to do so using Proc SQL. The report I am pulling also consits of a lot of other variables that needs either to be summed up or transposed to find the count. I will post some Sample data and the output I need for a Better understanding.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do your date value have a time portion to identify different transactions on the same date or do your transaction number serve the same purpose?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, The Date Column is in DateTime format. Attached is a Sample Data. I am not worried about the blanks in the Date Column, Just need the Last Date Time Transaction. I did try the Querry of HIMA. I get repeated Account ID when the transaction Amount is Different.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data a;
input account_number $ tran_am $ tran_dt $;
cards;
12345678 12.00 02/16/2012
12345678 14.00 02/15/2012
23456789 15.00 02/14/2012
23456789 13.00 02/13/2012
;
run;
proc sql;
select distinct account_number, tran_am, tran_dt from a
group by account_number having tran_dt = max(tran_dt);
quit;
Output:
account_
number tran_am tran_dt
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
12345678 12.00 02/16/20
23456789 15.00 02/14/20
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The above query works good even if there are multiple transactions with in the max date.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hima: your suggested code will produce multiple records when there are multiple transactions on the last day and they are for different amounts. e.g.:
data a;
input account_number $ tran_am $ tran_dt $;
cards;
12345678 14.00 02/15/2012
12345678 12.00 02/16/2012
12345678 9.00 02/16/2012
23456789 13.00 02/13/2012
23456789 15.00 02/14/2012
23456789 9.00 02/14/2012
;
run;
proc sql;
select distinct account_number, tran_am, tran_dt from a
group by account_number having tran_dt = max(tran_dt);
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Art - I see what you are saying . If the date in the table has time stamp may be that will work?
Suppose that the format of the tran_date is something like this 12MAY2009:00:00:00.00000 (datetime25.6)
data a;
input account_number $ tran_am $ tran_dt $;
cards;
12345678 14.00 12JAN2012:00:00:00.00000
12345678 12.00 12JAN2012:04:50:18.254602
12345678 9.00 12JAN2012:09:23:00.00000
23456789 13.00 11JAN2012:00:00:00.00000
23456789 15.00 11JAN2012:04:50:18.254602
23456789 9.00 11JAN2012:09:23:00.00000
;
run;
proc sql;
select distinct account_number, tran_am, tran_dt from a
group by account_number having tran_dt = max(tran_dt);
quit;
Raghs_Newbee - Can you please post sample data?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes. the value of datetime is actually the number of second from 01jan1960:00:00:00
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Hima I have attached some sample data in the post. Thanks to all for the wonderfull help.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How about:
proc import datafile='c:\Test_Data.xls' out=test dbms=excel replace; usedate=no;scantime=no; run; data temp(drop=_date); set test(rename=(date=_date)); date=input(_date,datetime20.); format date datetime.; run; proc sql; create table want as select * from temp group by customer_id having date=max(date); quit;
Ksharp