Hi i have a situation where i need to select only the transactions from each ID where the no.of transactions made to a perticular country is more if ties then latest transaction date decides.
data have;
infile cards;
informat trxn_date ddmmyy10.;
format trxn_date ddmmyy10.;
input ID Country_from $ country_to $ trxn_date amount ;
cards;
1 US IND 01-03-2017 387
1 US IND 02-03-2017 388
1 US IND 03-03-2017 514
1 US UK 04-03-2017 889
1 US UK 05-03-2017 987
1 US UK 06-03-2017 690
1 US UK 07-03-2017 158
1 US AUS 08-03-2017 341
1 US AUS 09-03-2017 247
1 US CAN 10-03-2017 699
1 US CAN 11-03-2017 623
2 UK IND 04-03-2017 627
2 UK IND 05-03-2017 822
2 UK IND 06-03-2017 724
2 UK US 07-03-2017 231
2 UK US 08-03-2017 437
2 UK US 04-03-2017 193
2 UK AUS 05-03-2017 922
2 UK AUS 06-03-2017 750
2 UK AUS 07-03-2017 123
2 UK CAN 08-03-2017 920
2 UK CAN 08-03-2017 587
;;
run;
Expected Output :
1 US UK 04-03-2017 889
1 US UK 05-03-2017 987
1 US UK 06-03-2017 690
1 US UK 07-03-2017 158
2 UK US 07-03-2017 231
2 UK US 08-03-2017 437
2 UK US 04-03-2017 193
The appearance of your output suggests you want all the transactions for a given id, that come from the most frequent COUNTRY_FROM/COUNTRY_TO pair. But it also appears, though not stated, that country_from is constant for each id. If so, then you really want a list of transactions that have the most frequent COUNTRY_TO.
And in the case of multiple country_to's with the same frequency choose the country with the latest transaction date. You provide no rules for when the tied countries also have ties in latest transaction date, so we'll ignore that for now.
From the look of your datasets, it seems that
data have;
infile cards;
informat trxn_date ddmmyy10.;
format trxn_date yymmddn8.;
input ID Country_from $ country_to $ trxn_date amount ;
cards;
1 US IND 01-03-2017 387
1 US IND 02-03-2017 388
1 US IND 03-03-2017 514
1 US UK 04-03-2017 889
1 US UK 05-03-2017 987
1 US UK 06-03-2017 690
1 US UK 07-03-2017 158
1 US AUS 08-03-2017 341
1 US AUS 09-03-2017 247
1 US CAN 10-03-2017 699
1 US CAN 11-03-2017 623
2 UK IND 04-03-2017 627
2 UK IND 05-03-2017 822
2 UK IND 06-03-2017 724
2 UK US 07-03-2017 231
2 UK US 08-03-2017 437
2 UK US 04-03-2017 193
2 UK AUS 05-03-2017 922
2 UK AUS 06-03-2017 750
2 UK AUS 07-03-2017 123
2 UK CAN 08-03-2017 920
2 UK CAN 08-03-2017 587
;;
run;
data want (drop=_:);
do until (last.id);
do _nt=1 by 1 until (last.country_to);
set have;
by id country_to notsorted;
if _nt=1 then _to_maxdate=trxn_date;
else _to_maxdate=max(_to_maxdate,trxn_date);
end;
if (_nt>_maxnt) or (_nt=_maxnt and _to_maxdate>_id_maxdate) then do;
_maxnt=_nt;
_max_to=country_to;
_id_maxdate=_to_maxdate;
end;
end;
do until (last.id);
do until(last.country_to);
set have;
by id country_to notsorted;
if country_to=_max_to then output;
end;
end;
run;
Notes:
Assuming I understood what you mean. data have; infile cards; informat trxn_date ddmmyy10.; format trxn_date ddmmyy10.; input ID Country_from $ country_to $ trxn_date amount ; cards; 1 US IND 01-03-2017 387 1 US IND 02-03-2017 388 1 US IND 03-03-2017 514 1 US UK 04-03-2017 889 1 US UK 05-03-2017 987 1 US UK 06-03-2017 690 1 US UK 07-03-2017 158 1 US AUS 08-03-2017 341 1 US AUS 09-03-2017 247 1 US CAN 10-03-2017 699 1 US CAN 11-03-2017 623 2 UK IND 04-03-2017 627 2 UK IND 05-03-2017 822 2 UK IND 06-03-2017 724 2 UK US 07-03-2017 231 2 UK US 08-03-2017 437 2 UK US 04-03-2017 193 2 UK AUS 05-03-2017 922 2 UK AUS 06-03-2017 750 2 UK AUS 07-03-2017 123 2 UK CAN 08-03-2017 920 2 UK CAN 08-03-2017 587 ;; run; data temp; set have; from=country_from; to=country_to; call sortc(from,to); run; proc sql; select * from temp group by from,to having count(distinct id) ne 1; quit;
Hi @Ksharp,
Thanks for the reply i think you got the list of countires where more than one ID's had the transactions. But my issue is i need to select a list of all the transactions made by each ID to a unique country and the unique country must be selected based on the no.of transactions(more) , if thereare more than one country having the maximum count then the latest transaction date would decide(country having the latest transaction).
Hope it make sense.
Hi,
you can try this (I wish I could find something that is not as lengthy though)
proc summary data=have missing nway;
class ID Country_from country_to;
var trxn_date ;
output out=temp (drop=_type_ rename=(_freq_=no_trans)) max(trxn_date)=max_date;
run;
proc sort data=temp ;
by id descending no_trans descending max_date;
run;
data temp (drop=max_date no_trans );
set temp ;
by id descending no_trans descending max_date;
if first.id ;
run;
proc sort data=temp;
by id Country_from country_to ;
run;
proc sort data=have;
by id Country_from country_to ;
run;
data have;
merge have (in=A) temp (in=b);
by id Country_from country_to ;
if a and b;
run;
proc datasets lib=work nolist;
delete temp;
quit;
proc print data=have noobs;
run;
The appearance of your output suggests you want all the transactions for a given id, that come from the most frequent COUNTRY_FROM/COUNTRY_TO pair. But it also appears, though not stated, that country_from is constant for each id. If so, then you really want a list of transactions that have the most frequent COUNTRY_TO.
And in the case of multiple country_to's with the same frequency choose the country with the latest transaction date. You provide no rules for when the tied countries also have ties in latest transaction date, so we'll ignore that for now.
From the look of your datasets, it seems that
data have;
infile cards;
informat trxn_date ddmmyy10.;
format trxn_date yymmddn8.;
input ID Country_from $ country_to $ trxn_date amount ;
cards;
1 US IND 01-03-2017 387
1 US IND 02-03-2017 388
1 US IND 03-03-2017 514
1 US UK 04-03-2017 889
1 US UK 05-03-2017 987
1 US UK 06-03-2017 690
1 US UK 07-03-2017 158
1 US AUS 08-03-2017 341
1 US AUS 09-03-2017 247
1 US CAN 10-03-2017 699
1 US CAN 11-03-2017 623
2 UK IND 04-03-2017 627
2 UK IND 05-03-2017 822
2 UK IND 06-03-2017 724
2 UK US 07-03-2017 231
2 UK US 08-03-2017 437
2 UK US 04-03-2017 193
2 UK AUS 05-03-2017 922
2 UK AUS 06-03-2017 750
2 UK AUS 07-03-2017 123
2 UK CAN 08-03-2017 920
2 UK CAN 08-03-2017 587
;;
run;
data want (drop=_:);
do until (last.id);
do _nt=1 by 1 until (last.country_to);
set have;
by id country_to notsorted;
if _nt=1 then _to_maxdate=trxn_date;
else _to_maxdate=max(_to_maxdate,trxn_date);
end;
if (_nt>_maxnt) or (_nt=_maxnt and _to_maxdate>_id_maxdate) then do;
_maxnt=_nt;
_max_to=country_to;
_id_maxdate=_to_maxdate;
end;
end;
do until (last.id);
do until(last.country_to);
set have;
by id country_to notsorted;
if country_to=_max_to then output;
end;
end;
run;
Notes:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.