I am trying to find the max/min value of certain variables and the associated product of that max/min value account.
Have:
cust_id | account | prod | limit | balance |
111 | 1234 | A | 10,000 | 500 |
111 | 5678 | B | 20,000 | 1,000 |
111 | 2345 | C | 3,000 | 2,000 |
111 | 6789 | B | 2,000 | 3,000 |
222 | 4321 | A | 1,000 | 2,000 |
222 | 8765 | A | 4,000 | 1,000 |
222 | 7654 | C | 6,000 | 5,000 |
Want:
cust_id | max_lmt | prod_max_lmt | min_bal | prod_min_bal |
111 | 20,000 | B | 500 | A |
222 | 6,000 | C | 1,000 | A |
I could use a few data steps to achieve that, but are there any efficient way? Thanks in advance.
/****get max_lmt prod_max_lmt****/
proc sort data=have;
by cust_id limit;
run;
data have1(keep=cust_id prod limit rename=(limit=max_lmt prod=max_prod_lmt));
set have;
by cust_id limit;
if last.cust_id;
run;
Since you now have a rule for treating tied maximum limits or tied minimum balances, here is a program to accomodate:
data want (keep=cust_id max_lmt prod_max_lmt min_bal prod_min_bal);
set have ;
by cust_id;
retain prod_max_lmt prod_min_bal ' ';
if first.cust_id then do;
set have (keep=limit balance rename=(limit=max_lmt balance=min_bal)) point=_n_;
prod_max_lmt=prod;
prod_min_bal=prod;
end;
else do;
if limit>=max_lmt then do;
if limit>max_lmt then prod_max_lmt=prod;
else prod_max_lmt=catx('&',prod_max_lmt,prod);
max_lmt=limit;
end;
if balance<=min_bal then do;
if balance<min_bal then prod_min_bal=prod;
else prod_min_bal=catx('&',prod_min_bal,prod);
min_bal=balance;
end;
end;
if last.cust_id;
run;
Note the retain statement initializes prod_max_lmt and prod_min_bal to a string of 7 blanks, thereby giving those variables a length of $7, which would accommodate up to 4 single-char PRODs separated by '&'. Use a longer initial series of blanks if you anticipate there will be more than 4 ties.
And what do you want to do if there are multiple observations with the max limit, but different products? Or tied minimum balance with different products?
Good question. For example, if product A & B have same max limit, then the prod_max_lmt should be 'A&B'
Proc means/summary comes close until you no longer have a unique Max/Min identfier
proc means data=have nway noprint ; class cust_id; var limit balance; output out=want (drop=_type_ _freq_) idgroup ( max(limit) out(limit prod )= max_lmt prod_max_lmt ) idgroup ( min(balance) out(balance prod )= min_lmt prod_min_bal ) ; run;
Slightly different solution:
max_limit = 0;
min_balance = constant('big');
call missing(prod_max_limit, prod_min_balance);
end;
if limit > max_limit then do;
max_limit = limit;
prod_max_limit = prod;
end;
else do;
if limit = max_limit then do;
prod_max_limit = catx('&', prod_max_limit, prod);
end;
end;
if balance < min_balance then do;
min_balance = balance;
prod_min_balance = prod;
end;
else do;
if balance = min_balance then do;
prod_min_balance = catx('&', prod_min_balance, prod);
end;
end;
if last.cust_id then do;
output;
end;
run;
Assuming you have no ties for max_lmt or min_bal, then this program will do what you want:
data want (keep=cust_id max_lmt prod_max_lmt min_bal prod_min_bal);
set have ;
by cust_id;
if first.cust_id then do;
set have (keep=limit prod rename=(limit=max_lmt prod=prod_max_lmt)) point=_n_;
set have (keep=balance prod rename=(balance=min_bal prod=prod_min_bal)) point=_n_;
end;
if limit>max_lmt then do;
max_lmt=limit;
prod_max_lmt=prod;
end;
if balance<min_bal then do;
min_bal=balance;
prod_min_bal=prod;
end;
if last.cust_id;
run;
Every time this program encounters a new cust_id, it re-reads that observation with via the IF FIRST.ID THEN DO and twiSET .... POINT= ... ; statements. The variables read by these conditional set statements are renamed, and will therefore not be overwritten by a new SET action until the next cust_id is encountered. These variables are retained over the current cust_id, and updated as a new maximum limit or minimum balance is encountered.
The last statement "if last.custid;" is a "subsetting if". Only the values reached at the end of the cust_id will be output to the new dataset.
Since you now have a rule for treating tied maximum limits or tied minimum balances, here is a program to accomodate:
data want (keep=cust_id max_lmt prod_max_lmt min_bal prod_min_bal);
set have ;
by cust_id;
retain prod_max_lmt prod_min_bal ' ';
if first.cust_id then do;
set have (keep=limit balance rename=(limit=max_lmt balance=min_bal)) point=_n_;
prod_max_lmt=prod;
prod_min_bal=prod;
end;
else do;
if limit>=max_lmt then do;
if limit>max_lmt then prod_max_lmt=prod;
else prod_max_lmt=catx('&',prod_max_lmt,prod);
max_lmt=limit;
end;
if balance<=min_bal then do;
if balance<min_bal then prod_min_bal=prod;
else prod_min_bal=catx('&',prod_min_bal,prod);
min_bal=balance;
end;
end;
if last.cust_id;
run;
Note the retain statement initializes prod_max_lmt and prod_min_bal to a string of 7 blanks, thereby giving those variables a length of $7, which would accommodate up to 4 single-char PRODs separated by '&'. Use a longer initial series of blanks if you anticipate there will be more than 4 ties.
prov sql;
create table max as
select *
from have
group by cust_id
having limit = max(limit)
;
quit;
data want;
set max;
by cust_id;
length products $100;
retain products;
if first.cust_id then products = "";
products = catx(",",products,prod);
if last.cust_id;
drop prod;
run;
Add additional code in the DATA step for other variables you want in the output.
data have; infile cards expandtabs; input cust_id account prod $ limit : comma. balance : comma.; cards; 111 1234 A 10,000 500 111 5678 B 20,000 1,000 111 2345 C 3,000 2,000 111 6789 B 2,000 3,000 222 4321 A 1,000 2,000 222 8765 A 4,000 1,000 222 7654 C 6,000 5,000 ; proc sort data=have out=temp1;by cust_id descending limit;run; data temp11; set temp1; by cust_id descending limit; if first.cust_id then n=0; if first.limit then n+1; if n=1; run; data part1; do until(last.cust_id); set temp11; by cust_id; length prod_max_lmt $ 200; prod_max_lmt=catx('&',prod_max_lmt,prod); end; keep cust_id prod_max_lmt limit; rename limit=max_lmt; run; proc sort data=have out=temp2;by cust_id balance;run; data temp21; set temp2; by cust_id balance ; if first.cust_id then n=0; if first.balance then n+1; if n=1; run; data part2; do until(last.cust_id); set temp21; by cust_id; length prod_min_bal $ 200; prod_min_bal=catx('&',prod_min_bal,prod); end; keep cust_id prod_min_bal balance; rename balance=min_bal; run; data want; merge part1 part2; by cust_id; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.