BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
zhige50
Obsidian | Level 7

I am trying to find the max/min value of certain variables and the associated product of that max/min value account. 

 

Have:

cust_id account prod limit balance
111 1234 A         10,000                 500
111 5678 B         20,000                 1,000
111 2345 C           3,000                 2,000
111 6789 B           2,000                 3,000
222 4321 A           1,000                 2,000
222 8765 A           4,000                 1,000
222 7654 C           6,000                    5,000

Want:

cust_id max_lmt prod_max_lmt min_bal prod_min_bal
111 20,000 B 500 A
222 6,000 C 1,000 A

 

I could use a few data steps to achieve that, but are there any efficient way? Thanks in advance.

/****get max_lmt prod_max_lmt****/

proc sort data=have;

by cust_id limit;

run;

data have1(keep=cust_id prod limit rename=(limit=max_lmt prod=max_prod_lmt));

set have;

by cust_id limit;

if last.cust_id;

run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Since you now have a rule for treating tied maximum limits or tied minimum balances, here is a program to accomodate:

 

data want (keep=cust_id max_lmt prod_max_lmt  min_bal prod_min_bal);
  set have ;
  by cust_id;

  retain prod_max_lmt prod_min_bal '       '; 

  if first.cust_id then do;
    set have (keep=limit balance rename=(limit=max_lmt  balance=min_bal)) point=_n_;
    prod_max_lmt=prod;
    prod_min_bal=prod;
  end;
  else do;
    if limit>=max_lmt then do; 
      if limit>max_lmt then prod_max_lmt=prod;
      else prod_max_lmt=catx('&',prod_max_lmt,prod);
      max_lmt=limit;
    end;
    if balance<=min_bal then do; 
      if balance<min_bal then prod_min_bal=prod;
      else prod_min_bal=catx('&',prod_min_bal,prod);
      min_bal=balance;
    end;
  end;
  if last.cust_id;
run;

Note the retain statement initializes prod_max_lmt and prod_min_bal to a string of 7 blanks, thereby giving those variables a length of $7, which would accommodate up to 4 single-char PRODs separated by '&'.  Use a longer initial series of blanks if you anticipate there will be more than 4 ties.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
mkeintz
PROC Star

And what do you want to do if there are multiple observations with the max limit, but different products?  Or tied minimum balance with different products?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
zhige50
Obsidian | Level 7

Good question. For example, if product A & B have same max limit, then the prod_max_lmt should be 'A&B'

 

ballardw
Super User

Proc means/summary comes close until you no longer have a unique Max/Min identfier

proc means data=have nway noprint ;
   class cust_id;
   var limit balance;
   output out=want (drop=_type_ _freq_)
      idgroup ( max(limit) out(limit prod )= max_lmt prod_max_lmt 	  )
      idgroup ( min(balance) out(balance prod )= min_lmt prod_min_bal 	  )
   ;
run;
andreas_lds
Jade | Level 19

Slightly different solution:

 max_limit = 0;
      min_balance = constant('big');
      call missing(prod_max_limit, prod_min_balance);
   end;

   if limit > max_limit then do;
      max_limit = limit;
      prod_max_limit = prod;
   end;
   else do;
      if limit = max_limit then do;
         prod_max_limit = catx('&', prod_max_limit, prod);
      end;
   end;

   if balance < min_balance then do;
      min_balance = balance;
      prod_min_balance = prod;
   end;
   else do;
      if balance = min_balance then do;
         prod_min_balance = catx('&', prod_min_balance, prod);
      end;
   end;

   if last.cust_id then do;
      output;
   end;
run;
mkeintz
PROC Star

Assuming you have no ties for max_lmt or min_bal, then this program will do what you want:

 

data want (keep=cust_id max_lmt prod_max_lmt  min_bal prod_min_bal);
  set have ;
  by cust_id;
  if first.cust_id then do;
    set have (keep=limit prod rename=(limit=max_lmt   prod=prod_max_lmt)) point=_n_;
    set have (keep=balance prod rename=(balance=min_bal prod=prod_min_bal)) point=_n_;
  end;
  if limit>max_lmt then do; 
    max_lmt=limit;
    prod_max_lmt=prod;
  end;
  if balance<min_bal then do; 
    min_bal=balance;
    prod_min_bal=prod;
  end;
  if last.cust_id;
run;

Every time this program encounters a new cust_id, it re-reads that observation with via the IF FIRST.ID THEN DO and  twiSET .... POINT= ... ; statements.  The variables read by these conditional set statements are renamed, and will therefore not be overwritten by a new SET action until the next cust_id is encountered.  These variables are retained over the current cust_id, and updated as a new maximum limit or minimum balance is encountered.

 

The last statement "if last.custid;" is a "subsetting if".  Only the values reached at the end of the cust_id will be output to the new dataset.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

Since you now have a rule for treating tied maximum limits or tied minimum balances, here is a program to accomodate:

 

data want (keep=cust_id max_lmt prod_max_lmt  min_bal prod_min_bal);
  set have ;
  by cust_id;

  retain prod_max_lmt prod_min_bal '       '; 

  if first.cust_id then do;
    set have (keep=limit balance rename=(limit=max_lmt  balance=min_bal)) point=_n_;
    prod_max_lmt=prod;
    prod_min_bal=prod;
  end;
  else do;
    if limit>=max_lmt then do; 
      if limit>max_lmt then prod_max_lmt=prod;
      else prod_max_lmt=catx('&',prod_max_lmt,prod);
      max_lmt=limit;
    end;
    if balance<=min_bal then do; 
      if balance<min_bal then prod_min_bal=prod;
      else prod_min_bal=catx('&',prod_min_bal,prod);
      min_bal=balance;
    end;
  end;
  if last.cust_id;
run;

Note the retain statement initializes prod_max_lmt and prod_min_bal to a string of 7 blanks, thereby giving those variables a length of $7, which would accommodate up to 4 single-char PRODs separated by '&'.  Use a longer initial series of blanks if you anticipate there will be more than 4 ties.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User
prov sql;
create table max as
  select *
  from have
  group by cust_id
  having limit = max(limit)
;
quit;

data want;
set max;
by cust_id;
length products $100;
retain products;
if first.cust_id then products = "";
products = catx(",",products,prod);
if last.cust_id;
drop prod;
run;

Add additional code in the DATA step for other variables you want in the output.

Ksharp
Super User
data have;
infile cards expandtabs;
input cust_id account  prod $  limit : comma. balance : comma.;
cards;
111 1234 A         10,000                 500
111 5678 B         20,000                 1,000
111 2345 C           3,000                 2,000
111 6789 B           2,000                 3,000
222 4321 A           1,000                 2,000
222 8765 A           4,000                 1,000
222 7654 C           6,000                    5,000
;

proc sort data=have out=temp1;by cust_id descending limit;run;
data temp11;
 set temp1;
 by cust_id descending limit;
 if first.cust_id then n=0;
 if first.limit then n+1;
 if n=1;
run;
data part1;
do until(last.cust_id);
 set temp11;
 by cust_id;
length prod_max_lmt  $ 200;
prod_max_lmt=catx('&',prod_max_lmt,prod);
end;
keep cust_id prod_max_lmt limit;
rename limit=max_lmt;
run;


proc sort data=have out=temp2;by cust_id  balance;run;
data temp21;
 set temp2;
 by cust_id balance ;
 if first.cust_id then n=0;
 if first.balance then n+1;
 if n=1;
run;
data part2;
do until(last.cust_id);
 set temp21;
 by cust_id;
length prod_min_bal  $ 200;
prod_min_bal=catx('&',prod_min_bal,prod);
end;
keep cust_id prod_min_bal balance;
rename balance=min_bal;
run;



data want;
merge part1 part2;
by cust_id;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 782 views
  • 7 likes
  • 6 in conversation