It sucks that smallest and largest don't work with strings. Here's how I got around it.
With this data...multiple lines per member, claim and even claim line there are up to 4 'code's:
Member Claim Claim_Line code1 code2 code3 code4
866738350 14003347700 1 95 95 B23 B23
866738350 14003347700 4 95 95 B23 B23
866738350 14003347700 3 95 95 B23 B23
866738350 14003347700 2 95 95 B23 B23
866738350 14003347700 5 95 95 B23 B23
850308450 142801958000 2 B13 B13 96 96
861027250 142802729900 1 95 95 96 45
861027250 142802729900 1 95 95 45 45
855217600 142802814700 1 95 95 45 16
855217600 142802814700 1 95 95 16 16
So, what I do is this:
First, add a row number to uniquely identify each line:
data WORK.t2;
set work.t1;
rownum=_n_;
run;
then spit out the 4 (could be less) lines per row (one per code):
data WORK.t3(keep=rownum code);
format code $ 10.;
set work.t1;
rownum=_n_;
if not missing(code1) then do; code=code1; output; end;
if not missing(code2) then do; code=code2; output; end;
if not missing(code3) then do; code=code3; output; end;
if not missing(code4) then do; code=code4; output; end;
run;
Then sort, removing duplicate codes:
proc sort data=WORK.t3 nodup; by rownum descending code ;
run;
Then add a row number to that resulting deduped table:
data WORK.t4; set work.t3;
coderow=_n_;
Then rank that table by code descending to handle missing codes:
proc rank data=WORK.t4 out=work.t5 ties=low descending;
by rownum;
var coderow;
ranks codeRank ;
run;
Then get the up-to 4 remaining, deduped codes per original line:
PROC SQL number;
CREATE TABLE WORK.t6 AS
SELECT
rownum,
max(case when coderank=1 then code end) as code1,
max(case when coderank=2 then code end) as code2,
max(case when coderank=3 then code end) as code3,
max(case when coderank=4 then code end) as code4
FROM work.t5
group by rownum
;
QUIT;
Resulting in this table:
rownum code1 code2 code3 code4
1 95 B23
2 95 B23
3 95 B23
4 95 B23
5 95 B23
6 96 B13
7 45 95 96
8 45 95
9 16 45 95
10 16 95
You might explain what the problem you are "solving" actually is. I am not sure I understand.
Since you have a SORT step you may want to look at the options for SORTSEQ and LINGUISTIC which will correctly sort
B23
B3
C18
to
B3
B23
C18.
I also think your example data is poor as the ranges of values don't exercise Codes starting with different letters, more than one letter, any special charaters like - and every single 95 or B23 value is associated with the same 2 variables and paired
*******************************************;
data claim_a;
set claim;
ord=_N_;
run;
proc sort data=claim_a out=claim_as;
by ord;
run;
proc transpose data=claim_as out=t_claim;
var code1 code2 code3 code4;
by ord;
run;
proc sort data=t_claim out=t_claim_s nodupkey;
by ord descending COL1;
run;
proc transpose data=t_claim_s out=ranked_claim (drop=_:) prefix=ranked_code;
var COL1;
by ord;
run;
***************;
If I understand what you are attempting to do, you could do it on one datastep: store the values in an array, sort the array with call sortc, depuplicate it by looping through the values.
data t;
CODE1='95 '; CODE2='95 ';CODE3='B23';CODE4='B23';
array CODE [4] $;
call sortc(of CODE[*]);
do I=1 to 3;
if CODE[I]=CODE[I+1] then CODE[I+1]='ZZ';
end;
call sortc(of CODE[*]);
do I=1 to 4;
if CODE[I]='ZZ' then CODE[I]=' ';
end;
run;
You could also use a SAS Hash table to both dedupe and sort your array in one go.
data have;
input (Member Claim) (:$11.) (Claim_Line code1 code2 code3 code4) ($);
datalines;
866738350 14003347700 1 95 95 B23 B23
866738350 14003347700 4 95 95 B23 B23
866738350 14003347700 3 95 95 B23 B23
866738350 14003347700 2 95 95 B23 B23
866738350 14003347700 5 95 95 B23 B23
850308450 142801958000 2 B13 B13 96 96
861027250 142802729900 1 95 95 96 45
861027250 142802729900 1 95 95 45 45
855217600 142802814700 1 95 95 45 16
855217600 142802814700 1 95 95 16 16
;
run;
data want(drop=_:);
set have;
if _n_=1 then
do;
if 0 then _code=code1;
dcl hash h1(ordered:'y',hashexp:2);
h1.defineKey('_code');
h1.defineData('_code');
h1.defineDone();
dcl hiter hh1('h1');
end;
array codes{*} code1 code2 code3 code4;
do _i=1 to dim(codes);
_code=codes[_i];
h1.ref();
end;
call missing(of codes[*]);
_i=1;
_rc = hh1.first();
do while (_rc = 0);
codes[_i]=_code;
_rc = hh1.next();
_i+1;
end;
h1.clear();
run;
@tomrvincent: I was bored (for a few minutes) this afternoon, and found the problem challenging, so created the following. It solves the problem in one datastep AND accounts for linguistic collation (sort of). It uses the qsort macro that Paul Dorfman posted years ago and included in the code. If one of our hash experts can accomplish the same thing using the hash object, I'd definitely be interested in seeing how it could be done). What it does is create a new sort key that, basically, normalizes the data, sorts by that key, but identifies the largest and smallest based on the non-normalized data values:
%Macro Qsort ( Arr = /* Parallel array name list */ , By = %QScan(&Arr,1,%Str( )) /* Key array name */ , Seq = A /* Seq=D for descending */ , LB = Lbound(&By) /* Lower bound to sort */ , HB = Hbound(&By) /* Upper bound to sort */ , M = 9 /* Tuning range: [1:15] */ ) ; %Local _ H I J L N P Q S T W ; %Macro Swap (I,J) ; %Local W ; Do ; %Do W = 1 %To &N ; &T&W = &&A&W(&I) ; &&A&W(&I) = &&A&W(&J) ; &&A&W(&J) = &T&W ; %End ; End ; %Mend Swap ; %If %Upcase(&Seq) = %Upcase(A) %Then %Let Q = G ; %Else %Let Q = L ; %Do %Until (&&A&N EQ ) ; %Let N = %Eval(&N + 1) ; %Local A&N ; %Let A&N = %Scan(&Arr,&N,%Str( )) ; %End ; %Let N = %Eval(&N - 1) ; %Let _ = %Substr(%Sysfunc(Ranuni(0)),3, %Eval(7 - %Length(&N) + 5*(%Substr(&Sysver,1,1) GT 6))) ; %Let H = H&_ ; %Let I = I&_ ; %Let J = J&_ ; %Let L = L&_ ; %Let P = P&_ ; %Let S = S&_ ; %Let T = T&_ ; %Let Z = Z&_ ; Array &Z (0:1, 0:50) _Temporary_ ; &L = &LB ; &H = &HB ; If &H - &L GT &M Then Do &S=1 By 0 While (&S) ; &J = (&H - &L)/3 ; &I = &L + &J ; &J = &I + &J ; If &By(&L) &Q.T &By(&I) Then %Swap(&L,&I) ; If &By(&I) &Q.T &By(&J) Then %Swap(&I,&J) ; If &By(&J) &Q.T &By(&H) Then %Swap(&J,&H) ; If &By(&L) &Q.T &By(&I) Then %Swap(&L,&I) ; If &By(&I) &Q.T &By(&J) Then %Swap(&I,&J) ; If &By(&L) &Q.T &By(&I) Then %Swap(&L,&I) ; %If &M LE 3 %Then %Do ; If &H - &L LE 3 Then Do ; &L = &Z(0,&S) ; &H = &Z(1,&S) ; &S +- 1 ; Continue ; End ; %End ; %Swap(&L,&I) ; &P = &By(&L) ; &I = &L ; Do &J=&H + 1 By 0 ; Do &I=&I + 1 By + 1 Until(&By(&I) &Q.E &P) ; End ; Do &J=&J - 1 By - 1 Until(&P &Q.E &By(&J)) ; End ; If &I GE &J Then Leave ; %Swap(&I,&J) ; End ; %Swap(&L,&J) ; If &H - &J GE &J - &L GT &M Then Do &S = &S + 1 ; &Z(0,&S) = &J + 1 ; &Z(1,&S) = &H ; &H = &J - 1 ; End ; Else If &J - &L GE &H - &J GT &M Then Do &S = &S + 1 ; &Z(0,&S) = &L ; &Z(1,&S) = &J - 1 ; &L = &J + 1 ; End ; Else If &J - &L GT &M GE &H - &J Then &H = &J - 1 ; Else If &H - &J GT &M GE &J - &L Then &L = &J + 1 ; Else Do ; &L = &Z(0,&S) ; &H = &Z(1,&S) ; &S +- 1 ; End ; End ; %If &M = 1 %Then %Goto Exit ; Do &J = &LB + 1 To &HB ; If &By(&J - 1) &Q.T &By(&J) Then Do ; &P = &By(&J) ; %Do W = 1 %To &N ; %If &&A&W Ne &By %Then &T&W = &&A&W(&J) ; ; %End ; Do &I = &J - 1 To &LB By - 1 ; If &P &Q.E &By(&I) Then Leave ; %Do W = 1 %To &N ; &&A&W(&I + 1) = &&A&W(&I) ; %End ; End ; &By(&I + 1) = &P ; %Do W = 1 %To &N ; %If &&A&W Ne &By %Then &&A&W(&I + 1) = &T&W ; ; %End ; End ; End ; %Exit: Drop &H &I &J &L &P &S T&_: ; %Mend Qsort ; data have; informat Member $9.; informat Claim $12.; informat code1-code4 $8.; infile cards truncover; input Member Claim Claim_Line code1-code4; cards; 866738350 14003347700 1 North3 North21 North12 North35 866738350 14003347700 4 95 95 B23 B23 866738350 14003347700 3 95 95 B12 B2 866738350 14003347700 2 CCCCC2 CCCCC12 CCCCC39 866738350 14003347700 5 B750 B86 B7 B12 850308450 142801958000 2 B13 B13 96 96 861027250 142802729900 1 AA85 AA9 96 45 861027250 142802729900 1 3 33 12 2 855217600 142802814700 1 95 95 45 16 855217600 142802814700 1 95 95 16 16 ; data want (drop=i firstnum); set have; array v_in(4) $ code1-code4; array v_sort(4) $255. _temporary_; do i=1 to 4; if not missing(v_in(i)) then do; firstnum=anydigit(v_in(i)); if firstnum gt 1 then v_sort(i)=catt(substr(v_in(i),1,firstnum-1), put(input(substr(v_in(i),firstnum),8.),z8.)); else v_sort(i)= put(input(substr(v_in(i),1),8.),z8.); end; end; %Qsort (Arr=v_sort v_in, By=v_sort, Seq=D); i=1; do while(i le dim(v_in)); if not missing(v_in(i)) then do; if i eq 1 then largest=v_in(i); else smallest=v_in(i); end; i+1; end; run;
Art, CEO, AnalystFinder.com
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.