Hi All,
I would like to count the maximum number of value columns with the same value across any number of columns in the same row. For a small number of variables, this can be done with a bunch of conditional statements. However, this approach quickly falls apart as the number of variables increases. Is there a more elegant, dynamic approach? To illustrate:
data have;
input col1 $ col2 $ col3 $ col4 $;
cards;
a b b b
a b b d
c b c b
d d d d
a d a a
;
data want;
set have;
if col1 = col2 = col3 = col4 then count = 4;
else if col1 = col2 = col3 then count = 3;
else if col1 = col2 = col4 then count = 3;
else if col1 = col3 = col4 then count = 3;
else if col2 = col3 = col4 then count = 3;
else if col2 = col3 = col4 then count = 3;
else if col1 = col2 then count = 2;
else if col1 = col3 then count = 2;
else if col1 = col4 then count = 2;
else if col2 = col3 then count = 2;
else if col2 = col4 then count = 2;
else if col3 = col4 then count = 2;
else count = 1;
run;
I found this code from Messr. Tabachneck to be close but it counts the number of matching pairs rather than the number of columns with the same value.
data want (drop=i j);
set have;
array vars{*} col1-col4;
count=0;
do i=1 to dim(vars)-1;
do j=i+1 to dim(vars);
count=sum(count, vars(i) eq vars(j));
end;
end;
run;
Best,
kbk
Another brute force approach using Array():
data have;
input col1 $ col2 $ col3 $ col4 $;
cards;
a b b b
a b b d
c b c b
d d d d
a d a a
;
data want;
set have;
array col col1-col4;
length _cat $100.;
_cat=catx(',',of col(*));
do over col;
do _i=1 to dim(col);
_ct+(col=scan(_cat,_i));
end;
count=max(count,_ct);
_ct=0;
end;
drop _:;
run;
Haikuo
What happens when the following is true:
else if col1 = col2 then count = 2;
else if col3 = col4 then count = 2;
Except if you have more than 4 columns?
Reeza, I am assuming you are asking if the column values consist of [a a b b]. In this case the number needed would be 2. Column values of [a a a b b b b] would yield a count of 4, due to the fact of having 4 b's.
Why not transpose the data, run a proc freq, and take the top one?
Reeza, thanks. I have transposed the dataset but am not sure the best way to "run a proc freq and take the top one."
PROC TRANSPOSE DATA=WORK.have OUT=WORK.have_tran PREFIX=Column;
VAR col1-col4;
run;
PROC FREQ DATA=WORK.have_tran;
TABLES Column1 / OUT=WORK.Col1WANT;
TABLES Column2 / OUT=WORK.Col2WANT;
TABLES Column3 / OUT=WORK.Col3WANT;
TABLES Column4 / OUT=WORK.Col4WANT;
TABLES Column5 / OUT=WORK.Col5WANT;
RUN;
You have some more efficient suggestions above, but the following is what I had in mind. More of a beginner method I suppose
data have;
input col1 $ col2 $ col3 $ col4 $;
cards;
a b b b
a b b d
c b c b
d d d d
a d a a
;
*add in a row count;
data have_transposed;
set have;
row=_n_;
array col(4) $ col1-col4;
do i=1 to 4;
column=i;
value=col(i);
output;
end;
keep row column value;
run;
*count the results and order them descending;
proc freq data=have_transposed order=freq noprint;
by row;
table value/out=counts;
run;
*take the top one;
data want;
set counts;
by row;
if first.row;
run;
Use call routine sortc. Like this :
data have;
input col1 $ col2 $ col3 $ col4 $;
cards;
a b b b
a b b d
c b c b
d d d d
a d a a
;
data want(keep=best bestCount);
set have;
array c{*} col:;
call sortc(of c{*});
last = c{1};
best = last; /* So that best as the same length as last and col1 */
do i = 1 to dim(c);
same = c{i} = last;
count = sum(count, same);
if count > bestCount then do;
best = coalescec(last, c{i});
bestCount = count;
end;
if not same then count = 0;
last = c{i};
end;
output;
run;
PG
I had a similar idea but I don't know how it works. :smileyshocked:
I used peek and poke to save the array and put it back.
Thanks for the help everyone!
Another brute force approach using Array():
data have;
input col1 $ col2 $ col3 $ col4 $;
cards;
a b b b
a b b d
c b c b
d d d d
a d a a
;
data want;
set have;
array col col1-col4;
length _cat $100.;
_cat=catx(',',of col(*));
do over col;
do _i=1 to dim(col);
_ct+(col=scan(_cat,_i));
end;
count=max(count,_ct);
_ct=0;
end;
drop _:;
run;
Haikuo
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.