/*Or COUNTW is Simpler*/
data want;
set have;
array v var:;
_k=coalescec(of v(*));
_n=dim(v)-cmiss(of v(*));
_k1=catx(' ',of v(*));
Flag=_n=countw(_k1,strip(_k));
drop _:;
run;
data WANT;
set have;
array v var:;
flag=1;
prev_non_missing=v(1);
do i=2 to dim(v);
if not missing(v(i)) and not missing(prev_non_Missing) and v(i)^=v(i-1) then do;
flag=0;
leave;
end;
else if not missing(prev_non_missing) and missing(v(i)) then v(i)=prev_non_missing;
else if missing(prev_non_missing) and not missing(v(i)) then prev_non_missing=v(i);
end;
drop i;
run;
You can just take the first non-missing value and then test to see if any other non-missing values exist.
data have;
infile datalines truncover;
input id $ var1 $ var2 $ var3 $;
datalines;
1 . A A
2 A A A
3 A . B
4 . A B
5 A B C
6 A A .
7 . . .
;
data want ;
set have ;
array var var1-var3 ;
first_value=coalescec(of var[*]);
flag=0;
do _n_=1 to dim(var) while (not flag);
if not missing(var[_n_]) then flag = first_value ne var[_n_];
end;
run;
proc print;
run;
first_ Obs id var1 var2 var3 value flag 1 1 A A A 0 2 2 A A A A 0 3 3 A B A 1 4 4 A B A 1 5 5 A B C A 1 6 6 A A A 0 7 7 0
data want;
set have;
call sortc(of var:);
flag=ifn(length(prxchange('s/(.)\1+/$1/', -1, cats(of var:))) > 1, 1, 0);
run;
Note that using CALL SORTC() will change the order of the original values.
Also I don't think that method works when the strings are longer than a single character.
Try setting the values to the same multiple character values and see what happens.
Also what value is the use of the IFN() doing in that code?
It looks like you are using it to convert the results of the test for greater than one. So you are using it to convert 1 to 1 and 0 to 0.
Ah yes. Once again thank your for the sharp eye @Tom.
The code should rather be
data want;
set have;
call sortc(of var:);
flag=length(prxchange('s/(.)\1+/$1/', -1, cats(of var:))) > 1;
set have;
run;
This way the order of var1-var3 will be intact in want and the flag is set correctly. Don't need the ifn function.
Of course, this approach will not work if the variables are not single letters no 🙂
It is easily extendable to handle more than one-character strings though
data have;
infile datalines truncover;
input id $ var1 $ var2 $ var3 $;
datalines;
1 . AB AB
2 AB AB AB
3 AB . BA
4 . AB BA
5 AB BA CD
6 AB AB .
7 . . .
;
data want;
set have;
call sortc(of var:);
flag=countw(prxchange('s/\b(\w+)(?:\W+\1\b)+/$1/', -1, catx(' ', of var:)), ' ') > 1;
set have;
run;
data have;
infile datalines truncover;
input id $ var1 $ var2 $ var3 $;
datalines;
1 . A A
2 A A A
3 A . B
4 . A B
5 A B C
6 A A .
;
run;
data want;
set have;
array x {3} var1 var2 var3 ;
do i= 1 to 1;
if i ne . then do;
if x{i} ne x{i+1} ne x{i+2} then output;
end;end;
run;
you might modify it as per the amount of variables you have to check .
@supp wrote:
Hello SAS community. I am hopeful someone might have a better method for comparing three character variables within the same observations such that it identifies when two or more of the variables are different. Also, a null value in one of the variables should not be seen as different.
hypothetical data:
data have; infile datalines truncover; input id $ var1 $ var2 $ var3 $; datalines; 1 . A A 2 A A A 3 A . B 4 . A B 5 A B C 6 A A . ;
Data want would return ID's 3, 4 and 5 (there is at least one difference between the non missing values). It would not return ID's 1,2 or 6 (all non missing values are the same).
I tried a series of nested if-then do statements. For example:
data want; set have; if var1 ne '.' then do; if var2 ne '.' then do; if var1 ne var2; end; end; run;
I might have gotten a variation (not exactly like the example above) to maybe work. Given the volume of data it is hard to be confident in my solution. I believe there is a much more elegant solution to this problem, any suggestions would be appreciated.
You are missing a couple pieces of potentially extremely critical information. Such as, are your character values single characters or longer? And are they something from a fixed list of values (you know what they are expected to be) or just sort of randomly entered text?
Also is this case sensitive? Is "A" equal to "a" for this purpose?
If you have values that are fixed and case sensitive then this approach of creating a custom informat to assign a numeric value will work as what you are in effect requesting is "is the range of values = 0 or not". The following code could be adjusted to case insensitive by adding (UPCASE) after the name of the informat in the Invalue statement of the format. The variable dif will be 1 when there is a difference and 0 otherwise. The numeric values assigned to the values aren't critical as long as each is different though if you have a largish number of possible values a sequential assignment might be easier with a CNTLIN data set for Proc Format.
proc format library=work; invalue mychars 'A' = 1 'B' = 2 'C' = 3 ; run; data have; infile datalines truncover; input id $ var1 $ var2 $ var3 $; array vc var1 - var3; array vn{3}; do i=1 to dim(vc); vn[i]= input(vc[i],mychars.); end;
drop I vn: ; dif = (range(of vn(*)) > 0); datalines; 1 . A A 2 A A A 3 A . B 4 . A B 5 A B C 6 A A . ;
Note that if you have larger number of variables with the same value list this is easily extendable to more variables. Just increase the array definitions.
data have; infile datalines truncover; input id $ var1 $ var2 $ var3 $; datalines; 1 . A A 2 A A A 3 A . B 4 . A B 5 A B C 6 A A . 7 . . . ; data want; if _n_=1 then do; length k $ 80; declare hash h(); h.definekey('k'); h.definedone(); end; set have; array x{*} $ var: ; do i=1 to dim(x); if not missing(x{i}) then do;k=x{i};h.ref(); end; end; if h.num_items>1 ; h.clear(); drop i k; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.