Could some one should help me about this.
data dt1;
input a1 a2 a3;
cards;
1 2 3
4 4 4
;
run;
output:
a1 a2 a3 flag count
1 2 3 yes 2
4 4 4 no 0
if the value is changed across observation then flag yes else no, if there is a change how many times its changed(count)
data want;
set dt1;
array vars {*} a1-a3;
count = 0;
flag = 'no';
do i = 2 to dim(vars);
if vars{i} ne vars{i-1}
then do;
count = count + 1;
flag = 'yes';
end;
end;
drop i;
run;
data want;
set dt1;
array vars {*} a1-a3;
count = 0;
flag = 'no';
do i = 2 to dim(vars);
if vars{i} ne vars{i-1}
then do;
count = count + 1;
flag = 'yes';
end;
end;
drop i;
run;
This is an interesting question and to be honest I was a little surprised to realise their wasn't an inbuilt function to give us the number of distinct values across a range of variables.
There's only one change I'd suggest to Kurt's answer - if you don't add the length statement then the value of the flag variable is truncated for the second observation so:
data want;
set dt1;
length flag $3;
array vars {*} a1-a3;
count = 0;
flag = 'no';
do i = 2 to dim(vars);
if vars{i} ne vars{i-1}
then do;
count = count + 1;
flag = 'yes';
end;
end;
drop i;
run;
The other thing that occcured to me was how would you want missing values treated?
Yep, good catch. Either use a length statement or
flag = 'no ';
to avoid truncation.
data dt1;
input a1-a5;
cards;
1 2 3 4 5 4
4 4 4 2 3 4
4 4 4 3 4 4
4 4.1 4.2 4 4 4
4 4 4 4 4
;
run;
data want1;
set dt1;
array num _numeric_;
count=0;
if std(of num(*))=0 then flag='N';
else do;
flag='Y';
call sortn(of num(*));
do _n_=1 to dim(num)-1;
if std(num(_n_),num(_n_+1))= 0 then continue;
else count+1;
end;
end;
run;
/*Without callsortn- slightly different count*/
data want1;
set dt1;
array num _numeric_;
count=0;
if std(of num(*))=0 then flag='N';
else do;
flag='Y';
do _n_=1 to dim(num)-1;
if std(num(_n_),num(_n_+1))= 0 then continue;
else count+1;
end;
end;
run;
@ChrisBrooks wrote:
This is an interesting question and to be honest I was a little surprised to realise their wasn't an inbuilt function to give us the number of distinct values across a range of variables.
I guess this is because one would usually solve such problems by transposing into a long format (or keeping the data in such a format from the start) and using proc means/summary/freq or SQL summary functions, or (in this case) find differences using lag().
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.