Could some one should help me about this.
data dt1;
input a1 a2 a3;
cards;
1 2 3
4 4 4
;
run;
output:
a1 a2 a3 flag count
1 2 3 yes 2
4 4 4 no 0
if the value is changed across observation then flag yes else no, if there is a change how many times its changed(count)
data want;
set dt1;
array vars {*} a1-a3;
count = 0;
flag = 'no';
do i = 2 to dim(vars);
if vars{i} ne vars{i-1}
then do;
count = count + 1;
flag = 'yes';
end;
end;
drop i;
run;
data want;
set dt1;
array vars {*} a1-a3;
count = 0;
flag = 'no';
do i = 2 to dim(vars);
if vars{i} ne vars{i-1}
then do;
count = count + 1;
flag = 'yes';
end;
end;
drop i;
run;
This is an interesting question and to be honest I was a little surprised to realise their wasn't an inbuilt function to give us the number of distinct values across a range of variables.
There's only one change I'd suggest to Kurt's answer - if you don't add the length statement then the value of the flag variable is truncated for the second observation so:
data want;
set dt1;
length flag $3;
array vars {*} a1-a3;
count = 0;
flag = 'no';
do i = 2 to dim(vars);
if vars{i} ne vars{i-1}
then do;
count = count + 1;
flag = 'yes';
end;
end;
drop i;
run;
The other thing that occcured to me was how would you want missing values treated?
Yep, good catch. Either use a length statement or
flag = 'no ';
to avoid truncation.
data dt1;
input a1-a5;
cards;
1 2 3 4 5 4
4 4 4 2 3 4
4 4 4 3 4 4
4 4.1 4.2 4 4 4
4 4 4 4 4
;
run;
data want1;
set dt1;
array num _numeric_;
count=0;
if std(of num(*))=0 then flag='N';
else do;
flag='Y';
call sortn(of num(*));
do _n_=1 to dim(num)-1;
if std(num(_n_),num(_n_+1))= 0 then continue;
else count+1;
end;
end;
run;
/*Without callsortn- slightly different count*/
data want1;
set dt1;
array num _numeric_;
count=0;
if std(of num(*))=0 then flag='N';
else do;
flag='Y';
do _n_=1 to dim(num)-1;
if std(num(_n_),num(_n_+1))= 0 then continue;
else count+1;
end;
end;
run;
@ChrisBrooks wrote:
This is an interesting question and to be honest I was a little surprised to realise their wasn't an inbuilt function to give us the number of distinct values across a range of variables.
I guess this is because one would usually solve such problems by transposing into a long format (or keeping the data in such a format from the start) and using proc means/summary/freq or SQL summary functions, or (in this case) find differences using lag().
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.