Hello SAS Community,
I have a dataset with close to 1 million rows. I want to find if the value 'Yes' is in sequence across the columns like below. It should be all 'Yes' without any other value in between or no missing. Can you please help? I tried array, but it only flags if a specific value like 'Yes' is found.
Have:
T1 T2 T3 T4 T5 T6 T7 T8 T9 T10
Yes No Yes Yes Yes No Yes . . .
Yes Yes Yes Yes Yes . . . . .
Yes Yes Yes . . . . . . .
No No No No No Yes No Yes Yes No
Want:
T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 Flag
Yes No Yes Yes Yes No Yes . . . 0
Yes Yes Yes Yes Yes . . . . . 1
Yes Yes Yes . . . . . . . 1
No No No No No Yes No Yes Yes No 0
data want;
set have;
array t{*} t1-t10;
flag = 1;
do i = 1 to dim(t);
if t{i} not in ('Yes',``) then flag = 0;
end;
drop i;
run;
Untested, posted from my tablet.
data want;
set have;
array t{*} t1-t10;
flag = 1;
do i = 1 to dim(t);
if t{i} not in ('Yes',``) then flag = 0;
end;
drop i;
run;
Untested, posted from my tablet.
Thank you, Kurt! This works.
First let's make some better test data that has more of conditions you mentioned.
data have;
input (T1-T10) (:$3.);
cards;
Yes No Yes Yes Yes No Yes . . .
Yes Yes Yes Yes Yes . . . . .
Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
Yes Yes . Yes . . . . . .
Yes Yes Yes . . . . . . .
No No No No No Yes No Yes Yes No
. . . . . . . . . .
;
So just scan the list until you hit one that is not Yes.
data want;
set have ;
array t [10];
flag=cmiss(of t[*]) < dim(t);
do index=1 to dim(T)-cmiss(of t{*]) while(flag=1);
flag=(t[index] = 'Yes');
end;
drop index;
run;
Results
Thank you for your time, Tom! Your solution works as well.
Hello @SP01,
Assuming that variables T1-T10 have length 3 and that a missing value of T1 would violate your criterion, this definition of FLAG should work:
data want;
set have;
flag=cat(of t1 t:)=repeat('Yes',10-cmiss(of t:));
run;
@FreelanceReinh wrote:
Hello @SP01,
Assuming that variables T1-T10 have length 3 and that a missing value of T1 would violate your criterion, this definition of FLAG should work:
data want; set have; flag=cat(of t1 t:)=repeat('Yes',10-cmiss(of t:)); run;
That is simple, but remember that REPEAT() wants the count of the EXTRA copies to append so subtract 1.
Note that it does not handle embedded missing.
1131 data want; 1132 set have ; 1133 array t [10]; 1134 flag=cmiss(of t[*]) < dim(t); 1135 do index=1 to dim(T)-cmiss(of t{*]) while(flag=1); 1136 flag=(t[index] = 'Yes'); 1137 end; 1138 drop index; 1139 flag2 = (cats(of t[*])=repeat('Yes',dim(t)-cmiss(of t[*])-1)); 1140 run; NOTE: Argument 2 to function REPEAT('Yes',-1) at line 1139 column 26 is invalid. T1= T2= T3= T4= T5= T6= T7= T8= T9= T10= flag=0 index=1 flag2=1 _ERROR_=1 _N_=7 NOTE: There were 7 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 7 observations and 12 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.01 seconds
You could fix the all missing case by adding a MAX() function so that the count cannot be negative.
flag2 = (cats(of t[*])=repeat('Yes',max(0,dim(t)-cmiss(of t[*])-1)));
Hello @Tom,
Thanks for checking my solution. I still think it is correct and I've just confirmed that with a HAVE dataset comprising all 3**10 possible combinations of 'Yes', 'No' and ' '.
The peculiarity of the REPEAT function that you mention is the reason why I use cat(of t1 t:) rather than cat(of t:). This works in particular for the "all missing" case. There's also no problem with "embedded" missings because CAT (unlike CATS, which is what you are using for your FLAG2) does not compress them.
I see, looks good.
Note that it depends on the lengths of all of the variables matching the length of the string passed to the REPEAT() function.
@Tom wrote:
Note that it depends on the lengths of all of the variables matching the length of the string passed to the REPEAT() function.
This is a good point as it indicates how the solution could be adapted to variables T1-T10 having a common defined length >3 or to the similar task aiming at a series of consecutive 'No' values: The string passed to the REPEAT() function would just need to contain the trailing blanks so that the LENGTHC() of the string would match that of the T1-T10.
Thank you!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.