Hello SAS Community,
I have a dataset with close to 1 million rows. I want to find if the value 'Yes' is in sequence across the columns like below. It should be all 'Yes' without any other value in between or no missing. Can you please help? I tried array, but it only flags if a specific value like 'Yes' is found.
Have:
T1 T2 T3 T4 T5 T6 T7 T8 T9 T10
Yes No Yes Yes Yes No Yes . . .
Yes Yes Yes Yes Yes . . . . .
Yes Yes Yes . . . . . . .
No No No No No Yes No Yes Yes No
Want:
T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 Flag
Yes No Yes Yes Yes No Yes . . . 0
Yes Yes Yes Yes Yes . . . . . 1
Yes Yes Yes . . . . . . . 1
No No No No No Yes No Yes Yes No 0
data want;
set have;
array t{*} t1-t10;
flag = 1;
do i = 1 to dim(t);
  if t{i} not in ('Yes',``) then flag = 0;
end;
drop i;
run;Untested, posted from my tablet.
data want;
set have;
array t{*} t1-t10;
flag = 1;
do i = 1 to dim(t);
  if t{i} not in ('Yes',``) then flag = 0;
end;
drop i;
run;Untested, posted from my tablet.
Thank you, Kurt! This works.
First let's make some better test data that has more of conditions you mentioned.
data have;
  input (T1-T10) (:$3.);
cards;
Yes No  Yes Yes Yes No  Yes .   .   .
Yes Yes Yes Yes Yes .   .   .   .   .
Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
Yes Yes .   Yes .   .   .   .   .   .
Yes Yes Yes .   .   .   .   .   .   .
No  No  No  No  No  Yes No  Yes Yes No
.   .   .   .   .   .   .   .   .   .
;So just scan the list until you hit one that is not Yes.
data want;
  set have ;
  array t [10];
  flag=cmiss(of t[*]) < dim(t);
  do index=1 to dim(T)-cmiss(of t{*]) while(flag=1);
    flag=(t[index] = 'Yes');
  end;
  drop index;
run;Results
Thank you for your time, Tom! Your solution works as well.
Hello @SP01,
Assuming that variables T1-T10 have length 3 and that a missing value of T1 would violate your criterion, this definition of FLAG should work:
data want;
set have;
flag=cat(of t1 t:)=repeat('Yes',10-cmiss(of t:));
run;
@FreelanceReinh wrote:
Hello @SP01,
Assuming that variables T1-T10 have length 3 and that a missing value of T1 would violate your criterion, this definition of FLAG should work:
data want; set have; flag=cat(of t1 t:)=repeat('Yes',10-cmiss(of t:)); run;
That is simple, but remember that REPEAT() wants the count of the EXTRA copies to append so subtract 1.
Note that it does not handle embedded missing.
1131  data want;
1132    set have ;
1133    array t [10];
1134    flag=cmiss(of t[*]) < dim(t);
1135    do index=1 to dim(T)-cmiss(of t{*]) while(flag=1);
1136      flag=(t[index] = 'Yes');
1137    end;
1138    drop index;
1139    flag2 = (cats(of t[*])=repeat('Yes',dim(t)-cmiss(of t[*])-1));
1140  run;
NOTE: Argument 2 to function REPEAT('Yes',-1) at line 1139 column 26 is invalid.
T1=  T2=  T3=  T4=  T5=  T6=  T7=  T8=  T9=  T10=  flag=0 index=1 flag2=1 _ERROR_=1 _N_=7
NOTE: There were 7 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 7 observations and 12 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
You could fix the all missing case by adding a MAX() function so that the count cannot be negative.
  flag2 = (cats(of t[*])=repeat('Yes',max(0,dim(t)-cmiss(of t[*])-1)));
Hello @Tom,
Thanks for checking my solution. I still think it is correct and I've just confirmed that with a HAVE dataset comprising all 3**10 possible combinations of 'Yes', 'No' and ' '.
The peculiarity of the REPEAT function that you mention is the reason why I use cat(of t1 t:) rather than cat(of t:). This works in particular for the "all missing" case. There's also no problem with "embedded" missings because CAT (unlike CATS, which is what you are using for your FLAG2) does not compress them.
I see, looks good.
Note that it depends on the lengths of all of the variables matching the length of the string passed to the REPEAT() function.
@Tom wrote:
Note that it depends on the lengths of all of the variables matching the length of the string passed to the REPEAT() function.
This is a good point as it indicates how the solution could be adapted to variables T1-T10 having a common defined length >3 or to the similar task aiming at a series of consecutive 'No' values: The string passed to the REPEAT() function would just need to contain the trailing blanks so that the LENGTHC() of the string would match that of the T1-T10.
Thank you!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
