BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SP01
Obsidian | Level 7

Hello SAS Community,

 

I have a dataset with close to 1 million rows. I want to find if the value 'Yes' is in sequence across the columns like below. It should be all 'Yes' without any other value in between or no missing. Can you please help? I tried array, but it only flags if a specific value like 'Yes' is found.

Have:

T1    T2   T3   T4  T5  T6  T7 T8 T9 T10

Yes  No   Yes Yes Yes No Yes  .   .     .

Yes  Yes Yes Yes Yes   .     .     .   .     .

Yes Yes Yes    .      .      .     .     .   .     .

No   No  No  No  No Yes No Yes Yes No

 

Want:

T1    T2   T3   T4  T5  T6  T7 T8 T9 T10      Flag 

Yes  No   Yes Yes Yes No Yes  .   .     .           0

Yes  Yes Yes Yes Yes   .     .     .   .     .           1

Yes Yes Yes    .      .      .     .     .   .     .           1

No   No  No  No  No Yes No Yes Yes No        0

1 ACCEPTED SOLUTION
10 REPLIES 10
Tom
Super User Tom
Super User

First let's make some better test data that has more of conditions you mentioned.

data have;
  input (T1-T10) (:$3.);
cards;
Yes No  Yes Yes Yes No  Yes .   .   .
Yes Yes Yes Yes Yes .   .   .   .   .
Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
Yes Yes .   Yes .   .   .   .   .   .
Yes Yes Yes .   .   .   .   .   .   .
No  No  No  No  No  Yes No  Yes Yes No
.   .   .   .   .   .   .   .   .   .
;

So just scan the list until you hit one that is not Yes.

data want;
  set have ;
  array t [10];
  flag=cmiss(of t[*]) < dim(t);
  do index=1 to dim(T)-cmiss(of t{*]) while(flag=1);
    flag=(t[index] = 'Yes');
  end;
  drop index;
run;

Results

Tom_0-1676045498351.png

 

SP01
Obsidian | Level 7

Thank you for your time, Tom! Your solution works as well. 

FreelanceReinh
Jade | Level 19

Hello @SP01,

 

Assuming that variables T1-T10 have length 3 and that a missing value of T1 would violate your criterion, this definition of FLAG should work:

data want;
set have;
flag=cat(of t1 t:)=repeat('Yes',10-cmiss(of t:));
run;

 

Tom
Super User Tom
Super User

@FreelanceReinh wrote:

Hello @SP01,

 

Assuming that variables T1-T10 have length 3 and that a missing value of T1 would violate your criterion, this definition of FLAG should work:

data want;
set have;
flag=cat(of t1 t:)=repeat('Yes',10-cmiss(of t:));
run;

 


That is simple, but remember that REPEAT() wants the count of the EXTRA copies to append so subtract 1.

Note that it does not handle embedded missing.

1131  data want;
1132    set have ;
1133    array t [10];
1134    flag=cmiss(of t[*]) < dim(t);
1135    do index=1 to dim(T)-cmiss(of t{*]) while(flag=1);
1136      flag=(t[index] = 'Yes');
1137    end;
1138    drop index;
1139    flag2 = (cats(of t[*])=repeat('Yes',dim(t)-cmiss(of t[*])-1));
1140  run;

NOTE: Argument 2 to function REPEAT('Yes',-1) at line 1139 column 26 is invalid.
T1=  T2=  T3=  T4=  T5=  T6=  T7=  T8=  T9=  T10=  flag=0 index=1 flag2=1 _ERROR_=1 _N_=7
NOTE: There were 7 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 7 observations and 12 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

Tom_0-1676048412476.png

You could fix the all missing case by adding a MAX() function so that the count cannot be negative.

  flag2 = (cats(of t[*])=repeat('Yes',max(0,dim(t)-cmiss(of t[*])-1)));

 

FreelanceReinh
Jade | Level 19

Hello  @Tom,

 

Thanks for checking my solution. I still think it is correct and I've just confirmed that with a HAVE dataset comprising all 3**10 possible combinations of 'Yes', 'No' and ' '.

 

The peculiarity of the REPEAT function that you mention is the reason why I use cat(of t1 t:) rather than cat(of t:). This works in particular for the "all missing" case. There's also no problem with "embedded" missings because CAT (unlike CATS, which is what you are using for your FLAG2) does not compress them.

Tom
Super User Tom
Super User

I see, looks good.

Note that it depends on the lengths of all of the variables matching the length of the string passed to the REPEAT() function.

FreelanceReinh
Jade | Level 19

@Tom wrote:

Note that it depends on the lengths of all of the variables matching the length of the string passed to the REPEAT() function.


This is a good point as it indicates how the solution could be adapted to variables T1-T10 having a common defined length >3 or to the similar task aiming at a series of consecutive 'No' values: The string passed to the REPEAT() function would just need to contain the trailing blanks so that the LENGTHC() of the string would match that of the T1-T10.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1219 views
  • 3 likes
  • 4 in conversation