BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SP01
Obsidian | Level 7

Hello all,

 

I tried different ways and searched on google to come up with a way to deal with this. But was unable to accomplish this.

 

Currently, my data looks like the below:

data Have;
input col1 $ col2 $ col3 $ col4 $ col5 $ col6 $ col7 $ col8 $ col9 $ col10 $;
cards;
A B B C . D A . A . 
B A . C A D A B . . 
A B B D D D . . . . 
A . B C B D A . . . 
A B B C . D A . . . 
;

Goal: I want to flag all rows that have a missing column in between. For example, if there is a column 2 missing, where col1 - col8 is the series here. Important to remember that there might be instances where the sequence finishes at col5 or col4 or col6. The rest of the columns would be missing values. We need to ignore these and only flag if there is a missing column or columns in between non-missing columns.

 

data Have;
    input col1 $ col2 $ col3 $ col4 $ col5 $ col6 $ col7 $ col8 $ col9 $ col10 $ flag$;
    cards;
                         flag
    A B B C . D A . A .   1
    B A . C A D A B . .   1
    A B B D D D . . . .   0
    A . B C B D A . . .   1
    A B B C . D A . . .   1
    ;

Thank you for your time.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

I would be interested in seeing just what you tried.

Typically dealing with sequential processing of values on a single observation points towards an array.

 

data want;
   set have;
   array c(*) col1-col10;
   flag=0;
   do i= 1 to 8;
      if not missing(c[i]) and missing (c[i+1]) and not missing(c[i+2]) then do;
         flag=1;
         leave;
      end;
   end;
   drop i;
run;

The index values only go from 1 to 8 because we are examining 3 variables at a time.

The Leave instruction tells SAS to quit processing the loop when a case of missing is found between two non-missing values after setting the flag value.

View solution in original post

9 REPLIES 9
ballardw
Super User

I would be interested in seeing just what you tried.

Typically dealing with sequential processing of values on a single observation points towards an array.

 

data want;
   set have;
   array c(*) col1-col10;
   flag=0;
   do i= 1 to 8;
      if not missing(c[i]) and missing (c[i+1]) and not missing(c[i+2]) then do;
         flag=1;
         leave;
      end;
   end;
   drop i;
run;

The index values only go from 1 to 8 because we are examining 3 variables at a time.

The Leave instruction tells SAS to quit processing the loop when a case of missing is found between two non-missing values after setting the flag value.

SP01
Obsidian | Level 7

Thank you @ballardw 

I will try this code and see if it also flags the edge cases like @Tom  described in the comments.

ballardw
Super User

It won't because your stated requirement was " that have a missing column in between.".

"A missing column" means one. If you wanted more than one to be counted the phrase could be "one or more missing values between".

And won't catch the edge of start with missing because "in between". Problem should state "in between or first is missing with second present". This one is easy to check as a special case for the I=1 in the do loop I suggested.

 

If the data had included such examples as possible ...

data expect;
  input (col1-col10) ($) flag ;
cards;
A B B C . D A . A . 1
B A . C A D A B . . 1
A B B D D D . . . . 0
A . B C B D A . . . 1
A B B C . D A . . . 1
. . C D E F . . . . 1
A B . . E F . . . . 1
;

data want;
   set expect;
   array c (*) col1-col10;
   array n (10);
   do i=1 to dim(c);
     n[i]= i*( not missing(c[i]));
   end;
   flag = 0< whichn(0, of n(*)) < whichn(max(of n(*)),of n(*));
   drop i n1-n10;
run;

 

The above creates indicator variables with values of 1 to 10 for present and 0 for missing in position 1 through 10 of the array.

 

The WHICHN function finds the position of the value in the first parameter in a list of following values. When multiple matches may occur the first position in the ordered list is reported.

So we test to see if any of the 0, if present occur and get the position. Then find the maximum value of the array and find its position. If that last number is greater than the first of the 0 then you have a missing in the "middle" or beginning.

 

SP01
Obsidian | Level 7
data want;
set have;

array _c(*) c1 - c10;

do i = 1 to dim(_c);
if missing(__c[i]) and not missing(lag(_c[i])) then _flag = 1;
end;
run;

@ballardw I tried the above array with lag function, but was not able to view the desired outcome.

Tom
Super User Tom
Super User

You are missing some edge cases. 

What if the gap is at the start? 

What if the gap is more than one variable?

data expect;
  input (col1-col10) ($) flag ;
cards;
A B B C . D A . A . 1
B A . C A D A B . . 1
A B B D D D . . . . 0
A . B C B D A . . . 1
A B B C . D A . . . 1
. . C D E F . . . . ?
A B . . E F . . . . ?
;
SP01
Obsidian | Level 7

@Tom  If the gap is at the start and there is no gap after the start, we ignore this.

 

If the gap is more than one variable, we still need to flag it.

ballardw
Super User

@SP01 wrote:

@Tom  If the gap is at the start and there is no gap after the start, we ignore this.

 

If the gap is more than one variable, we still need to flag it.


Not clear if that means the the next to last row of @Tom's example is a one or not.

Tom
Super User Tom
Super User

So use the logic you got on Stackoverflow, but skip the leading missing values.

Here is one way:

data expect;
  input (col1-col10) ($) flag ;
cards;
A B B C . D A . A . 1
B A . C A D A B . . 1
A B B D D D . . . . 0
A . B C B D A . . . 1
A B B C . D A . . . 1
. . C D E F . . . . 0
A B . . E F . . . . 1
;

data want ;
  set expect(drop=flag);
  array col col1-col10;
  index=whichc(coalescec(of col[*]),of col[*]);
  do index=index+1 to dim(col)-1 until (flag);
    flag=missing(col[index]) and not missing(col[index+1]);
  end;
run;

Result

Tom_0-1679438077407.png

 

Ksharp
Super User
data expect;
  input (col1-col10) ($)  ;
cards;
A B B C . D A . A . 
B A . C A D A B . . 
A B B D D D . . . . 
A . B C B D A . . . 
A B B C . D A . . . 
. . C D E F . . . . 
A B . . E F . . . . 
;

data want ;
  set expect;
  length temp $ 200;
  array x{*} $ col: ;
  do i=1 to dim(x);
    temp=cats(temp,coalescec(x{i},'.'));
  end;
  flag=ifn(prxmatch('/[^\.]+\.+[^\.]+/',strip(temp)),1,0);
  drop temp i;
run;

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 790 views
  • 3 likes
  • 4 in conversation