Help using Base SAS procedures

find the # of observations missing before & after

Accepted Solution Solved
Reply
Super Contributor
Posts: 401
Accepted Solution

find the # of observations missing before & after

How could I find how many observations are missing at the front and end, By group.. So for example:

HAVE

Store   Var1   Var2   Var3

A          .          4       .

A          .          5       .

A          4         .        4

A          6         .        5

A          .          .        6

B          4         .        9

B          5         4       .

B          .          6       8

It would tell me that

Store A - Var1 has 2 obs missing at the front and 1 at the end

Store B - Var1 has 1 at end

Store A - Var2 has 3 at the end

Store B - Var2 has 1 at the front

Store A - Var3 has 2 at the front

Store B - Var3 has NONE

Even better if there is a way I can identify them through an indicator or something, that way I can delete thenm if I chose.

thank you..


Accepted Solutions
Solution
‎03-28-2012 10:46 AM
Super User
Posts: 5,504

Re: find the # of observations missing before & after

OK, here's a variation based on my original program, and switching over to borrow a little bit of logic from the PG Stats solution.  As before, it addresses var1 only:

data want;

   var1_beginning=0;

   var1_ending=0;

   n_store=0;

   do until (last.store);

       set have;

       by store;

       if var1=. then do;

          var1_ending + 1;

          if var1_beginning = n_store then var1_beginning + 1;

       end;

       else var1_ending = 0;

       n_store + 1;

   end;

   n_store_so_far=0;

   do until (last.store);

       set have;

       by store;

       n_store_so_far + 1;

       if n_store_so_far <= var1_beginning then var1_flag='S';

       else if (n_store - var1_ending )  < n_store_so_var then var1_flag='E';

       else var1_flag=' ';

       output;

   end;

   * As you decide:  drop var1_beginning var1_ending n_store n_store_so_far;

run;

It's untested, so I've got fingers crossed here.  Let me know if you encounter any problems with it.

View solution in original post


All Replies
Super User
Posts: 5,504

Re: find the # of observations missing before & after

podarum,

Counting is mildly complex, assuming your data are in order by STORE.  Here's code for VAR1 only (untested code):

data want (keep=store var1_beginning var1_ending);

   var1_beginning=0;

   var1_ending=0;

   var1_flag='Y';

   do until (last.store);

       set have;

       by store;

       if var1=. then do;

          var1_ending + 1;

          if var1_flag='Y' then var1_beginning + 1;

       end;

       else do;

          var1_ending=0;

          var1_flag='N';

       end;

   end;

run;

Setting up an indicator?  You would certainly need 3 indicators, one for each variable, and many more observations in the output data set.  Let's see if this part makes sense first.

Good luck.

Super Contributor
Posts: 401

find the # of observations missing before & after

Posted in reply to Astounding

Worked fine.. thanks.. now I have to get a flag for the missing

Super User
Posts: 5,504

find the # of observations missing before & after

podarum,

So far so good, but you'll have to do more of the work than that.  How many flags would you like for each variable?  What values would you like them to take on?  Please illustrate.

Super Contributor
Posts: 401

find the # of observations missing before & after

Posted in reply to Astounding

For example have a field called MISSING, and if the value is missing at the start then have an 'S' in that field, and if the missing is at the end then have an 'E" in that cell, and nothing if the value is not missing or if it's missing in between.

Solution
‎03-28-2012 10:46 AM
Super User
Posts: 5,504

Re: find the # of observations missing before & after

OK, here's a variation based on my original program, and switching over to borrow a little bit of logic from the PG Stats solution.  As before, it addresses var1 only:

data want;

   var1_beginning=0;

   var1_ending=0;

   n_store=0;

   do until (last.store);

       set have;

       by store;

       if var1=. then do;

          var1_ending + 1;

          if var1_beginning = n_store then var1_beginning + 1;

       end;

       else var1_ending = 0;

       n_store + 1;

   end;

   n_store_so_far=0;

   do until (last.store);

       set have;

       by store;

       n_store_so_far + 1;

       if n_store_so_far <= var1_beginning then var1_flag='S';

       else if (n_store - var1_ending )  < n_store_so_var then var1_flag='E';

       else var1_flag=' ';

       output;

   end;

   * As you decide:  drop var1_beginning var1_ending n_store n_store_so_far;

run;

It's untested, so I've got fingers crossed here.  Let me know if you encounter any problems with it.

Super Contributor
Posts: 401

Re: find the # of observations missing before & after

Posted in reply to Astounding

What can I say.. Brilliant.. not only did it work perfectly, it was on the first shot.. great stuff, Thanks

Super Contributor
Posts: 401

Re: find the # of observations missing before & after

Posted in reply to Astounding

Hi Astounding,  This information is invaluable, but since I'm terrible with proc sql, I'm hoping you can help me with an add extra step.  Is there a way I can do the same thing we have with var1, but within the same step, add a var2 and var3 ??  So have an 'S2' and and 'E2' for var2 , and 'S3' and 'E3' for var3, etc.   Thanks..

Super User
Posts: 5,504

find the # of observations missing before & after

podarum,

Though some may disagree, I believe one of the primary purposes of this facility is to teach you to fish, rather than to present you with a basket of fish.  I think this is something you can do yourself, if you would try.  Just in case you need a hint or two, I'll spell out a little more below.

There's no SQL in my solution, so you don't need to know any of that.  The existing variables N_STORE and N_STORE_SO_FAR can remain as they are.  No matter how many variables you want to add, the number of observations per STORE does not change.  For each variable that you add, you need to compute 3 more variables.  For example, to add a flag for VAR2, you would need to add VAR2_BEGINNING, VAR2_ENDING, and VAR2_FLAG.  It should be simple enough to repeat the logic that you see for VAR1.  If you really try and can't do it, post your progress and we can tweak it.

Good luck.

Super Contributor
Posts: 401

Re: find the # of observations missing before & after

Posted in reply to Astounding

Thank you Astounding.  I have tried it and tried, but as you can see I'm not that solid as I'm still learning the more complex procedures and processes.  I will give it a try, and post.

Respected Advisor
Posts: 4,920

find the # of observations missing before & after

This will do it...

data  test;
input Store $   Var1   Var2   Var3;
datalines;
A          .          4       .
A          .          5       .
A          4         .        4
A          6         .        5
A          .          .        6
B          4         .        9
B          5         4       .
B          .          6       8
;

data cnt(keep=store start1-start3 finish1-finish3);
array v(3) var1-var3;
array start(3) start1-start3;
array finish(3) finish1-finish3;
retain n start finish;
set test;
by store;
if first.store then do;
n = 0;
do i = 1 to 3; start(i)=0; finish(i)=0; end;
end;
do i = 1 to 3;
if missing(v(i)) then do;
  finish(i) + 1;
  start(i) + ( start(i) = n );
end;
else finish(i) = 0;
end;
n + 1;
if last.store then output;
run;

proc print; run;

PG

PG
Super User
Posts: 5,504

find the # of observations missing before & after

PGStats,

I like the way you use N instead of my clumsier VAR1_FLAG.  Note, however, that there is a method to my madness.  If the solution were expanded to address some of the other questions, such as setting flags or deleting records, the subsequent programming would be different.  I'm going to imagine that I created N like you did, instead of creating VAR1_FLAG.  In that case, you would merge your result with the original data, while I would add another DO loop at the bottom of the same DATA step:

do until (last.store);

    set have;

    by store;

    * Use previously calculated variables to assign flags, and/or output proper records;

end;

To each his own!

Respected Advisor
Posts: 4,920

find the # of observations missing before & after

Posted in reply to Astounding

Astounding, I'm flattered that you took the time to figure out my algorithm. I hadn't thought of putting the set inside a do until(last.) loop, it's a good idea which I hope to reuse one day. I agree with you that the best way to solve a problem is not always the most "clever". Readable, elegant and generalisable solutions are what I strive for in my own work. But when I work on others' problems, with minimal understanding of their context, I code as if the exercise was purely academic (and it often is, I'm afraid).

PG

PG
Super User
Posts: 10,028

find the # of observations missing before & after

What output dataset do you need ?

data  test;
input Store $   Var1   Var2   Var3;
datalines;
A          .          4       .
A          .          5       .
A          4         .        4
A          6         .        5
A          .          .        6
B          4         .        9
B          5         4       .
B          .          6       8
;
run;
data test;
 set test;
 by store;
 if first.store then do;flag1=0;flag2=0;flag3=0;end;
 if var1 ne lag(var1) then flag1+1;
 if var2 ne lag(var2) then flag2+1;
 if var3 ne lag(var3) then flag3+1;
run;
proc sql;
 create table want as
  select 'var1' as var,store,flag1 as flag,count(*) as count
   from test
    where var1=.
    group by store,flag1
     having flag1=min(flag1) or flag1=max(flag1) 
  union all
  select 'var2',store,flag2 as flag,count(*) as count
   from test
    where var2=.
    group by store,flag2
     having flag2=min(flag2) or flag2=max(flag2) 
  union all
  select 'var3',store,flag3 as flag,count(*) as count
   from test
    where var3=.
    group by store,flag3
     having flag3=min(flag3) or flag3=max(flag3) 
;

quit;

Ksharp

Super Contributor
Posts: 401

find the # of observations missing before & after

This is great stuff .. thanks

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 569 views
  • 0 likes
  • 4 in conversation