BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
podarum
Quartz | Level 8

How could I find how many observations are missing at the front and end, By group.. So for example:

HAVE

Store   Var1   Var2   Var3

A          .          4       .

A          .          5       .

A          4         .        4

A          6         .        5

A          .          .        6

B          4         .        9

B          5         4       .

B          .          6       8

It would tell me that

Store A - Var1 has 2 obs missing at the front and 1 at the end

Store B - Var1 has 1 at end

Store A - Var2 has 3 at the end

Store B - Var2 has 1 at the front

Store A - Var3 has 2 at the front

Store B - Var3 has NONE

Even better if there is a way I can identify them through an indicator or something, that way I can delete thenm if I chose.

thank you..

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

OK, here's a variation based on my original program, and switching over to borrow a little bit of logic from the PG Stats solution.  As before, it addresses var1 only:

data want;

   var1_beginning=0;

   var1_ending=0;

   n_store=0;

   do until (last.store);

       set have;

       by store;

       if var1=. then do;

          var1_ending + 1;

          if var1_beginning = n_store then var1_beginning + 1;

       end;

       else var1_ending = 0;

       n_store + 1;

   end;

   n_store_so_far=0;

   do until (last.store);

       set have;

       by store;

       n_store_so_far + 1;

       if n_store_so_far <= var1_beginning then var1_flag='S';

       else if (n_store - var1_ending )  < n_store_so_var then var1_flag='E';

       else var1_flag=' ';

       output;

   end;

   * As you decide:  drop var1_beginning var1_ending n_store n_store_so_far;

run;

It's untested, so I've got fingers crossed here.  Let me know if you encounter any problems with it.

View solution in original post

14 REPLIES 14
Astounding
PROC Star

podarum,

Counting is mildly complex, assuming your data are in order by STORE.  Here's code for VAR1 only (untested code):

data want (keep=store var1_beginning var1_ending);

   var1_beginning=0;

   var1_ending=0;

   var1_flag='Y';

   do until (last.store);

       set have;

       by store;

       if var1=. then do;

          var1_ending + 1;

          if var1_flag='Y' then var1_beginning + 1;

       end;

       else do;

          var1_ending=0;

          var1_flag='N';

       end;

   end;

run;

Setting up an indicator?  You would certainly need 3 indicators, one for each variable, and many more observations in the output data set.  Let's see if this part makes sense first.

Good luck.

podarum
Quartz | Level 8

Worked fine.. thanks.. now I have to get a flag for the missing

Astounding
PROC Star

podarum,

So far so good, but you'll have to do more of the work than that.  How many flags would you like for each variable?  What values would you like them to take on?  Please illustrate.

podarum
Quartz | Level 8

For example have a field called MISSING, and if the value is missing at the start then have an 'S' in that field, and if the missing is at the end then have an 'E" in that cell, and nothing if the value is not missing or if it's missing in between.

Astounding
PROC Star

OK, here's a variation based on my original program, and switching over to borrow a little bit of logic from the PG Stats solution.  As before, it addresses var1 only:

data want;

   var1_beginning=0;

   var1_ending=0;

   n_store=0;

   do until (last.store);

       set have;

       by store;

       if var1=. then do;

          var1_ending + 1;

          if var1_beginning = n_store then var1_beginning + 1;

       end;

       else var1_ending = 0;

       n_store + 1;

   end;

   n_store_so_far=0;

   do until (last.store);

       set have;

       by store;

       n_store_so_far + 1;

       if n_store_so_far <= var1_beginning then var1_flag='S';

       else if (n_store - var1_ending )  < n_store_so_var then var1_flag='E';

       else var1_flag=' ';

       output;

   end;

   * As you decide:  drop var1_beginning var1_ending n_store n_store_so_far;

run;

It's untested, so I've got fingers crossed here.  Let me know if you encounter any problems with it.

podarum
Quartz | Level 8

What can I say.. Brilliant.. not only did it work perfectly, it was on the first shot.. great stuff, Thanks

podarum
Quartz | Level 8

Hi Astounding,  This information is invaluable, but since I'm terrible with proc sql, I'm hoping you can help me with an add extra step.  Is there a way I can do the same thing we have with var1, but within the same step, add a var2 and var3 ??  So have an 'S2' and and 'E2' for var2 , and 'S3' and 'E3' for var3, etc.   Thanks..

Astounding
PROC Star

podarum,

Though some may disagree, I believe one of the primary purposes of this facility is to teach you to fish, rather than to present you with a basket of fish.  I think this is something you can do yourself, if you would try.  Just in case you need a hint or two, I'll spell out a little more below.

There's no SQL in my solution, so you don't need to know any of that.  The existing variables N_STORE and N_STORE_SO_FAR can remain as they are.  No matter how many variables you want to add, the number of observations per STORE does not change.  For each variable that you add, you need to compute 3 more variables.  For example, to add a flag for VAR2, you would need to add VAR2_BEGINNING, VAR2_ENDING, and VAR2_FLAG.  It should be simple enough to repeat the logic that you see for VAR1.  If you really try and can't do it, post your progress and we can tweak it.

Good luck.

podarum
Quartz | Level 8

Thank you Astounding.  I have tried it and tried, but as you can see I'm not that solid as I'm still learning the more complex procedures and processes.  I will give it a try, and post.

PGStats
Opal | Level 21

This will do it...

data  test;
input Store $   Var1   Var2   Var3;
datalines;
A          .          4       .
A          .          5       .
A          4         .        4
A          6         .        5
A          .          .        6
B          4         .        9
B          5         4       .
B          .          6       8
;

data cnt(keep=store start1-start3 finish1-finish3);
array v(3) var1-var3;
array start(3) start1-start3;
array finish(3) finish1-finish3;
retain n start finish;
set test;
by store;
if first.store then do;
n = 0;
do i = 1 to 3; start(i)=0; finish(i)=0; end;
end;
do i = 1 to 3;
if missing(v(i)) then do;
  finish(i) + 1;
  start(i) + ( start(i) = n );
end;
else finish(i) = 0;
end;
n + 1;
if last.store then output;
run;

proc print; run;

PG

PG
Astounding
PROC Star

PGStats,

I like the way you use N instead of my clumsier VAR1_FLAG.  Note, however, that there is a method to my madness.  If the solution were expanded to address some of the other questions, such as setting flags or deleting records, the subsequent programming would be different.  I'm going to imagine that I created N like you did, instead of creating VAR1_FLAG.  In that case, you would merge your result with the original data, while I would add another DO loop at the bottom of the same DATA step:

do until (last.store);

    set have;

    by store;

    * Use previously calculated variables to assign flags, and/or output proper records;

end;

To each his own!

PGStats
Opal | Level 21

Astounding, I'm flattered that you took the time to figure out my algorithm. I hadn't thought of putting the set inside a do until(last.) loop, it's a good idea which I hope to reuse one day. I agree with you that the best way to solve a problem is not always the most "clever". Readable, elegant and generalisable solutions are what I strive for in my own work. But when I work on others' problems, with minimal understanding of their context, I code as if the exercise was purely academic (and it often is, I'm afraid).

PG

PG
Ksharp
Super User

What output dataset do you need ?

data  test;
input Store $   Var1   Var2   Var3;
datalines;
A          .          4       .
A          .          5       .
A          4         .        4
A          6         .        5
A          .          .        6
B          4         .        9
B          5         4       .
B          .          6       8
;
run;
data test;
 set test;
 by store;
 if first.store then do;flag1=0;flag2=0;flag3=0;end;
 if var1 ne lag(var1) then flag1+1;
 if var2 ne lag(var2) then flag2+1;
 if var3 ne lag(var3) then flag3+1;
run;
proc sql;
 create table want as
  select 'var1' as var,store,flag1 as flag,count(*) as count
   from test
    where var1=.
    group by store,flag1
     having flag1=min(flag1) or flag1=max(flag1) 
  union all
  select 'var2',store,flag2 as flag,count(*) as count
   from test
    where var2=.
    group by store,flag2
     having flag2=min(flag2) or flag2=max(flag2) 
  union all
  select 'var3',store,flag3 as flag,count(*) as count
   from test
    where var3=.
    group by store,flag3
     having flag3=min(flag3) or flag3=max(flag3) 
;

quit;

Ksharp

podarum
Quartz | Level 8

This is great stuff .. thanks

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1415 views
  • 0 likes
  • 4 in conversation