How could I find how many observations are missing at the front and end, By group.. So for example:
HAVE
Store Var1 Var2 Var3
A . 4 .
A . 5 .
A 4 . 4
A 6 . 5
A . . 6
B 4 . 9
B 5 4 .
B . 6 8
It would tell me that
Store A - Var1 has 2 obs missing at the front and 1 at the end
Store B - Var1 has 1 at end
Store A - Var2 has 3 at the end
Store B - Var2 has 1 at the front
Store A - Var3 has 2 at the front
Store B - Var3 has NONE
Even better if there is a way I can identify them through an indicator or something, that way I can delete thenm if I chose.
thank you..
OK, here's a variation based on my original program, and switching over to borrow a little bit of logic from the PG Stats solution. As before, it addresses var1 only:
data want;
var1_beginning=0;
var1_ending=0;
n_store=0;
do until (last.store);
set have;
by store;
if var1=. then do;
var1_ending + 1;
if var1_beginning = n_store then var1_beginning + 1;
end;
else var1_ending = 0;
n_store + 1;
end;
n_store_so_far=0;
do until (last.store);
set have;
by store;
n_store_so_far + 1;
if n_store_so_far <= var1_beginning then var1_flag='S';
else if (n_store - var1_ending ) < n_store_so_var then var1_flag='E';
else var1_flag=' ';
output;
end;
* As you decide: drop var1_beginning var1_ending n_store n_store_so_far;
run;
It's untested, so I've got fingers crossed here. Let me know if you encounter any problems with it.
podarum,
Counting is mildly complex, assuming your data are in order by STORE. Here's code for VAR1 only (untested code):
data want (keep=store var1_beginning var1_ending);
var1_beginning=0;
var1_ending=0;
var1_flag='Y';
do until (last.store);
set have;
by store;
if var1=. then do;
var1_ending + 1;
if var1_flag='Y' then var1_beginning + 1;
end;
else do;
var1_ending=0;
var1_flag='N';
end;
end;
run;
Setting up an indicator? You would certainly need 3 indicators, one for each variable, and many more observations in the output data set. Let's see if this part makes sense first.
Good luck.
Worked fine.. thanks.. now I have to get a flag for the missing
podarum,
So far so good, but you'll have to do more of the work than that. How many flags would you like for each variable? What values would you like them to take on? Please illustrate.
For example have a field called MISSING, and if the value is missing at the start then have an 'S' in that field, and if the missing is at the end then have an 'E" in that cell, and nothing if the value is not missing or if it's missing in between.
OK, here's a variation based on my original program, and switching over to borrow a little bit of logic from the PG Stats solution. As before, it addresses var1 only:
data want;
var1_beginning=0;
var1_ending=0;
n_store=0;
do until (last.store);
set have;
by store;
if var1=. then do;
var1_ending + 1;
if var1_beginning = n_store then var1_beginning + 1;
end;
else var1_ending = 0;
n_store + 1;
end;
n_store_so_far=0;
do until (last.store);
set have;
by store;
n_store_so_far + 1;
if n_store_so_far <= var1_beginning then var1_flag='S';
else if (n_store - var1_ending ) < n_store_so_var then var1_flag='E';
else var1_flag=' ';
output;
end;
* As you decide: drop var1_beginning var1_ending n_store n_store_so_far;
run;
It's untested, so I've got fingers crossed here. Let me know if you encounter any problems with it.
What can I say.. Brilliant.. not only did it work perfectly, it was on the first shot.. great stuff, Thanks
Hi Astounding, This information is invaluable, but since I'm terrible with proc sql, I'm hoping you can help me with an add extra step. Is there a way I can do the same thing we have with var1, but within the same step, add a var2 and var3 ?? So have an 'S2' and and 'E2' for var2 , and 'S3' and 'E3' for var3, etc. Thanks..
podarum,
Though some may disagree, I believe one of the primary purposes of this facility is to teach you to fish, rather than to present you with a basket of fish. I think this is something you can do yourself, if you would try. Just in case you need a hint or two, I'll spell out a little more below.
There's no SQL in my solution, so you don't need to know any of that. The existing variables N_STORE and N_STORE_SO_FAR can remain as they are. No matter how many variables you want to add, the number of observations per STORE does not change. For each variable that you add, you need to compute 3 more variables. For example, to add a flag for VAR2, you would need to add VAR2_BEGINNING, VAR2_ENDING, and VAR2_FLAG. It should be simple enough to repeat the logic that you see for VAR1. If you really try and can't do it, post your progress and we can tweak it.
Good luck.
Thank you Astounding. I have tried it and tried, but as you can see I'm not that solid as I'm still learning the more complex procedures and processes. I will give it a try, and post.
This will do it...
data test;
input Store $ Var1 Var2 Var3;
datalines;
A . 4 .
A . 5 .
A 4 . 4
A 6 . 5
A . . 6
B 4 . 9
B 5 4 .
B . 6 8
;
data cnt(keep=store start1-start3 finish1-finish3);
array v(3) var1-var3;
array start(3) start1-start3;
array finish(3) finish1-finish3;
retain n start finish;
set test;
by store;
if first.store then do;
n = 0;
do i = 1 to 3; start(i)=0; finish(i)=0; end;
end;
do i = 1 to 3;
if missing(v(i)) then do;
finish(i) + 1;
start(i) + ( start(i) = n );
end;
else finish(i) = 0;
end;
n + 1;
if last.store then output;
run;
proc print; run;
PG
PGStats,
I like the way you use N instead of my clumsier VAR1_FLAG. Note, however, that there is a method to my madness. If the solution were expanded to address some of the other questions, such as setting flags or deleting records, the subsequent programming would be different. I'm going to imagine that I created N like you did, instead of creating VAR1_FLAG. In that case, you would merge your result with the original data, while I would add another DO loop at the bottom of the same DATA step:
do until (last.store);
set have;
by store;
* Use previously calculated variables to assign flags, and/or output proper records;
end;
To each his own!
Astounding, I'm flattered that you took the time to figure out my algorithm. I hadn't thought of putting the set inside a do until(last.) loop, it's a good idea which I hope to reuse one day. I agree with you that the best way to solve a problem is not always the most "clever". Readable, elegant and generalisable solutions are what I strive for in my own work. But when I work on others' problems, with minimal understanding of their context, I code as if the exercise was purely academic (and it often is, I'm afraid).
PG
What output dataset do you need ?
data test; input Store $ Var1 Var2 Var3; datalines; A . 4 . A . 5 . A 4 . 4 A 6 . 5 A . . 6 B 4 . 9 B 5 4 . B . 6 8 ; run; data test; set test; by store; if first.store then do;flag1=0;flag2=0;flag3=0;end; if var1 ne lag(var1) then flag1+1; if var2 ne lag(var2) then flag2+1; if var3 ne lag(var3) then flag3+1; run; proc sql; create table want as select 'var1' as var,store,flag1 as flag,count(*) as count from test where var1=. group by store,flag1 having flag1=min(flag1) or flag1=max(flag1) union all select 'var2',store,flag2 as flag,count(*) as count from test where var2=. group by store,flag2 having flag2=min(flag2) or flag2=max(flag2) union all select 'var3',store,flag3 as flag,count(*) as count from test where var3=. group by store,flag3 having flag3=min(flag3) or flag3=max(flag3) ; quit;
Ksharp
This is great stuff .. thanks
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.