Hello everyone,
I have a little problem. I have this kind of data set with say, 1 000 000 observations :
VAR_1 VAR_2 VAR_3 [...] VAR_122 SPAN
The SPAN variable gives me a number between 1 and 122 and what I'd like to do is count for each observation the number of missing values between VAR_1 and VAR_&SPAN.
I don't really know how to perform that calculation since it's a "row calculation". Writing it wrong, it would look like
N_MISS = count missing(VAR_1-VAR_&SPAN)
But I have no idea about how to perform that calculation for each row, I only know how to do that in column (more simple ahah)
Could you help me please ?
Thanks a lot
use array-s Base SAS(R) 9.4 Procedures Guide, Second Edition.
Thank you Jaap for your answer.
I am not that familiar with arrays, but I tried looking at that.
So I guess I should start doing something like :
array var
array span
array n_miss
Is that correct ?
But then i don't see how i should do my loop.
I thought it would be :
do i=1 to 122
n_miss = cmiss (of var[1]-var[SPAN];
leave;
end;
run;
But again, I am not at ease yet with arrays.
Could you correct me please ?
Thanks a lot
I think you would need something more like:
n_miss=0;
do I=1 to 122;
if var=. then n_miss=n_miss+1;
end;
Thank you RW9,
the problem is that your code seems to count the missing values for the 122 variables "VAR" for each observation.
In my case I need for each obs to count the missing values from VAR_1 to VAR_j where j is the value of variable SPAN.
Maybe I misunderstand, but for the code above you would get:
datastep start.
arrive at observation1.
set n_miss to 0
start do loop with 1
if var[1] is missing the n_miss+1;
do loop with 2
if var[2] is missing then n_miss+1;
...
end loop, n_miss contains, for the first observation, a count of all columns var1-var122 where the value is missing.
now move onto observation 2.
This seems to be what you are asking?
My fault Id did not include a good link for array-s, this one is better. SAS(R) 9.4 Language Reference: Concepts, Second Edition fo concepts and as reference SAS(R) 9.4 Statements: Reference, Second Edition your array declaration should be something like:
array aspan{*} var_1 - var_122 ; /* the * is counting the 122 automatically */
After that you can refer to the element of an array using loop-processing.
This is offering full freedom like matrix processing.
do i=1 to ..;
span= <some processing aspan{i} > ;
end;
Reviewing your question, There are a lot of functions that are working on a list of variables. My first link was one in that direction.
you are just wanting to count the number of missings? SAS(R) 9.4 Functions and CALL Routines: Reference, Second Edition (NMISS)
span=nmiss(of var_1 - var_122) ;
That is using the list processing of variables SAS(R) 9.4 Language Reference: Concepts, Second Edition.
That a list of variables can be the argument is not a very common concept in programming languages.
Thank you both !
I am going to put a little example to show what I want to count (in bold the variables I already have, I want to create the variable N_MISS):
VAR_1 VAR_2 VAR_3 VAR_4 VAR_5 SPAN N_MISS
5 10 . . 10 5 2
2 3 . . . 3 1
. . 1 5 . 4 2
------------------------------------------------
I am sorry if I get it wrong but I have the impression that when writing things like nmiss(of var_1 - var_122) ; or
n_miss=0;
do I=1 to 122;
if var=. then n_miss=n_miss+1;
end;
I will get N_MISS values 2 3 3 instead of the 2 1 2 above
My problem seems to me (again, I am probably wrong !) that my loop isn't fixed since for each observation (row) I don't use the same span to count. For my example above, 1st line I count on var1-var5, 2nd line I count on var1-var3 and 3rd line I count on var1-var4.
Thank you very much for your help, those arrays are totally new to me (discovered them this morning ahah)
Ok, so you want to count the missing based on another variable span, that's fine just change to:
n_miss=0;
do I=1 to 122;
if var=. and I <= span then n_miss=n_miss+1;
end;
I think this would be easier and more intuitive:
do I=1 to span;
if var=. then n_miss=n_miss+1;
end;
I mixed all of your suggestions and it did exactly what I wanted !
Thanks a lot guys for your help ! Plus, it made me discover the arrays ! Great !
Have a nice day !
Is it really necessary to do the iteration?
data work.want;
set work.have;
length n_miss 8;
array vars
n_miss = nmiss(of var_:) - (dim(vars) - span);
run;
The result of nmiss(of var_:) has to be reduced by the number of variables which are expected to missing.
If it is not guaranteed that variables beyond SPAN are always missing, this approach will give wrong results.
Hello,
Here is a macro which also provides the desired result.
%macro tocalc (nbvar);
data want;
set have;
mc = span-(%eval(&nbvar) - N(of var1-var&nbvar));
%put var&nbvar;
run;
%mend tocalc;
%tocalc(5);
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.