BookmarkSubscribeRSS Feed
GuiVtzl
Fluorite | Level 6

Hello everyone,

I have a little problem. I have this kind of data set with say, 1 000 000 observations :

VAR_1 VAR_2 VAR_3 [...] VAR_122  SPAN

The SPAN variable gives me a number between 1 and 122 and what I'd like to do is count for each observation the number of missing values between VAR_1 and VAR_&SPAN.

I don't really know how to perform that calculation since it's a "row calculation". Writing it wrong, it would look like

N_MISS = count missing(VAR_1-VAR_&SPAN)

But I have no idea about how to perform that calculation for each row,    I only know how to do that in column (more simple ahah)

Could you help me please ?

Thanks a lot

13 REPLIES 13
GuiVtzl
Fluorite | Level 6

Thank you Jaap for your answer.

I am not that familiar with arrays, but I tried looking at that.

So I guess I should start doing something like :

array var

  • var_1-var_122;
  • array span

  • SPAN;
  • array n_miss

  • n_miss;
  • Is that correct ?

    But then i don't see how i should do my loop.

    I thought it would be :

    do  i=1 to 122

    n_miss = cmiss (of var[1]-var[SPAN];

    leave;

    end;

    run;

    But again, I am not at ease yet with arrays.

    Could you correct me please ?

    Thanks a lot

    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    I think you would need something more like:

    n_miss=0;

    do I=1 to 122;

         if var=. then n_miss=n_miss+1;

    end;

    GuiVtzl
    Fluorite | Level 6

    Thank you RW9,

    the problem is that your code seems to count the missing values for the 122 variables "VAR" for each observation.

    In my case I need for each obs to count the missing values from VAR_1 to VAR_j where j is the value of variable SPAN.

    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    Maybe I misunderstand, but for the code above you would get:

    datastep start.

    arrive at observation1.

    set n_miss to 0

    start do loop with 1

    if var[1] is missing the n_miss+1;

    do loop with 2

    if var[2] is missing then n_miss+1;

    ...

    end loop, n_miss contains, for the first observation, a count of all columns var1-var122 where the value is missing.

    now move onto observation 2.

    This seems to be what you are asking?

    jakarman
    Barite | Level 11

    My fault Id did not include a good link for array-s, this one is better. SAS(R) 9.4 Language Reference: Concepts, Second Edition fo concepts and as reference SAS(R) 9.4 Statements: Reference, Second Edition  your array declaration should be something like:

    array aspan{*} var_1 - var_122 ;  /* the * is counting the 122 automatically */

    After that you can refer to the element of an array using loop-processing.

    This is offering full freedom like matrix processing.

          do i=1 to ..;

             span= <some processing  aspan{i} > ;

          end;

    Reviewing your question, There are a lot of functions that are working on a list of variables. My first link was one in that direction.
    you are just wanting to count the number of missings? SAS(R) 9.4 Functions and CALL Routines: Reference, Second Edition (NMISS)
    span=nmiss(of var_1 - var_122) ;

    That is using the list processing of variables SAS(R) 9.4 Language Reference: Concepts, Second Edition.
    That a list of variables can be the argument is not a very common concept in programming languages.

    ---->-- ja karman --<-----
    GuiVtzl
    Fluorite | Level 6

    Thank you both !

    I am going to put a little example to show what I want to count (in bold the variables I already have, I want to create the variable N_MISS):

    VAR_1     VAR_2      VAR_3      VAR_4      VAR_5     SPAN     N_MISS

         5               10           .               .                    10          5               2

         2               3             .               .                    .               3               1

         .               .               1               5                    .               4               2

    ------------------------------------------------

    I am sorry if I get it wrong but I have the impression that when writing things like nmiss(of var_1 - var_122) ; or                  

    n_miss=0;

    do I=1 to 122;

         if var=. then n_miss=n_miss+1;

    end;

    I will get N_MISS values 2 3 3 instead of the 2 1 2 above

    My problem seems to me (again, I am probably wrong !) that my loop isn't fixed since for each observation (row) I don't use the same span to count. For my example above, 1st line I count on var1-var5, 2nd line I count on var1-var3 and 3rd line I count on var1-var4.

    Thank you very much for your help, those arrays are totally new to me (discovered them this morning ahah)

    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    Ok, so you want to count the missing based on another variable span, that's fine just change to:

    n_miss=0;

    do I=1 to 122;

         if var=. and I <= span then n_miss=n_miss+1;

    end;

    GuiVtzl
    Fluorite | Level 6

    I mixed all of your suggestions and it did exactly what I wanted !

    Thanks a lot guys for your help ! Plus, it made me discover the arrays ! Great !

    Have a nice day !

    andreas_lds
    Jade | Level 19

    Is it really necessary to do the iteration?

    data work.want;

       set work.have;

       length n_miss 8;

       array vars

  • var_:;
  •  

       n_miss = nmiss(of var_:) - (dim(vars) - span);

    run;

    The result of nmiss(of var_:) has to be reduced by the number of variables which are expected to missing.

    And
    Calcite | Level 5 And
    Calcite | Level 5

    Hello,

    Here is a macro which also provides the desired result.

    %macro tocalc (nbvar);

    data want;

    set have;

    mc = span-(%eval(&nbvar) - N(of var1-var&nbvar));

    %put var&nbvar;

    run;

    %mend tocalc;

    %tocalc(5);

    sas-innovate-wordmark-2025-midnight.png

    Register Today!

    Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    SAS Training: Just a Click Away

     Ready to level-up your skills? Choose your own adventure.

    Browse our catalog!

    Discussion stats
    • 13 replies
    • 2596 views
    • 1 like
    • 6 in conversation