BookmarkSubscribeRSS Feed
GuiVtzl
Fluorite | Level 6

Hello everyone,

I have a little problem. I have this kind of data set with say, 1 000 000 observations :

VAR_1 VAR_2 VAR_3 [...] VAR_122  SPAN

The SPAN variable gives me a number between 1 and 122 and what I'd like to do is count for each observation the number of missing values between VAR_1 and VAR_&SPAN.

I don't really know how to perform that calculation since it's a "row calculation". Writing it wrong, it would look like

N_MISS = count missing(VAR_1-VAR_&SPAN)

But I have no idea about how to perform that calculation for each row,    I only know how to do that in column (more simple ahah)

Could you help me please ?

Thanks a lot

13 REPLIES 13
GuiVtzl
Fluorite | Level 6

Thank you Jaap for your answer.

I am not that familiar with arrays, but I tried looking at that.

So I guess I should start doing something like :

array var

  • var_1-var_122;
  • array span

  • SPAN;
  • array n_miss

  • n_miss;
  • Is that correct ?

    But then i don't see how i should do my loop.

    I thought it would be :

    do  i=1 to 122

    n_miss = cmiss (of var[1]-var[SPAN];

    leave;

    end;

    run;

    But again, I am not at ease yet with arrays.

    Could you correct me please ?

    Thanks a lot

    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    I think you would need something more like:

    n_miss=0;

    do I=1 to 122;

         if var=. then n_miss=n_miss+1;

    end;

    GuiVtzl
    Fluorite | Level 6

    Thank you RW9,

    the problem is that your code seems to count the missing values for the 122 variables "VAR" for each observation.

    In my case I need for each obs to count the missing values from VAR_1 to VAR_j where j is the value of variable SPAN.

    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    Maybe I misunderstand, but for the code above you would get:

    datastep start.

    arrive at observation1.

    set n_miss to 0

    start do loop with 1

    if var[1] is missing the n_miss+1;

    do loop with 2

    if var[2] is missing then n_miss+1;

    ...

    end loop, n_miss contains, for the first observation, a count of all columns var1-var122 where the value is missing.

    now move onto observation 2.

    This seems to be what you are asking?

    jakarman
    Barite | Level 11

    My fault Id did not include a good link for array-s, this one is better. SAS(R) 9.4 Language Reference: Concepts, Second Edition fo concepts and as reference SAS(R) 9.4 Statements: Reference, Second Edition  your array declaration should be something like:

    array aspan{*} var_1 - var_122 ;  /* the * is counting the 122 automatically */

    After that you can refer to the element of an array using loop-processing.

    This is offering full freedom like matrix processing.

          do i=1 to ..;

             span= <some processing  aspan{i} > ;

          end;

    Reviewing your question, There are a lot of functions that are working on a list of variables. My first link was one in that direction.
    you are just wanting to count the number of missings? SAS(R) 9.4 Functions and CALL Routines: Reference, Second Edition (NMISS)
    span=nmiss(of var_1 - var_122) ;

    That is using the list processing of variables SAS(R) 9.4 Language Reference: Concepts, Second Edition.
    That a list of variables can be the argument is not a very common concept in programming languages.

    ---->-- ja karman --<-----
    GuiVtzl
    Fluorite | Level 6

    Thank you both !

    I am going to put a little example to show what I want to count (in bold the variables I already have, I want to create the variable N_MISS):

    VAR_1     VAR_2      VAR_3      VAR_4      VAR_5     SPAN     N_MISS

         5               10           .               .                    10          5               2

         2               3             .               .                    .               3               1

         .               .               1               5                    .               4               2

    ------------------------------------------------

    I am sorry if I get it wrong but I have the impression that when writing things like nmiss(of var_1 - var_122) ; or                  

    n_miss=0;

    do I=1 to 122;

         if var=. then n_miss=n_miss+1;

    end;

    I will get N_MISS values 2 3 3 instead of the 2 1 2 above

    My problem seems to me (again, I am probably wrong !) that my loop isn't fixed since for each observation (row) I don't use the same span to count. For my example above, 1st line I count on var1-var5, 2nd line I count on var1-var3 and 3rd line I count on var1-var4.

    Thank you very much for your help, those arrays are totally new to me (discovered them this morning ahah)

    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    Ok, so you want to count the missing based on another variable span, that's fine just change to:

    n_miss=0;

    do I=1 to 122;

         if var=. and I <= span then n_miss=n_miss+1;

    end;

    GuiVtzl
    Fluorite | Level 6

    I mixed all of your suggestions and it did exactly what I wanted !

    Thanks a lot guys for your help ! Plus, it made me discover the arrays ! Great !

    Have a nice day !

    andreas_lds
    Jade | Level 19

    Is it really necessary to do the iteration?

    data work.want;

       set work.have;

       length n_miss 8;

       array vars

  • var_:;
  •  

       n_miss = nmiss(of var_:) - (dim(vars) - span);

    run;

    The result of nmiss(of var_:) has to be reduced by the number of variables which are expected to missing.

    And
    Calcite | Level 5 And
    Calcite | Level 5

    Hello,

    Here is a macro which also provides the desired result.

    %macro tocalc (nbvar);

    data want;

    set have;

    mc = span-(%eval(&nbvar) - N(of var1-var&nbvar));

    %put var&nbvar;

    run;

    %mend tocalc;

    %tocalc(5);

    SAS Innovate 2025: Save the Date

     SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

    Save the date!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    SAS Training: Just a Click Away

     Ready to level-up your skills? Choose your own adventure.

    Browse our catalog!

    Discussion stats
    • 13 replies
    • 2320 views
    • 1 like
    • 6 in conversation