Help using Base SAS procedures

How to output observations whose data are missing for certain variables

Reply
Contributor
Posts: 23

How to output observations whose data are missing for certain variables

Hello,

I am cleaning our database and trying to re-contact the participants a second time to acquire information in order to reduce missing values. For example, I have 1000 participants and 200 variables. I want to output participants whose data are missing for 100 key variables among the 200 variables. Participant A may only have 2 variables with missing values and participant B may have 30 variables with missing values. I don't know how to output participants with missing values.

The hypothetical dataset as shown below, I want to output participant1 showing var2, var4 with missing values; and participant2 showing var1, var3, var4 and var5 with missing values;

Thanks a lot!

Jing Yuan

   

novar1var2var3var4var5var6
1128.45.4467
2.34..66.
3235..56.78
4321.56..66
5214....68
Super User
Posts: 17,813

Re: How to output observations whose data are missing for certain variables

What does your input look like?

If you just want anyone with missing values and all values are numeric

if nmiss(of var1-var100)>0 then output;

Contributor
Posts: 23

Re: How to output observations whose data are missing for certain variables

Thanks. I have both numeric and character variables.

Contributor
Posts: 23

Re: How to output observations whose data are missing for certain variables

And most are character variables. The code above seems only work for numeric variables.  Sorry my hypothetical dataset showing as numberics.

Many thanks.

Super User
Posts: 17,813

Re: How to output observations whose data are missing for certain variables

I don't know of a way to check all at once Smiley Sad

You can create two arrays one for character one for numeric, do you have a naming convention for your variables?

Contributor
Posts: 23

Re: How to output observations whose data are missing for certain variables

No. I don't have naming convention. Would you please tell me the sample for creat array for character variables? I tried but failed.

Thanks.

Contributor
Posts: 23

Re: How to output observations whose data are missing for certain variables

Thanks, Ballardw. It looks very complicated to me, a newer to SASSmiley Happy. I will try it. Will that also work for character variables?

Super User
Posts: 10,497

Re: How to output observations whose data are missing for certain variables

If you note the variables involved are both character and numeric. The key is providing the list(s) of variables.

Super User
Posts: 10,497

Re: How to output observations whose data are missing for certain variables

The below program will create a list in the results window with the NO and the variables that are missing.

The 6th row of data is to show that it doesn't produce output when all are valid. The data step is to have some

minimal working data.

options missing=' '; /* to make the dummy text variables below actually blank*/

data work.test;
input no var1 var2 var3 var4 var5 var6;
   tvar1 = put(var2,best4.);
   tvar3 = put(var3,best4.);
datalines;
1 128 . 45 . 44 67
2 . 34 . . 66 .
3 235 . . 56 . 78
4 321 . 56 . . 66
5 214 . . . . 68
6 1 2 3 4 5 6
run;

%let varlist= var1 var2 var3 var4 var5 var6 tvar1 tvar3;/* key, YOU provide your list of variables in the order you want them*/
data _null_;
   file print;
   set work.test;
   if nmiss(of var1-var6,tvar1,tvar3)>0 then do;    /* to use VARLIST you would need to have a comma separated version*/
      do i = 1 to (countw("&varlist"));
      if i=1 then put no= +1 @;
      v = scan("&varlist",i);
      if missing(vvaluex(scan("&varlist",i))) then put v +5 @;
      if i = (countw("&varlist")) then put;
      end;
   end;
run;

Contributor
Posts: 26

Re: How to output observations whose data are missing for certain variables

Here is how I would go about solving the problem:

make an input array for your input dataset,

make 2 variables for your output dataset--ID and Missing,

use VTYPE to check the data type of each variable in your input array.

use appropriate number or character functions to do your checking

concatinate the character representation for your index for your input array onto an output string each time you find a missing or null value.

set a flag saying you have missing values

output your string if you have any missing.

keep client number and string showing missing

loop until done.

do what you'll get is:

an ID field and a string filed, the string field will contain all missing indexes.

[1] [2 4]

[2] [1 3 4 6]

[3] [2 3 5]

[4] [2 4 5]

[5] [2 3 4 5]

Super Contributor
Posts: 254

Re: How to output observations whose data are missing for certain variables

Let us take a small example based on SASHELP.CLASS.

Create a data set, CLASS from SASHELP.CLASS and at the same time, get the number of numeric and character variables. Also make some the variables to be missing.

data class;

   set sashelp.class end = eof;

   array n

  • _numeric_;
  •    array c

  • _character_;
  •    do _n_ = 1 to dim(n);

          if ranuni(123) < 0.7 then n[_n_] = .;

       end;

       do _n_ = 1 to dim(c);

          if ranuni(123) < 0.4 then c[_n_] = ' ';

       end;

       if eof then do;

          call symputx('num', dim(n));

          call symputx('chr', dim(c));

       end;

    run;

    %put &num  ;

    %put &chr= ;

    Now the data set, CLASS will resemble what you start with. Now we will check each variable for MISSING, and if so, make it to take a label of 1(one) and if not missing then the label will be 0(zero).

    data want;

       set class end = eof;

       array n

  • _numeric_;
  •    array c

  • _character_;
  •    array miss_num

  • _numeric_;
  •    array miss_chr

  • _character_;
  •    do i = 1 to dim(n);

          miss_num = ifN(missing(n), 1, 0);

       end;

       do i = 1 to dim(c);

          miss_chr = ifN(missing(c), 1, 0);

       end;

       output;

    drop i;    

    run;

    The output data set, WANT, is what you are looking for ( as I understand your question).

    I realized that two arrays will do and not 4 arrays in the last program. The revised program follows:

    data want;

       set class end = eof;

       array n

  • _numeric_;
  •    array c

  • _character_;
  •    do i = 1 to dim(n);

          n = ifN(missing(n), 1, 0);

       end;

       do i = 1 to dim(c);

          c = ifC(missing(c), '1', '0');

       end;

       output;

    drop i;     

    run;

    Message was edited by: MUTHIA KACHIRAYAN

    Super User
    Posts: 17,813

    Re: How to output observations whose data are missing for certain variables

    A modified solution based on solution. It assumes that all other variables are not missing, i.e. ID and any other variables in the dataset.

    *Create a sample dataset;

    data class;

       set sashelp.class;

       array n

  • _numeric_;
  •    array c

  • _character_;
  •    do _n_ = 1 to dim(n);

          if ranuni(123) < 0.7 then n[_n_] = .;

       end;

       do _n_ = 1 to dim(c);

          if ranuni(123) < 0.4 then c[_n_] = ' ';

       end;

    run;

    *Create output dataset with only missing variables included;

    data want;

       set class ;

       array n

  • _numeric_;
  •    array c

  • _character_;
  • if nmiss(of n(*))>0 or miss(of c(*))>0 then output;

    run;

    Contributor
    Posts: 26

    Re: How to output observations whose data are missing for certain variables

    proc sql;
      create table test (myID num,
                         myOne varchar(5),
                         myTwo num,
          myThree varchar(5),
          myFour num,
          myFive num,
          mySix varchar(5));

      insert into test values(1, '', null, '3', 4, 5, '6');
      insert into test values(2, '1', null, '3', 4, 5, '');
      insert into test values(3, '', 2, '3', null, 5, '');
      insert into test values(4, '1', 2, '3', null, 5, '6');
      insert into test values(5, '1', 2, '3', 4, 5, '');
    quit;
    /*
    This will give you the user id along with a string containing the name of each colum that
    was missing or null. colum list is in a funky order because of using two seperate arrays.
    */
    data myOut (replace=yes compress=no);
      set test;
      array myChar

  • _CHARACTER_;
      array myNum
  • _NUMERIC_;
      myFlag = 0;
      length myMissing $ 300;
      length myMissingOne $ 300;
  •   do _i=1 to dim(myChar);
        if(myChar[_i] = '') then do;
            myMissingOne = VNAME(myChar[_i]);
      myMissing = catx(' ', myMissing, myMissingOne);
      myFlag=1;
      end;
      end;
      do _i=1 to dim(myNum);
        if(myNum[_i] = .) then do;
       myMissingOne = VNAME(myNum[_i]);
       myMissing = catx(' ', myMissing, myMissingOne);
       myFlag = 1;
    end;
      end;
      if(myFlag = 1) then output;
      keep myID myMissing;
      run;

    /*
    This will give you a dataset listing all of the columns that contain either null or missing.
    The columns are tied to the client ID through a one to many relationship in this table. One
    client ID and many columns. The client ID/Column relation is unique.
    */
    data myOutTwo (replace=yes compress=no);
      set test;
      array myChar

  • _CHARACTER_;
      array myNum
  • _NUMERIC_;
      myFlag = 0;
      length myMissing $ 300;
  •   do _i=1 to dim(myChar);
        if(myChar[_i] = '') then do;
            myMissing = VNAME(myChar[_i]);
      output;
      end;
      end;
      do _i=1 to dim(myNum);
        if(myNum[_i] = .) then do;
       myMissing = VNAME(myNum[_i]);
       output;
    end;
      end;
      keep myID myMissing;
      run;

    Respected Advisor
    Posts: 3,124

    Re: How to output observations whose data are missing for certain variables

    Am I missing something here? What's wrong if using CMISS and _ALL_?

    data have;

         set sashelp.class;

         if mod(_n_,2)=1 then

               call missing(age, sex);

    run;

    data want;

         set have;

         if cmiss(of _all_)>0;

    run;

    Haikuo

    Super User
    Posts: 17,813

    Re: How to output observations whose data are missing for certain variables

    CMISS takes both numeric and character, cause that's what you'd expect :smileygrin:

    Nice one!

    Ask a Question
    Discussion stats
    • 16 replies
    • 440 views
    • 2 likes
    • 6 in conversation