BookmarkSubscribeRSS Feed
jing2000yr
Calcite | Level 5

Hello,

I am cleaning our database and trying to re-contact the participants a second time to acquire information in order to reduce missing values. For example, I have 1000 participants and 200 variables. I want to output participants whose data are missing for 100 key variables among the 200 variables. Participant A may only have 2 variables with missing values and participant B may have 30 variables with missing values. I don't know how to output participants with missing values.

The hypothetical dataset as shown below, I want to output participant1 showing var2, var4 with missing values; and participant2 showing var1, var3, var4 and var5 with missing values;

Thanks a lot!

Jing Yuan

   

novar1var2var3var4var5var6
1128.45.4467
2.34..66.
3235..56.78
4321.56..66
5214....68
16 REPLIES 16
Reeza
Super User

What does your input look like?

If you just want anyone with missing values and all values are numeric

if nmiss(of var1-var100)>0 then output;

jing2000yr
Calcite | Level 5

Thanks. I have both numeric and character variables.

jing2000yr
Calcite | Level 5

And most are character variables. The code above seems only work for numeric variables.  Sorry my hypothetical dataset showing as numberics.

Many thanks.

Reeza
Super User

I don't know of a way to check all at once Smiley Sad

You can create two arrays one for character one for numeric, do you have a naming convention for your variables?

jing2000yr
Calcite | Level 5

No. I don't have naming convention. Would you please tell me the sample for creat array for character variables? I tried but failed.

Thanks.

jing2000yr
Calcite | Level 5

Thanks, Ballardw. It looks very complicated to me, a newer to SASSmiley Happy. I will try it. Will that also work for character variables?

ballardw
Super User

If you note the variables involved are both character and numeric. The key is providing the list(s) of variables.

ballardw
Super User

The below program will create a list in the results window with the NO and the variables that are missing.

The 6th row of data is to show that it doesn't produce output when all are valid. The data step is to have some

minimal working data.

options missing=' '; /* to make the dummy text variables below actually blank*/

data work.test;
input no var1 var2 var3 var4 var5 var6;
   tvar1 = put(var2,best4.);
   tvar3 = put(var3,best4.);
datalines;
1 128 . 45 . 44 67
2 . 34 . . 66 .
3 235 . . 56 . 78
4 321 . 56 . . 66
5 214 . . . . 68
6 1 2 3 4 5 6
run;

%let varlist= var1 var2 var3 var4 var5 var6 tvar1 tvar3;/* key, YOU provide your list of variables in the order you want them*/
data _null_;
   file print;
   set work.test;
   if nmiss(of var1-var6,tvar1,tvar3)>0 then do;    /* to use VARLIST you would need to have a comma separated version*/
      do i = 1 to (countw("&varlist"));
      if i=1 then put no= +1 @;
      v = scan("&varlist",i);
      if missing(vvaluex(scan("&varlist",i))) then put v +5 @;
      if i = (countw("&varlist")) then put;
      end;
   end;
run;

morgalr
Obsidian | Level 7

Here is how I would go about solving the problem:

make an input array for your input dataset,

make 2 variables for your output dataset--ID and Missing,

use VTYPE to check the data type of each variable in your input array.

use appropriate number or character functions to do your checking

concatinate the character representation for your index for your input array onto an output string each time you find a missing or null value.

set a flag saying you have missing values

output your string if you have any missing.

keep client number and string showing missing

loop until done.

do what you'll get is:

an ID field and a string filed, the string field will contain all missing indexes.

[1] [2 4]

[2] [1 3 4 6]

[3] [2 3 5]

[4] [2 4 5]

[5] [2 3 4 5]

KachiM
Rhodochrosite | Level 12

Let us take a small example based on SASHELP.CLASS.

Create a data set, CLASS from SASHELP.CLASS and at the same time, get the number of numeric and character variables. Also make some the variables to be missing.

data class;

   set sashelp.class end = eof;

   array n

  • _numeric_;
  •    array c

  • _character_;
  •    do _n_ = 1 to dim(n);

          if ranuni(123) < 0.7 then n[_n_] = .;

       end;

       do _n_ = 1 to dim(c);

          if ranuni(123) < 0.4 then c[_n_] = ' ';

       end;

       if eof then do;

          call symputx('num', dim(n));

          call symputx('chr', dim(c));

       end;

    run;

    %put &num  ;

    %put &chr= ;

    Now the data set, CLASS will resemble what you start with. Now we will check each variable for MISSING, and if so, make it to take a label of 1(one) and if not missing then the label will be 0(zero).

    data want;

       set class end = eof;

       array n

  • _numeric_;
  •    array c

  • _character_;
  •    array miss_num

  • _numeric_;
  •    array miss_chr

  • _character_;
  •    do i = 1 to dim(n);

          miss_num = ifN(missing(n), 1, 0);

       end;

       do i = 1 to dim(c);

          miss_chr = ifN(missing(c), 1, 0);

       end;

       output;

    drop i;    

    run;

    The output data set, WANT, is what you are looking for ( as I understand your question).

    I realized that two arrays will do and not 4 arrays in the last program. The revised program follows:

    data want;

       set class end = eof;

       array n

  • _numeric_;
  •    array c

  • _character_;
  •    do i = 1 to dim(n);

          n = ifN(missing(n), 1, 0);

       end;

       do i = 1 to dim(c);

          c = ifC(missing(c), '1', '0');

       end;

       output;

    drop i;     

    run;

    Message was edited by: MUTHIA KACHIRAYAN

    Reeza
    Super User

    A modified solution based on solution. It assumes that all other variables are not missing, i.e. ID and any other variables in the dataset.

    *Create a sample dataset;

    data class;

       set sashelp.class;

       array n

  • _numeric_;
  •    array c

  • _character_;
  •    do _n_ = 1 to dim(n);

          if ranuni(123) < 0.7 then n[_n_] = .;

       end;

       do _n_ = 1 to dim(c);

          if ranuni(123) < 0.4 then c[_n_] = ' ';

       end;

    run;

    *Create output dataset with only missing variables included;

    data want;

       set class ;

       array n

  • _numeric_;
  •    array c

  • _character_;
  • if nmiss(of n(*))>0 or miss(of c(*))>0 then output;

    run;

    morgalr
    Obsidian | Level 7

    proc sql;
      create table test (myID num,
                         myOne varchar(5),
                         myTwo num,
          myThree varchar(5),
          myFour num,
          myFive num,
          mySix varchar(5));

      insert into test values(1, '', null, '3', 4, 5, '6');
      insert into test values(2, '1', null, '3', 4, 5, '');
      insert into test values(3, '', 2, '3', null, 5, '');
      insert into test values(4, '1', 2, '3', null, 5, '6');
      insert into test values(5, '1', 2, '3', 4, 5, '');
    quit;
    /*
    This will give you the user id along with a string containing the name of each colum that
    was missing or null. colum list is in a funky order because of using two seperate arrays.
    */
    data myOut (replace=yes compress=no);
      set test;
      array myChar

  • _CHARACTER_;
      array myNum
  • _NUMERIC_;
      myFlag = 0;
      length myMissing $ 300;
      length myMissingOne $ 300;
  •   do _i=1 to dim(myChar);
        if(myChar[_i] = '') then do;
            myMissingOne = VNAME(myChar[_i]);
      myMissing = catx(' ', myMissing, myMissingOne);
      myFlag=1;
      end;
      end;
      do _i=1 to dim(myNum);
        if(myNum[_i] = .) then do;
       myMissingOne = VNAME(myNum[_i]);
       myMissing = catx(' ', myMissing, myMissingOne);
       myFlag = 1;
    end;
      end;
      if(myFlag = 1) then output;
      keep myID myMissing;
      run;

    /*
    This will give you a dataset listing all of the columns that contain either null or missing.
    The columns are tied to the client ID through a one to many relationship in this table. One
    client ID and many columns. The client ID/Column relation is unique.
    */
    data myOutTwo (replace=yes compress=no);
      set test;
      array myChar

  • _CHARACTER_;
      array myNum
  • _NUMERIC_;
      myFlag = 0;
      length myMissing $ 300;
  •   do _i=1 to dim(myChar);
        if(myChar[_i] = '') then do;
            myMissing = VNAME(myChar[_i]);
      output;
      end;
      end;
      do _i=1 to dim(myNum);
        if(myNum[_i] = .) then do;
       myMissing = VNAME(myNum[_i]);
       output;
    end;
      end;
      keep myID myMissing;
      run;

    Haikuo
    Onyx | Level 15

    Am I missing something here? What's wrong if using CMISS and _ALL_?

    data have;

         set sashelp.class;

         if mod(_n_,2)=1 then

               call missing(age, sex);

    run;

    data want;

         set have;

         if cmiss(of _all_)>0;

    run;

    Haikuo

    Reeza
    Super User

    CMISS takes both numeric and character, cause that's what you'd expect :smileygrin:

    Nice one!

    sas-innovate-2024.png

    Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

    Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

     

    Register now!

    What is Bayesian Analysis?

    Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 16 replies
    • 2661 views
    • 2 likes
    • 6 in conversation