Converting Missing Values with a macro

Reply
Regular Contributor
Posts: 180

Converting Missing Values with a macro

Hello all,

 

I have messy data. In a specific folder ( my libname is d in SAS) I have several files. I want a macro that will run on all files and all variables, and every time it sees any of the following: NA, na, ND, N/A, 999, it will replace it with a proper missing value. Note that these values could appear in either numeric or characteristic variables. In addition, this task should skip variables I will specify such as subject ID, as a value of 999 could be valid there (I have over 1000 subjects).

 

I don't know how to perform this, can it be done easily using the macro facility ? Maybe using arrays ? I could do with your help on this one.

 

Thank you in advance.

Regular Contributor
Posts: 180

Re: Converting Missing Values with a macro

One idea I have (assuming I'll have a macro running on all datasets) is to put all variables in an array, and then converting, but how do I put all variables except ID in an array ? To put all, I can use *, but how do I exclude the first variable ? or maybe another one is needed ?
Respected Advisor
Posts: 3,799

Re: Converting Missing Values with a macro

data abc;
   if 0 then set abc(drop=id);
   array _n[*] _numeric_;
   array _c[*] _character_;
   set abc;
   do _n_ = 1 to dim(_n);
      if something...
      end;
   do _n_ = 1 to dim(_c);
      if something...
      end;
Regular Contributor
Posts: 180

Re: Converting Missing Values with a macro

Posted in reply to data_null__
Thank you. Can you kindly explain the "if 0 then..." part ? if I drop ID, don't I lose it ?
Respected Advisor
Posts: 3,799

Re: Converting Missing Values with a macro

[ Edited ]

if 0 is never executed it allows the program to use a SET statement(s) without having them executed.  It only defines variables sans ID or any others that are droped.  Then it defines an array of all charcter and another of all numeric variables sans ID.  Notice also there is a "regular set" statement that is executed and it includes ID.  The drawback to this method is that ID's VARNUM is moved to be the last varnum.  You could fix that by making this step a view to a step that moves it back.

 

data abcV / view=abcV;
   the code from above.
   run;

data abcnew;
   retain id;
   set abcV;
   run;
Super User
Super User
Posts: 7,942

Re: Converting Missing Values with a macro

Personally I don't think that is a good approach.  You have very little way of knowing up front each and every conotation of what could be replaced.  Generally speaking missing values in SAS datasets for numerics are set by the:

options missing=.;

System option.  Then all missing numerics would be present as .  However if you have formats applied to those numerics then this action may differ, I mean if you apply a format to which 9 becomes NA when displayed, how would you handle that, or do you replace all values 9 with missing as well.  As I say, I don't believe this is a good path to proceed on data cleaning.  You should first start by learning your data, what does the specification say - you do have specs yes?  From that isolate what columns need processing, what unique lists of values need to be addressed - maybe not all missings should be imputed to something else - NA in race may indicate that the available optiosn are not sufficient to describe it, and another category is used, removing that missing could impact analysis etc.

So overall, you can theoretically program such a process, I don't think it would be in your best interest however if you do go ahead with it then something like:

data _null_;
  set sashelp.vcolumns (where=(libname="<your_libname>"));
  by memname;
  if first.memname then call execute(cat('data ',libname,'.',memname; set ',libname,'.',memname,';'));
  if type="char" then call execute(cats('if ',name,' in ("NA","N","999") then ',name,'="";'));
  else call execute(cats('if ',name,' in (999) then ',name,'=.;'));
  if last.memname then call execute('run;');
run;

This will create a datastep per dataset in <your libname>, with an if statement per variable name.  However I still advise to follow good process:

1)  Identify your data

2)  Understand your data (I.e. read specifications)

3)  Create data cleaning plan (i.e create specifications)

4)  Program data cleaning checks from specs

Remember documentation and process is more important than how to code something.

Regular Contributor
Posts: 180

Re: Converting Missing Values with a macro

I agree with your comments, and I will check the data carefully. However I already know that most instances are indeed missing values. Your code is beautiful, so short and simple. Saves the use of a macro. However one question. Where in your code you ignore the subject ID column, as this one is numeric and CAN include 999 as a valid number ?
Super User
Super User
Posts: 7,942

Re: Converting Missing Values with a macro

You can control what libraries and datasets you choose by altering the set statement.  So for instance if you want to ignore all variables with the name ID:

data _null_;
  set sashelp.vcolumns (where=(libname="<your_libname>" and name ne "ID"));  /*Here */
  by memname;
  if first.memname then call execute(cat('data ',libname,'.',memname; set ',libname,'.',memname,';'));
  if type="char" then call execute(cats('if ',name,' in ("NA","N","999") then ',name,'="";'));
  else call execute(cats('if ',name,' in (999) then ',name,'=.;'));
  if last.memname then call execute('run;');
run;

It uses the SAS metadata views - VCOLUMNS (and VTABLE).  These know what datasets are in which libraries, what variables there are and what type etc.  From that it can build programs which operate on the data.  Metadata driven programming, powerful, but do make sure you know what your doing - i.e. do a datastep and set and see what data you are getting, you don't want to overwrite data you didn't mean to.  Test it thoroughly as well, metadata changes over time, you don't want damadge because you didn't anticipate something.

Trusted Advisor
Posts: 1,117

Re: Converting Missing Values with a macro

Just a few minor suggestions for @RW9's elegant CALL EXECUTE step:

  • vcolumn (without 's')
  • CAT would be more appropriate than CATS (2x), because CATS('if ',name,' in ...') would result in something like ifmyvarin ...
  • quote before "; set"
  • upcase(name) ne ... just in case the variable name is "id"

The below version worked for me on two test datasets.

data _null_;
  set sashelp.vcolumn (where=(libname="<your libname>" and upcase(name) ne "ID"));  /*Here */
  by memname;
  if first.memname then call execute(cat('data ',libname,'.',memname,'; set ',libname,'.',memname,';'));
  if type="char" then call execute(cat('if ',name,' in ("NA","N","999") then ',name,'="";'));
  else call execute(cat('if ',name,' in (999) then ',name,'=.;'));
  if last.memname then call execute('run;');
run;

 

Respected Advisor
Posts: 3,799

Re: Converting Missing Values with a macro

[ Edited ]

 

That's not correct.  The MISSING option applies to how the missing . is printed and implies how it will appear when used in a CAT-family function or PUT/PUTN function and perhaps other places I can't think of.

 

https://support.sas.com/documentation/cdl/en/lesysoptsref/68023/HTML/default/viewer.htm#n0qamf3yfjtw...

 

RW9 wrote:

Generally speaking missing values in SAS datasets for numerics are set by the:

options missing=.;

System option.  Then all missing numerics would be present as .  However if you have formats applied to those numerics then

 

Ask a Question
Discussion stats
  • 9 replies
  • 548 views
  • 5 likes
  • 4 in conversation