DATA Step, Macro, Functions and more

Replacing a character value "NULL" w/ a numeric value "999" in multiple variables simultaneously?

Reply
Regular Contributor
Posts: 199

Replacing a character value "NULL" w/ a numeric value "999" in multiple variables simultaneously?

[ Edited ]

I have over 100 variables in my dataset, the vast majority of them contain both numeric and character values (i.e. "NULL"). When imported, the variables containing both numeric values and "NULL" are set as character variables. I need them to be numeric so that I can use them in calculations. 

 

What is the best way to accomplish this?

 

I thought I would try to replace all the "NULL" values with "999", but I only know how to do this on a variable by variable basis. I am a relatively notice SAS user.

 

There must be a better way.

 

Thanks!

 

I am using SAS OnDemand for Academics

Super User
Posts: 17,819

Re: Replacing a character value "NULL" w/ a numeric value "999" in multiple vari

I would highly recommend against replacing the NULLs with 999. It becomes too likely to accidently use that number in calculations.

 

First, make sure your numeric variables are numeric and character are character. 

Second, SAS handles missing values well, so I wouldn't worry about replacing them with anything. 

 

SAS stores character missing as blank and numeric missing as .

 

 

Regular Contributor
Posts: 199

Re: Replacing a character value "NULL" w/ a numeric value "999" in multiple vari

[ Edited ]

<Second, SAS handles missing values well, so I wouldn't worry about replacing them with anything.>

 

The values are not necessarily missing. "NULL" can mean different things in this dataset, depending on the context. 

 

The 999 is going to be replaced, depending on the context. In some situations, it will be replaced with a missing value (i.e. "."). In others, it will be replaced with "0". 

Super User
Posts: 10,500

Re: Replacing a character value "NULL" w/ a numeric value "999" in multiple vari

An expansiont on @Reeza and handling missing values. SAS provides way to indicate WHY a variable may be missing, if you know. Using a combination of custom informats and formats you could read the data into new numeric variables using a slight modification of Reeza's code.

 

Here is a brief example of using custom informats and formats

proc format library=work;
invalue ExampleA   (upcase)
"NULL" = .A
;
Value ExampleA
.A = 'Not Entered';
;
invalue ExampleB   (upcase)
"NULL" = .A
999    = .B
;
Value Exampleb
.A = 'Refused to Answer'
.B = 'Invalid response'
;

data example;
   informat Q1 ExampleA.;
   informat Q2 ExampleB.;
   input Q1 Q2 ;
datalines;
23 16
Null  45
18 999
Null Null
44 22
;
run;

proc print data = example;
   format Q1 ExampleA. Q2 ExampleB.;
run;

If you have multiple variables that have the same use of null then you use the same informat/format pair. Each custom informat/format can have 26 "special" missing assignments .A to .Z

 

Super User
Posts: 17,819

Re: Replacing a character value "NULL" w/ a numeric value "999" in multiple vari

how did you import your data? It may be worth fixing in that step.

Regular Contributor
Posts: 199

Re: Replacing a character value "NULL" w/ a numeric value "999" in multiple vari

[ Edited ]

PROC IMPORT DATAFILE="/folders/dataset.xlsx"
DBMS=xlsx
OUT=have REPLACE;
SHEET = "Worksheet_1";
GETNAMES = yes;
RUN;

 

I can easily replace the "NULL" with "999" in the dataset, but I'm trying not to alter the data.

Super User
Posts: 17,819

Re: Replacing a character value "NULL" w/ a numeric value "999" in multiple vari

Ok, it sounds like a general data cleaning process, not necessarily a null with 999 value.

 

I guess my next question - what's the difficulty? Multiple values? You can loop through with an array if required, but you're best off starting off with figuring out how to deal with each of your variables and then determine how to extend that solution.

 

Here's a link on data transformations and I'll post a short code snippet on recoding values using an array.

 

http://www.ats.ucla.edu/stat/sas/modules/vars.htm

 

data input2;
set have;

array var_origin(*) origin_var1-origin_var100;
array var_fix(*) var1-var100; /*need to list out variables, if you have a prefix this can be made straightforward*/


do i=1 to dim(var_origin);

if missing(var_origin(i)) then var_fix(i)=999;
end;

run;

 

 

Regular Contributor
Posts: 199

Re: Replacing a character value "NULL" w/ a numeric value "999" in multiple vari

[ Edited ]

Thanks for your help Reeza...

 

I'm not sure I understand the purpose of the second array. Is it creating new variables with the new value, as opposed to changing the values in the old variable?

 

A few questions: 

1. Does the "*" have to be the number of variables?

2. Do I need a "$" after the "*" since these are character variables?

3. What is "dim"? Do I need to define the number of variables there?

4. Can I incorporate multiple lists in the variable list (e.g.  var1--var12, var19--var22, var30--var42). The array will only work for variables of the same type, correct? My character variables are interspersed with my numeric variables. I have been using PROC CONTENTS and then copying and pasting the character variables into the array list. Argh.

 

 

Super User
Posts: 17,819

Re: Replacing a character value "NULL" w/ a numeric value "999" in multiple vari

1. No, you can leave it as a generic asterisk or replace with the number of variables.
2. Yes
3. Dim() is a function that returns the length of an array or the number of variables.
4. Yes, they all need to be the same type. If the variables are all 'side by side' with numeric in between you can list them using: var1-character-varn. This will include all character variables between var1 and varn. You can find this under variable lists in the documentation.


Also, yes, the second array is creating new variables. Thats a bit a matter of choice I suppose.
Ask a Question
Discussion stats
  • 8 replies
  • 955 views
  • 5 likes
  • 3 in conversation