BookmarkSubscribeRSS Feed
_maldini_
Barite | Level 11

I have over 100 variables in my dataset, the vast majority of them contain both numeric and character values (i.e. "NULL"). When imported, the variables containing both numeric values and "NULL" are set as character variables. I need them to be numeric so that I can use them in calculations. 

 

What is the best way to accomplish this?

 

I thought I would try to replace all the "NULL" values with "999", but I only know how to do this on a variable by variable basis. I am a relatively notice SAS user.

 

There must be a better way.

 

Thanks!

 

I am using SAS OnDemand for Academics

8 REPLIES 8
Reeza
Super User

I would highly recommend against replacing the NULLs with 999. It becomes too likely to accidently use that number in calculations.

 

First, make sure your numeric variables are numeric and character are character. 

Second, SAS handles missing values well, so I wouldn't worry about replacing them with anything. 

 

SAS stores character missing as blank and numeric missing as .

 

 

_maldini_
Barite | Level 11

<Second, SAS handles missing values well, so I wouldn't worry about replacing them with anything.>

 

The values are not necessarily missing. "NULL" can mean different things in this dataset, depending on the context. 

 

The 999 is going to be replaced, depending on the context. In some situations, it will be replaced with a missing value (i.e. "."). In others, it will be replaced with "0". 

ballardw
Super User

An expansiont on @Reeza and handling missing values. SAS provides way to indicate WHY a variable may be missing, if you know. Using a combination of custom informats and formats you could read the data into new numeric variables using a slight modification of Reeza's code.

 

Here is a brief example of using custom informats and formats

proc format library=work;
invalue ExampleA   (upcase)
"NULL" = .A
;
Value ExampleA
.A = 'Not Entered';
;
invalue ExampleB   (upcase)
"NULL" = .A
999    = .B
;
Value Exampleb
.A = 'Refused to Answer'
.B = 'Invalid response'
;

data example;
   informat Q1 ExampleA.;
   informat Q2 ExampleB.;
   input Q1 Q2 ;
datalines;
23 16
Null  45
18 999
Null Null
44 22
;
run;

proc print data = example;
   format Q1 ExampleA. Q2 ExampleB.;
run;

If you have multiple variables that have the same use of null then you use the same informat/format pair. Each custom informat/format can have 26 "special" missing assignments .A to .Z

 

Reeza
Super User

how did you import your data? It may be worth fixing in that step.

_maldini_
Barite | Level 11

PROC IMPORT DATAFILE="/folders/dataset.xlsx"
DBMS=xlsx
OUT=have REPLACE;
SHEET = "Worksheet_1";
GETNAMES = yes;
RUN;

 

I can easily replace the "NULL" with "999" in the dataset, but I'm trying not to alter the data.

Reeza
Super User

Ok, it sounds like a general data cleaning process, not necessarily a null with 999 value.

 

I guess my next question - what's the difficulty? Multiple values? You can loop through with an array if required, but you're best off starting off with figuring out how to deal with each of your variables and then determine how to extend that solution.

 

Here's a link on data transformations and I'll post a short code snippet on recoding values using an array.

 

http://www.ats.ucla.edu/stat/sas/modules/vars.htm

 

data input2;
set have;

array var_origin(*) origin_var1-origin_var100;
array var_fix(*) var1-var100; /*need to list out variables, if you have a prefix this can be made straightforward*/


do i=1 to dim(var_origin);

if missing(var_origin(i)) then var_fix(i)=999;
end;

run;

 

 

_maldini_
Barite | Level 11

Thanks for your help Reeza...

 

I'm not sure I understand the purpose of the second array. Is it creating new variables with the new value, as opposed to changing the values in the old variable?

 

A few questions: 

1. Does the "*" have to be the number of variables?

2. Do I need a "$" after the "*" since these are character variables?

3. What is "dim"? Do I need to define the number of variables there?

4. Can I incorporate multiple lists in the variable list (e.g.  var1--var12, var19--var22, var30--var42). The array will only work for variables of the same type, correct? My character variables are interspersed with my numeric variables. I have been using PROC CONTENTS and then copying and pasting the character variables into the array list. Argh.

 

 

Reeza
Super User
1. No, you can leave it as a generic asterisk or replace with the number of variables.
2. Yes
3. Dim() is a function that returns the length of an array or the number of variables.
4. Yes, they all need to be the same type. If the variables are all 'side by side' with numeric in between you can list them using: var1-character-varn. This will include all character variables between var1 and varn. You can find this under variable lists in the documentation.


Also, yes, the second array is creating new variables. Thats a bit a matter of choice I suppose.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 7209 views
  • 5 likes
  • 3 in conversation