02-15-2016 03:03 PM - edited 02-15-2016 03:46 PM
I have over 100 variables in my dataset, the vast majority of them contain both numeric and character values (i.e. "NULL"). When imported, the variables containing both numeric values and "NULL" are set as character variables. I need them to be numeric so that I can use them in calculations.
What is the best way to accomplish this?
I thought I would try to replace all the "NULL" values with "999", but I only know how to do this on a variable by variable basis. I am a relatively notice SAS user.
There must be a better way.
I am using SAS OnDemand for Academics
02-15-2016 03:13 PM
I would highly recommend against replacing the NULLs with 999. It becomes too likely to accidently use that number in calculations.
First, make sure your numeric variables are numeric and character are character.
Second, SAS handles missing values well, so I wouldn't worry about replacing them with anything.
SAS stores character missing as blank and numeric missing as .
02-15-2016 03:17 PM - edited 02-15-2016 03:45 PM
<Second, SAS handles missing values well, so I wouldn't worry about replacing them with anything.>
The values are not necessarily missing. "NULL" can mean different things in this dataset, depending on the context.
The 999 is going to be replaced, depending on the context. In some situations, it will be replaced with a missing value (i.e. "."). In others, it will be replaced with "0".
02-16-2016 10:38 AM
An expansiont on @Reeza and handling missing values. SAS provides way to indicate WHY a variable may be missing, if you know. Using a combination of custom informats and formats you could read the data into new numeric variables using a slight modification of Reeza's code.
Here is a brief example of using custom informats and formats
proc format library=work; invalue ExampleA (upcase) "NULL" = .A ; Value ExampleA .A = 'Not Entered'; ; invalue ExampleB (upcase) "NULL" = .A 999 = .B ; Value Exampleb .A = 'Refused to Answer' .B = 'Invalid response' ; data example; informat Q1 ExampleA.; informat Q2 ExampleB.; input Q1 Q2 ; datalines; 23 16 Null 45 18 999 Null Null 44 22 ; run; proc print data = example; format Q1 ExampleA. Q2 ExampleB.; run;
If you have multiple variables that have the same use of null then you use the same informat/format pair. Each custom informat/format can have 26 "special" missing assignments .A to .Z
02-15-2016 03:22 PM - edited 02-15-2016 03:22 PM
PROC IMPORT DATAFILE="/folders/dataset.xlsx"
SHEET = "Worksheet_1";
GETNAMES = yes;
I can easily replace the "NULL" with "999" in the dataset, but I'm trying not to alter the data.
02-15-2016 04:47 PM
Ok, it sounds like a general data cleaning process, not necessarily a null with 999 value.
I guess my next question - what's the difficulty? Multiple values? You can loop through with an array if required, but you're best off starting off with figuring out how to deal with each of your variables and then determine how to extend that solution.
Here's a link on data transformations and I'll post a short code snippet on recoding values using an array.
data input2; set have; array var_origin(*) origin_var1-origin_var100; array var_fix(*) var1-var100; /*need to list out variables, if you have a prefix this can be made straightforward*/ do i=1 to dim(var_origin); if missing(var_origin(i)) then var_fix(i)=999; end; run;
02-15-2016 05:42 PM - edited 02-15-2016 05:46 PM
Thanks for your help Reeza...
I'm not sure I understand the purpose of the second array. Is it creating new variables with the new value, as opposed to changing the values in the old variable?
A few questions:
1. Does the "*" have to be the number of variables?
2. Do I need a "$" after the "*" since these are character variables?
3. What is "dim"? Do I need to define the number of variables there?
4. Can I incorporate multiple lists in the variable list (e.g. var1--var12, var19--var22, var30--var42). The array will only work for variables of the same type, correct? My character variables are interspersed with my numeric variables. I have been using PROC CONTENTS and then copying and pasting the character variables into the array list. Argh.
02-15-2016 07:16 PM