BookmarkSubscribeRSS Feed
PrinceAladdin
Calcite | Level 5

Dear SAS community,

 

I am trying to run an array to convert all blank values for all variables to "." However, when I try to run the array, I receive an error message indicating some of my numeric variables are actually character variables and thus the array cannot run. I am hypothesizing that these variables (which should be numeric) are considered character variables because the data collectors input a value of "N/A" for several observations instead of the intended 1-5 scale. If this is true, is there a way to (1) run an array to identify all non-numeric values and (2) convert those non-numeric values to "."? And will doing so automatically reclassify the intended numeric variable as a numeric variable? 

 

Thank you!

4 REPLIES 4
KachiM
Rhodochrosite | Level 12

Hi @PrinceAladdin,

 

If my understanding is right, here is an approach.

 

[1] Read into a string of all variables.

[2] Replace 'n/a' by '.'.

[3] Scan each variable so that '.' will also be read by SCAN function using 'S' option.

 

data have;
array a a1 - a5;
input ;
_infile_ = tranwrd(_infile_,'n/a', '.');
do i = 1 to dim(a);
   a[i] = scan(_infile_,i,' ', 's');
end;
drop i;
datalines;
1  2   n/a  2  1
2  1   .    1  2
3 n/a  n/a  2  1
;
run;

 

Edited:

 

To avoid the 


NOTE: Character values have been converted to numeric values at the places given by

 

on the LOG, replace the statement

 

a[i] = scan(_infile_,i,' ', 's');

By 

 

a[i] = input(scan(_infile_,i,' ', 's'),8.);


 

PaigeMiller
Diamond | Level 26

@PrinceAladdin wrote:

I am hypothesizing that these variables (which should be numeric) are considered character variables because the data collectors input a value of "N/A" for several observations instead of the intended 1-5 scale. If this is true, is there a way to (1) run an array to identify all non-numeric values and (2) convert those non-numeric values to "."? And will doing so automatically reclassify the intended numeric variable as a numeric variable? 


I think perhaps you have formulated the problem incorrectly. You don't want to identify all non-numeric "values", you want to identify all non-numeric variables, then create a numeric variable as its replacement, in which a non-numeric value becomes a missing value which is a dot (without quotes around it). You don't want to leave these variables as character variables because if you did, you would not be able to do any math (like calculate an average) on them.

 

First determine which variables are character, and then you can create replacement variables that don't have this issue.

 

To determine all the character variables in your data set, replace "libraryname" with the actual name of the library where this data set exists, and 'datasetname' is the actual name of your data set.

 

proc sql noprint;
    select distinct name into :names separated by ' ' from sashelp.vcolumn 
    where libname="libraryname" and memname='datasetname' and type='char';
quit;

Then, a macro creates corresponding numeric variables where the N/A or other character values are set to missing.

 

%macro create_num;
    data want;
         set have;
         %do i=1 %to %sysfunc(countw(&names));
             %let thisname=%scan(&names,&i,%str( ));
             length &thisname._N 8;
             &thisname._N = &thisname;
          %end;
     run;
%mend;
%create_num

 

--
Paige Miller
ballardw
Super User

@PrinceAladdin wrote:

Dear SAS community,

 

I am trying to run an array to convert all blank values for all variables to "." However, when I try to run the array, I receive an error message indicating some of my numeric variables are actually character variables and thus the array cannot run. I am hypothesizing that these variables (which should be numeric) are considered character variables because the data collectors input a value of "N/A" for several observations instead of the intended 1-5 scale. If this is true, is there a way to (1) run an array to identify all non-numeric values and (2) convert those non-numeric values to "."? And will doing so automatically reclassify the intended numeric variable as a numeric variable? 

 

Thank you!


Typically I would consider this a failure in how the data was originally read into SAS. If you know a variable is intended to be numeric then use an approach that reads the data into the correct type.

I read lots of files with a variety of poor data entry standards and have to create appropriate "missing" or recoded values depending upon the actual characters encountered. If you use a data step to read the data then SAS provides a very nice tool in the form of custom informats to read commonly occurring text into a desired value.

Example:

Proc format library=work;
invalue ynx (upcase)
'Y','YES' = 1
'N','NO'  = 0
'X'       = .x
' '       = .
other = _error_;
invalue lickert (upcase)
1,2,3,4,5 =_same_
'NA','N/A' = .n
' '        = .
other    =_error_
;

data example;
   infile datalines dlm=',' truncover;
   informat x ynx. y lickert. ;
   input x y;
datalines;
y,1
Yes,na
N,N/a
 ,5
Y,8
n, 
x,3
;
run;

This creates two custom formats one that expects values of Y or Yes, N or No, X, and blank. The second expects values of 1 to 5, integers, NA or N/A and blank. Any other value encountered will be treated as an invalid data value and provide a note in the log.

The UPCASE on the invalue statements says to make the encountered value upper case before comparing to the rules in the format. So in the first format Y, y, Yes, YEs, yES, yEs , YeS, yeS would all be considered valid values and treated as 1 on input.

The informats also use special missing to indicate that special input codes X or NA can be differentiated from a blank though the numeric value is missing and will not be used for any summary statistics. But you can determine between the two types of missing if needed.

The keywords _same_ means the value assigned is as read so you get numeric 1,2,3,4 and 5. The other=_error_ is the part that says any other value is an invalid value. This will cause the "invalid data" in the log and show the suspect line of data.

 

I use this because I have some sources that just plain refuse to pay attention to spelling and I have to constantly update the informats to accommodate other random spelling changes like inserting hyphens, extra spaces, reversing the order of word values and such.

 

I also have similar informats to list things like Site location codes. The data sources will either misspell the site names, renamed the site without notification or add new sites. So I get warnings when I read their data and can ask about appropriate behavior for the "new" site code encountered.

Tom
Super User Tom
Super User

How did you create the dataset? Why did you make the variables as character if you wanted them to be numeric?

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1276 views
  • 1 like
  • 5 in conversation