One of the first skills that new SAS programmers learn is how to read and convert raw character data into meaningful numeric data that can be used for calculations and statistics. If you're new to this topic, we recommend this short video tutorial for an explanation and demo.
Base SAS supports just two primitive types: character (strings) and numeric. Numeric data represents raw numbers, date values, datetime values, currency data, and more.
To convert a character value to a number, you use the INPUT function with a specified informat, which indicates how you want SAS to read the number.
The INPUT function looks like this:
new_var = input(original_var, informat.);
where informat is the name of the SAS informat used to interpret the value. You can find a list of built-in informats in the SAS documentation.
For example, if you have a simple string of digits like 12345678, you can use the basic numeric informat w.d:
data new;
char_var = '12345678';
numeric_var = input(char_var, 8.);
put numeric_var=;
run;
The output:
numeric_var=12345678
The w value (width) in w.d must be at least as large as the number of characters in the longest value to read (including any decimal separator). The d value is optional. The w.d informat is flexible enough to interpret decimal values as well as scientific notation. Example:
data new;
char1 = '12345678';
char2 = '123.456';
char3 = '123e-4';
num1 = input(char1, 8.);
num2 = input(char2, 8.);
num3 = input(char3, 8.);
put num1= num2= num3=;
run;
Output for the new variables:
num1=12345678 num2=123.456 num3=0.0123
If the character value contains a thousands separator (usually a comma), use the COMMAw.d informat. To control how the resulting numeric variable is displayed, use the FORMAT statement. The following example reads a character value into a numeric variable, and then applies the COMMA9. format to display it in its original form. Either way, the new variable is stored as a number.
data new;
char1 = '1,234,567';
num1 = input(char1, comma9.);
/* read again, but this time apply a COMMA9. format for display */
num1_fmt = input(char1, comma9.);
format num1_fmt comma9.;
put num1= num1_fmt=;
run;
Output:
num1=1234567 num1_fmt=1,234,567
The COMMAw.d informat is more versatile than its name implies. It removes not only embedded commas but also blank spaces, dollar signs, percent signs, hyphens, and close parentheses from the input data. It also converts an open parenthesis at the beginning of a field to a minus sign, so that the value is read as negative.
In this example, the COMMA12. informat is used to convert several different styles of number expressions to SAS numeric values:
data raw;
length raw_val $ 12;
infile datalines;
input raw_val;
datalines;
123456
1234.56
1,234.56
$1,234.56
(1,234.56)
-1234.56
;
run;
data convert;
set raw;
length num 8;
num = input(raw_val,comma12.);
run;
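Output:
raw_val       num
123456        123456
1234.56       1234.56
1,234.56      1234.56
$1,234.56     1234.56
(1,234.56)    -1234.56
-1234.56      -1234.56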
Note: for European style conventions (where comma is a decimal separator and the dot is a thousands separator), use the COMMAXw.d informat.
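As a minimal sketch of that note, the following step reads a European-style value with COMMAX12. The variable names and the sample value are made up for illustration; they are not from the examples above.

```sas
data european;
    /* Hypothetical value: dot as thousands separator, comma as decimal separator */
    char_eu = '1.234,56';
    /* COMMAXw.d swaps the roles of the comma and the dot */
    num_eu = input(char_eu, commax12.);
    put num_eu=;
run;
```

The log should show num_eu=1234.56, the same numeric value that COMMA12. would produce from '1,234.56'.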
If you know that some of your raw character values won't convert cleanly to numbers, you can suppress the NOTE lines in your SAS log by using the ? or ?? modifier on the informat. Any non-number values will result in missing values in your output.
data raw;
length subj $ 10;
infile datalines;
input subj;
datalines;
140
172
unknown
275
;
run;
data weights;
set raw;
weight = input(subj,?6.);
run;
Output:
subj       weight
140        140
172        172
unknown    .
275        275
Without the ? modifier on the INPUT function, the SAS log would include a note similar to this:
NOTE: Invalid argument to function INPUT at line 42 column 11.
subj=unknown weight=. _ERROR_=1 _N_=3
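The ?? modifier goes one step further than ?: besides suppressing the invalid-data message, it also keeps the automatic variable _ERROR_ from being set to 1. A small self-contained sketch, using a DO loop over sample values instead of DATALINES (the data set name weights_quiet is just a placeholder):

```sas
data weights_quiet;
    length subj $ 10;
    /* Loop over a few sample values, one of which is not a number */
    do subj = '140', 'unknown', '275';
        /* ?? suppresses the log message and leaves _ERROR_ at 0 */
        weight = input(subj, ?? 6.);
        output;
    end;
run;
```

As before, the non-numeric value yields a missing value for weight, but the log stays clean.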
A SAS date is a numeric value that is valid for use with date functions and other mathematical operations. A SAS date might be formatted so that it contains characters in its display, but a SAS date is always stored as a number. Internally, a SAS date is the number of days since January 1, 1960. Similarly, a SAS datetime is a number: the number of seconds since midnight on January 1, 1960.
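To make the internal representation concrete, here is a short sketch that uses date and datetime literals (the variable names are arbitrary) and prints the values first as raw numbers and then with formats applied:

```sas
data _null_;
    date_val = '02JAN1960'd;            /* one day after the reference date */
    dt_val   = '01JAN1960:00:01:00'dt;  /* one minute after the reference datetime */
    /* Unformatted: the raw stored numbers */
    put date_val= dt_val=;
    /* Formatted: same numbers, displayed as a date and a datetime */
    put date_val= date9. dt_val= datetime20.;
run;
```

The first PUT statement writes date_val=1 dt_val=60, which shows that the format changes only the display, not the stored value.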
To convert a character value to the date value it represents, use the INPUT function with one of the many date informats. This example shows two common date representations: ddMONyyyy (read with DATE9.) and MM-DD-YYYY (read with MMDDYY10.):
data dates;
startdate = "12JUL2021";
enddate = "07-30-2022";
date_start = input(startdate,date9.);
date_end = input(enddate,mmddyy10.);
days_diff = date_end-date_start;
format date_start date9. date_end date9.;
run;
Tip: use the ANYDTDTE. informat to interpret a variety of date representations. See One informat to rule them all: Read any date into SAS. This program produces the same result as above:
data dates;
startdate = "12JUL2021";
enddate = "07-30-2022";
date_start = input(startdate,anydtdte12.);
date_end = input(enddate,anydtdte12.);
days_diff = date_end-date_start;
format date_start date9. date_end date9.;
run;
The related ANYDTDTM. informat handles most common datetime representations as well, including ISO-standard forms such as those stored in a database or returned from an API:
data dt;
raw='2022-11-23T17:32:35Z';
val = input(raw,anydtdtm20.);
format val datetime20.;
run;
To convert a numeric variable to a character, use the PUT function with the desired format:
new_variable = put(original_variable, format.);
Note that the length of the new variable must be large enough to store the new value. This example stores the current datetime value and displays it in a new character variable.
data dt;
length dt 8 dt_char $ 20;
dt = datetime();
dt_char = put(dt, datetime20.);
run;
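The same pattern works for plain numbers. One detail worth knowing is that PUT with a numeric format right-aligns the result within the format width, so the character value can carry leading blanks. The sketch below (with a made-up id_num value) wraps the call in STRIP to remove them:

```sas
data _null_;
    length id_char $ 8;               /* long enough to hold every digit */
    id_num  = 1234;                   /* hypothetical numeric ID */
    /* put() right-aligns within 8 characters; strip() removes the padding */
    id_char = strip(put(id_num, 8.));
    put id_char=;
run;
```

The log shows id_char=1234.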
If your data values require leading zeros as a significant component of the display, use the Zw.d format to ensure the leading zeros are included. A common use case is postal (ZIP) codes in the US:
data raw;
length city $ 20 zip_num 8;
infile datalines;
input city zip_num;
datalines;
Williamsville 14221
Raleigh 27613
Boston 02134
;
run;
data better;
set raw;
/* always 5-digits inc any leading zeroes */
zip_char = put(zip_num,z5.);
run;
Output:
city             zip_num    zip_char
Williamsville    14221      14221
Raleigh          27613      27613
Boston           2134       02134
One thing that I find confuses new SAS users is how to convert a numeric value to a SAS date or datetime. The PUT function writes a character value and the INPUT function reads a character value, so it can’t be done with one function. My standard approach is to copy the numeric value to a character value with the PUT function and then convert the character value to a SAS date or datetime with the INPUT function using a date or datetime informat. I’ll illustrate with a SAS date.
data one;
num1= 19990102; /* date info as numeric */
/* First convert the numeric variable to a character variable with the PUT function.
Then, convert the character variable to a SAS date value with the INPUT function. */
tempchar = put (num1, 8.) ;
sasdate1 = input(tempchar, yymmdd8.) ;
/* This statement is equivalent to the previous two statements.
It avoids creating the variable TEMPCHAR but is somewhat less readable. */
sasdate1=input(put(num1,8.),yymmdd8.) ;
run;
Bruce Gilsen
speaking only for myself (and my Bruce Force fantasy baseball team, which will try for its 6th title in 34 years in 2023)