In the 30 years of this company' existence the variable pat_name in a
variety of data sets on all platforms has always been defined as a character.
It has never,
ever,
ever,
ever,
ever,
ever,
ever,
ever,
ever,
been nothing but a character string on ANY platform, going back to the 90's.
No data platform - ORACLE PL/SQL, EXCEL, MSACESS, SQL Server has ever interpreted the variable pat_name as anything but a character or string variable.
Attempting to parse a first, last name from pat_name using the SAS scan function I get this message:
"Variable pat_name has been defined as both character and numeric."
This is flat out NOT true.
Trying to play along,
I wrote the SAS code below to assign a new variable labeled "full_name"
to a new data set "nu_name" ,
and giving it the contents of pat_name,
explicitly stating it is a CHARACTER variable.
data nu_name ;
length full_name $55.; < formats the variable "full_name" as a CHARACTER variable.
full_name = pat_name;
set patient_tbl;
run;
Use the scan function to parse the last name from the contents of full_name,
just like I attempted with pat_name.
data have_not;
length last_name $45.;
last_name=scan(full_name,1,',');
set nu_name ;
run;
Log results:
"ERROR: Variable full_name has been defined as both character and numeric."
This is impossible, I just defined full_name as a CHARACTER variable in the previous data step -
exactly the same as pat_name was.
SAS execute (and compiles) the statements in the order you write them. It will set the type (and length) of the variable at the first place it needs to. If there is no information available to indicate it should make the variable as character it will be created as numeric. It there is no information to guess a different length it will use 8 as the storage length.
This code :
data nu_name ;
length full_name $55.;
full_name = pat_name;
*set patient_tbl;
run;
Will define PAT_NAME as character with a length of 8.
Alphabetic List of Variables and Attributes # Variable Type Len 1 full_name Char 55 2 pat_name Char 8
It is smart enough to know that the right side of the assignment statement should be a character expression but it has no information to choose a length other than the default of 8 bytes.
Placing the assignment statement before the SET statement will also cause strange behavior. If PAT_NAME did exist on the input dataset then you will see. (1) The values will be truncated to the first 8 bytes. (2) Assuming that FULL_NAME is not on the input dataset then the value of FULL_NAME will have the value of PAT_NAME from the previous observation. You have essentially lagged the value of PAT_NAME into the variable FULL_NAME.
If you want to see how PAT_NAME is defined in a dataset patient_tbl
run PROC CONTENTS on the dataset and check it.
If PAT_NAME was numeric your data step will fail since it cannot be both.
If you reverse the order of the statements so that the compiler first sees the SET statement and then the assignment statement then the data step will run even if PAT_NAME is numeric. But you should see notes that numeric values were converted to character values in the assignment statement.
The usually sources of variables becoming numeric.
1) The variable is used before it is defined in a data step (similar to your example data step) and the usage causes SAS to create it as numeric.
2) The variable is created by PROC IMPORT and because of the values that PROC IMPORT happened to check when guessing how to define the variable from this particular input file made it look like it should be numeric.
data have_not;
length last_name $45.; /* at this point, the data step compiler defines last_name as character in the PDV */
last_name=scan(full_name,1,','); /* since full_name has not yet been defined in any way, the data step compiler makes it numeric (default) */
set nu_name ; /* now the data step compiler sees the variable full_name as character, so this causes the collision */
run;
If you want to carry over the value from the previous observation (which your code will do), use the LAG function:
data have_not;
length last_name $45.;
set nu_name ;
last_name=scan(lag(full_name),1,',');
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.