Ignore the data type messages. That is just PROC IMPORT being stupid. SAS only supports two data types: fixed-length character strings and floating-point numbers. So PROC IMPORT is saying that it is internally treating the cells in the XLSX file as variable-length strings and that it cannot create variable-length string variables in a SAS dataset. Why it feels a need to tell you that makes no sense at all.
I do not see any issue with your cleaned-up first row when reading it into a SAS dataset.
4    proc import datafile="&path/&fname" dbms=xlsx out=test1 replace;
5    run;

NOTE: Variable Name Change.  Full Name Associated Case Indivi -> Full_Name_Associated_Case_Indivi
NOTE: Variable Name Change.  Address3 State Province Associat -> Address3_State_Province_Associat
NOTE: Variable Name Change.  Hand-off Outcome Marian -> Hand_off_Outcome_Marian
NOTE: Variable Name Change.  Resources needed -> Resources_needed
NOTE: Variable Name Change.  Department Regarding Individual -> Department_Regarding_Individual
NOTE: Variable Name Change.  Marian ID -> Marian_ID
NOTE: Variable Name Change.  Individual ID Regarding Individu -> Individual_ID_Regarding_Individu
NOTE: Variable Name Change.  Part of Household Regarding Indi -> Part_of_Household_Regarding_Indi
NOTE: Variable Name Change.  Diagnosis Date time Associated C -> Diagnosis_Date_time_Associated_C
NOTE: Variable Name Change.  Activity Status -> Activity_Status
NOTE: Variable Name Change.  Status Reason -> Status_Reason
NOTE: Variable Name Change.  Date Created -> Date_Created
NOTE: Variable Name Change.  Person is a minor -> Person_is_a_minor
NOTE: Variable Name Change.  Reached Completed all attempts -> Reached_Completed_all_attempts
NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with options MSGLEVEL=I.
NOTE: The import data set has 6 observations and 15 variables.
NOTE: WORK.TEST1 data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.04 seconds
      cpu time            0.03 seconds
However, your last row, which I take to be the actual column headers from your XLSX file, is not handled as well.
13   proc import datafile="&path/&fname" dbms=xlsx out=test2 replace;
14     range='$A7:';
15   run;

NOTE: Variable Name Change.  Full Name (Associated Case) (In -> VAR1
NOTE: Variable Name Change.  Address3: State/Province (Associ -> VAR2
NOTE: Variable Name Change.  Hand-off Outcome (Marian) -> VAR3
NOTE: Variable Name Change.  Resource(s) needed? -> VAR4
NOTE: Variable Name Change.  Department (Regarding) (Individu -> VAR5
NOTE: Variable Name Change.  Marian ID -> Marian_ID
NOTE: Variable Name Change.  Individual ID (Regarding) (Indiv -> VAR7
NOTE: Variable Name Change.  Part of Household (Regarding) (I -> VAR8
NOTE: Variable Name Change.  Diagnosis Date time (Associated -> Diagnosis_Date_time__Associated
NOTE: Variable Name Change.  Activity Status -> Activity_Status
NOTE: Variable Name Change.  Status Reason -> Status_Reason
NOTE: Variable Name Change.  Date Created -> Date_Created
NOTE: Variable Name Change.  Person is a minor -> Person_is_a_minor
NOTE: Variable Name Change.  Reached / Completed all attempts -> Reached___Completed_all_attempts
NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with options MSGLEVEL=I.
NOTE: The import data set has 0 observations and 15 variables.
NOTE: WORK.TEST2 data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds
There are some patterns here. It does not like the parentheses. It seems to handle the / ok. Not sure about the colon. Or the leading space.
You could use VALIDVARNAME=ANY, and then it does not seem to transform those names, but then they are really not very usable in code (see the sketch after the listing below).
Obs    _NAME_                              _LABEL_

  1    Full Name (Associated Case) (In     Full Name (Associated Case) (Individual)
  2    Address3: State/Province (Associ    Address3: State/Province (Associated Case) (Individual)
  3    Hand-off Outcome (Marian)           Hand-off Outcome (Marian)
  4    Resource(s) needed?                 Resource(s) needed?
  5    Department (Regarding) (Individu    Department (Regarding) (Individual)
  6    Marian ID                           Marian ID
  7    Individual ID (Regarding) (Indiv    Individual ID (Regarding) (Individual)
  8    Part of Household (Regarding) (I    Part of Household (Regarding) (Individual)
  9    Diagnosis Date time (Associated     Diagnosis Date time (Associated Case) (Individual)
 10    Activity Status                     Activity Status
 11    Status Reason                       Status Reason
 12    Date Created                        Date Created
 13    Person is a minor                   Person is a minor
 14    Reached / Completed all attempts    Reached / Completed all attempts ?
 15    Birthdate                           Birthdate
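To actually use names like those in later code you have to write them as name literals, which gets old fast. A minimal sketch, assuming VALIDVARNAME=ANY is in effect (TEST3 is just a hypothetical output dataset name):
options validvarname=any;
proc import datafile="&path/&fname" dbms=xlsx out=test3 replace;
run;
* Names with spaces or special characters have to be written as name literals ('...'n) ;
proc print data=test3;
var 'Marian ID'n 'Resource(s) needed?'n 'Hand-off Outcome (Marian)'n;
run;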
You can read the data without the headers by starting in the first cell of the second row.
proc import datafile="&path/&fname" dbms=xlsx out=want replace;
getnames=no;
range='$A2:';
run;
If you want to dynamically read the first row and use those strings as labels for the variables, that is the easiest solution.
proc import datafile="&path/&fname" dbms=xlsx out=names_wide replace;
getnames=no;
run;
proc transpose data=names_wide(obs=1) out=names ;
var _all_;
run;
filename code temp;
* Write a LABEL statement to the temporary file CODE, one variable/header pair per line ;
data _null_;
file code;
set names end=eof;
if _n_=1 then put 'label';
put @3 _name_ '=' col1 :$quote. ;
if eof then put ';' ;
run;
proc datasets lib=work nolist;
modify want;
%include code / source2;
run;
quit;
You could make more compatible names from the strings in the first row than the ones PROC IMPORT makes, for example by converting all of the special characters into underscores. Personally I like to first convert them to spaces and then use COMPBL() to collapse multiple spaces into a single space.
So either take the COL1 variable generated by transposing the first row, or the _LABEL_ variable that PROC IMPORT will create when you use VALIDVARNAME=ANY, and transform that into a new name for the variable, one that does not contain any special characters or spaces.
data clean_names;
set names ;
length clean_name $32 ;
* Replace every character that is not a letter, digit, or underscore with a space, ;
* collapse runs of spaces with COMPBL(), strip, then turn the remaining spaces into underscores. ;
* The $32 length truncates anything longer than the SAS name limit. ;
clean_name=translate(strip(compbl(translate(col1,' ',compress(col1,compress(col1,'_','kad'))))),'_',' ');
run;
Obs    _NAME_    _LABEL_    COL1                                                        clean_name

  1    A         A          Full Name (Associated Case) (Individual)                    Full_Name_Associated_Case_Indivi
  2    B         B          Address3: State/Province (Associated Case) (Individual)     Address3_State_Province_Associat
  3    C         C          Hand-off Outcome (Marian)                                   Hand_off_Outcome_Marian
  4    D         D          Resource(s) needed?                                         Resource_s_needed
  5    E         E          Department (Regarding) (Individual)                         Department_Regarding_Individual
  6    F         F          Marian ID                                                   Marian_ID
  7    G         G          Individual ID (Regarding) (Individual)                      Individual_ID_Regarding_Individu
  8    H         H          Part of Household (Regarding) (Individual)                  Part_of_Household_Regarding_Indi
  9    I         I          Diagnosis Date time (Associated Case) (Individual)          Diagnosis_Date_time_Associated_C
 10    J         J          Activity Status                                             Activity_Status
 11    K         K          Status Reason                                               Status_Reason
 12    L         L          Date Created                                                Date_Created
 13    M         M          Person is a minor                                           Person_is_a_minor
 14    N         N          Reached / Completed all attempts ?                          Reached_Completed_all_attempts
 15    O         O          Birthdate                                                   Birthdate
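If you want to actually apply those cleaned-up names, you can generate a RENAME statement from CLEAN_NAMES the same way the LABEL statement was generated above. A sketch, assuming WANT was read with GETNAMES=NO so its variables are still the default A, B, C, and so on:
filename rcode temp;
* Write a RENAME statement pairing each default name with its cleaned-up replacement ;
data _null_;
file rcode;
set clean_names end=eof;
if _n_=1 then put 'rename';
put @3 _name_ '=' clean_name ;
if eof then put ';' ;
run;
proc datasets lib=work nolist;
modify want;
%include rcode / source2;
run;
quit;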
But personally, if you really just have these 15 variables, I would read the data without the names and use RENAME to change the default column names to ones that are both reasonably short and sufficiently meaningful.
rename
A =Full_Name
B =Address3
C =Hand_off_Outcome
D =Resources_needed
E =Department
F =Marian_ID
G =Individual_ID
H =Part_of_Household
I =Diagnosis_Datetime
J =Activity_Status
K =Status_Reason
L =Date_Created
M =Minor
N =Reached_Completed_all_attempts
O =Birthdate
;
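For reference, one place to put that statement (a sketch, assuming WANT was created by the GETNAMES=NO import shown earlier, so its variables are A through O):
data final;
set want;
rename
A =Full_Name
B =Address3
/* ... the remaining pairs from the list above ... */
O =Birthdate
;
run;
The same RENAME statement also works under MODIFY in PROC DATASETS if you would rather rename the variables in place instead of making a copy of the data.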
@dbjosiah wrote:
Thanks everyone. I'm guessing from the array of answers that there's no easy fix here. In response to your questions:
3. Tom & Ballardw: I'm using a regular .xlsx file with all the normal conventions listed by Ballardw. I currently do the renames in the next data step. One of the issues is that the file structure *is* likely to change at some unforeseen and unannounced point (out of my control).
If a file structure is going to change, especially without warning, PROC IMPORT is not to be recommended in any way, at least IMHO. I have some such data sources. I use a data step to read a CSV version, because then, when the values do not match the INFORMATs I specify, I get invalid data messages and can work with the file and the data step code to read the new layout.
Some of these sources just randomly switch the order of columns in the Excel file. Sometimes they manage to set variables like Income as dates, or account identifiers and dates as currency, and sometimes only for some rows of the file.
EXCEL is a crappy data interchange file format because it places no restrictions on what you can place where, and then you have to clean up others' poor decisions.
One of the advantages of a data step is you get to set the variable names and do not rely on the actual text in the file. Then you can assign labels to them that can use much more text to describe the variables when used by most procedures.
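A minimal sketch of that approach, assuming the file has been saved as a CSV and cut down to just four of the columns, in this order (the file name, the column order, and the date layout are all assumptions you would adjust to match the real file):
* MYFILE.CSV is a placeholder name and MMDDYY10. is a guess at how the dates look ;
data mydata;
infile "&path/myfile.csv" dsd firstobs=2 truncover;
length Marian_ID $20 Activity_Status $40;
informat Date_Created Birthdate mmddyy10.;
format Date_Created Birthdate yymmdd10.;
input Marian_ID Activity_Status Date_Created Birthdate;
label
Marian_ID = 'Marian ID'
Activity_Status = 'Activity Status'
Date_Created = 'Date Created'
Birthdate = 'Birthdate'
;
run;
When the layout changes, values that no longer match these informats show up as invalid data messages in the log, which is the signal to go look at the file and adjust the code.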
I do not have any trouble reading that name. Here is a test program.
filename out temp;
* Put the problem header text in the first observation and a data value in the second ;
data test;
name='Reporting Organization (Regarding';
output;
name='123';
output;
run;
* PUTNAMES=NO keeps the variable name out of the file, so row 1 of the sheet is the header text itself ;
proc export data=test replace dbms=xlsx file=out;
putnames=no;
run;
* Read it back and let PROC IMPORT use row 1 as the column header ;
proc import datafile=out dbms=xlsx out=test2 replace;
run;
Please share an example of your XLSX file that is having trouble with the names. You can remove the data lines (or just replace them with a single line of dummy data).
Even with VALIDVARNAME=ANY, the technical limit of 32 characters for SAS variable names is still there. Save to CSV, read it with a data step, use sensible variable names, and keep the original Excel column names as labels.
Thanks everyone. I meant to mention before that *I* did not come up with those original variable names... I'll go the csv route and use the validvarname=any option.
Update: that solution is working just fine, without validvarname. I had to add GUESSINGROWS=MAX, and because the csv import didn't create labels like the xlsx import I found this neat little renaming code: PROC DATASETS; modify dsnname; attrib _all_ label='Original variable'; run; quit;. Then I can add better labels to the ones I care about, which I had to do before anyway. So I'll mark this discussion as Solved.