BookmarkSubscribeRSS Feed
leehsin
Quartz | Level 8

Hi, I had an issue with variable attribution in resulted SAS data set when using proc import to read in excel files. There are two batches of excel files which have exactly the same data structure (first row is the name of variable, the names are the same, and all rows below the name row store numeric data), but downloaded at different dates. I used the same code (a macro code, see below) to import them into SAS. I found that the earlier batch excels all came out with numeric variable in SAS data, but the 2nd batch excel files gave me those variable as characters.

 

There is nothing else can be accounted for these two different results, the only thing which is different is the dates of downloading them. Has anyone had this type of problem?

 

Two questions:

1) Is it possible that the name row has something invisible changed ? (visually from the excel file, the name row has nothing changed). 

2) How to fix the problem in the code (I mean: how to ensure the attribution of variables I import from the Excel files to be numeric) ?

 

Thanks a lot!

 

This is the macro to import excel files:

%macro Excel_Import(location, filename, library, out);

    proc import datafile="&location.\&filename"
                DBMS= xlsx REPLACE
                out= &library..&out;
                getnames=yes;

    run;

%mend Excel_Import;
24 REPLIES 24
Tom
Super User Tom
Super User

In an Excel file each cell can contain a different type of data than any other cell. In a SAS dataset (or any database) each variable (column) can only have a single data type.  So if ANY cell in the column does not contain numeric data then the variable will be defined as numeric character. Either the new file has extra row with garbage or missing values. Or has a lot more missing values that caused PROC IMPORT to consider every column as being of mixed data types.

 

If you want control over how the variables are defined the best solution is to use a text file as the transfer format. Then you can write a data step to read the text file and have complete control over the file.

leehsin
Quartz | Level 8

There is no issue with the data quality as these data are well prepared to be used by the end users (no missing values, etc.). It surprised me because this time all the files (5 excel files) came out the same result (variables as character), and all the files from last time (5 excel files as well) came out as the numeric variables. 

 

The data set has columns for first name, last name, ID number, date of birth, and all other columns containing numbers, cost, etc. The resulting SAS data set has all the columns as characters for this time, but for last time only two columns (first name and last name) were of characters. 

 

Is there a good way to control variable attributions to be consistent every time?

Reeza
Super User
There are no mechanisms to control types and attributes when importing from Excel. If you have a CSV or different file types you can explicitly control the types.

The reason for this, is that Excel does not restrict types by any format so it's difficult to enforce that. If you're positive your data is well structured then you can export the data to a CSV and read the CSVs in a consistent manner. However, if the Excel file is not consistent in it's types and formats then it will cause issues with the CSV as well.
leehsin
Quartz | Level 8

Thanks to all the replies!

 

I realized that proc import can not control the types when importing an excel file. I tested and found that, if the variable type imported from an excel file for a column containing numbers is character, then convert the excel file to .cvs file and do importing again will not make the the variable type to numeric. I also read many other previous posts discussing the similar issues, solving this problem is not a simple one. Maybe turning the direction to the post-import step is a way to do.

 

I saw there are discussions about using dbSasType option in the data set steps. It fits my case. However, I tried this solution, but still not succeeded. The data I imported from an excel file containing the variable 'Ben_ID' which is character type. I want it to be numeric type. Here is the code:

 

data ben;
	set Car_cat_124317069_18
		(dbSasType=(Ben_ID=numeric));
run;

SAS log:

395
396  data ben;
397      set Car_cat_124317069_18
398          (dbSasType=(Ben_ID=numeric));
              ---------
              22
ERROR 22-7: Invalid option name DBSASTYPE.

399  run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.BEN may be incomplete.  When this step was stopped there were
         0 observations and 0 variables.
WARNING: Data set WORK.BEN was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

 

What is the reason I got this error and how to correct the code?

 

Many thanks!

Tom
Super User Tom
Super User

@leehsin wrote:

Thanks to all the replies!

 

I realized that proc import can not control the types when importing an excel file. I tested and found that, if the variable type imported from an excel file for a column containing numbers is character, then convert the excel file to .cvs file and do importing again will not make the the variable type to numeric. I also read many other previous posts discussing the similar issues, solving this problem is not a simple one. Maybe turning the direction to the post-import step is a way to do.

 

I saw there are discussions about using dbSasType option in the data set steps. It fits my case. However, I tried this solution, but still not succeeded. The data I imported from an excel file containing the variable 'Ben_ID' which is character type. I want it to be numeric type. Here is the code:

 

data ben;
	set Car_cat_124317069_18
		(dbSasType=(Ben_ID=numeric));
run;

SAS log:

395
396  data ben;
397      set Car_cat_124317069_18
398          (dbSasType=(Ben_ID=numeric));
              ---------
              22
ERROR 22-7: Invalid option name DBSASTYPE.

399  run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.BEN may be incomplete.  When this step was stopped there were
         0 observations and 0 variables.
WARNING: Data set WORK.BEN was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

 

What is the reason I got this error and how to correct the code?

 

Many thanks!


DBSASTYPE is a feature of some of the SAS/Access to database products.  I have seen people use it with Excel files by using ODBC connection to (not sure what) that actually reads the Excel file.  Then since you are using SAS/Access to ODBC you can try to use DBSASTYPE.  You can also use explicit pass thru SQL code to convert the type BEFORE it is read into SAS.

 

If PROC IMPORT from a CSV generated for that field also did not create a numeric variable then you KNOW that your data is NOT numeric. There is at least one value in that column that SAS cannot figure out how to convert into a number.  If you cannot see it then perhaps it is an invisible character such as a tab or carriage return or 'A0'X (what Windows calls a non-breaking space).  Or it could as simple as someone decided use R style for missing and put the text NA into cells that should have been left empty.

Reeza
Super User

I realized that proc import can not control the types when importing an excel file. I tested and found that, if the variable type imported from an excel file for a column containing numbers is character, then convert the excel file to .cvs file and do importing again will not make the the variable type to numeric.

When the file is CSV you can control the types and formats. But if SAS is still reading it in as a character there's something in your data it thinks is character.

 

You can customize your import from a CSV via a data step to ensure it reads in correctly. If you're not sure where to start, get the code from the log AFTER running a proc import for the CSV file and start with that/

leehsin
Quartz | Level 8
Thank you, Reeza! I opt to do the transformation after data are imported into SAS. Now I have two sets of importing code, one is used for importing excel files which results in numeric values, and another one is used for importing excel files which results in character values. As I can't control the outcome of executing 'proc import', I will do manual examination each time at the step after excel files are imported. Though my programs were designed to run the process with minimum stops, now it seems that it is not convenient and safe to pass this step automatically. Or I need write another conditional code after initial 'proc import' to decide if I need to do importing using another set of 'proc import' procedures based on the variable types of the resulted imported data.
Reeza
Super User

@leehsin wrote:
Thank you, Reeza! I opt to do the transformation after data are imported into SAS. Now I have two sets of importing code, one is used for importing excel files which results in numeric values, and another one is used for importing excel files which results in character values. As I can't control the outcome of executing 'proc import', I will do manual examination each time at the step after excel files are imported. Though my programs were designed to run the process with minimum stops, now it seems that it is not convenient and safe to pass this step automatically. Or I need write another conditional code after initial 'proc import' to decide if I need to do importing using another set of 'proc import' procedures based on the variable types of the resulted imported data.

And all of this is avoided and you can streamline it, if you convert to CSV (which can be done via batch or automated) and use a data step. You can choose to do it manually each time, but its definitely not necessary and not a SAS limitation by any means.

 

 

leehsin
Quartz | Level 8
Reeza, thanks for giving me the suggestions. You meant that the excel files being converted to CSV files can be done via batch in SAS or in some ways? I'd like to hear more about it. Thank you!
Reeza
Super User

https://gist.github.com/statgeek/878e585102c14e01581f55dbe972d27e

 

/*This program will take a folder path (default) and look for files
that are specifically with the specified extension (ext) and convert
the files to CSV. It uses XCMD so you need to have that option enabled. 
The original script is from here:
http://support.sas.com/kb/43/496.html
Modified to convert XLSX to CSV. 
Author:F. Khurshed
Date: 2018-03-23
*/


options noxwait noxsync; 
 
%macro convert_files(default=,ext=);
 
data _null_;
file "'&default\temp.vbs'";
put "set xlapp = CreateObject(""Excel.Application"")";
put "set fso = CreateObject(""scripting.filesystemobject"")";
put "set myfolder = fso.GetFolder(""&default"")";
put "set myfiles = myfolder.Files";
put "xlapp.DisplayAlerts = False";
put " ";
put "for each f in myfiles";
put "  ExtName = fso.GetExtensionName(f)";
put "  Filename= fso.GetBaseName(f)";
put "    if ExtName=""&ext"" then";
put "           set mybook = xlapp.Workbooks.Open(f.Path)"; 
put "           xlapp.Visible = false";
put "           mybook.SaveAs ""&default.\"" & Filename & "".csv"", 6";
put "    End If";
put "  Next";
put "  mybook.Close";
put "  xlapp.DisplayAlerts = True";
/* Removes original files */
/*put " FSO.DeleteFile(""&default\*.&ext""), DeleteReadOnly";*/
put " xlapp.Quit";
put " Set xlapp = Nothing";
put " strScript = Wscript.ScriptFullName";
put " FSO.DeleteFile(strScript)"; 
run; 
 
x "cscript ""&default\temp.vbs""";
 
%mend;
 
leehsin
Quartz | Level 8

Thanks for the macro. I will try it in the next few days. One thing I need to mention is that after I manually save the .xlsx files as .csv, and do the importing, those value of numbers were imported as numeric value which is as what I want, but there are cost values which are in the currency format in .xlsx file, and remained the same format in .csv file. The values are still imported as string characters showing the '$' sign and ','. Maybe the mcaro could be modified to solve this issue as well.

Tom
Super User Tom
Super User

@leehsin wrote:

Thanks for the macro. I will try it in the next few days. One thing I need to mention is that after I manually save the .xlsx files as .csv, and do the importing, those value of numbers were imported as numeric value which is as what I want, but there are cost values which are in the currency format in .xlsx file, and remained the same format in .csv file. The values are still imported as string characters showing the '$' sign and ','. Maybe the mcaro could be modified to solve this issue as well.


If the variable names and order do not change from file to file there is no need to use PROC IMPORT to read a simple text file.  Having to deal with special informats is a good example of why it is easier to just use a data step to read the CSV file.

data want;
  length col1 $20 col2 8 col3 8 col4 $50 ;
  informat col2 comma.;
  format col2 dollar14.2 ;
  infile 'newfile.csv' dsd truncover firstobs=2;
  input col1 -- col4;
run;
leehsin
Quartz | Level 8
Got it. Thank you!
leehsin
Quartz | Level 8

Hi, Reeza! 

 

I got a chance today to test the macro you provided in SAS EG. However, it did not go through. Here is the the I used to macro:

%xlsx_to_csv(default= /u02/projects/framework/working/2019q2/dev
			,ext= xlsx
			);

The error is:

  
25         GOPTIONS ACCESSIBLE;
WARNING: Apparent invocation of macro XLSX_TO_CSV not resolved.
26         %xlsx_to_csv(default= /u02/projects/framework/working/2019q2/dev
           _
           180
ERROR 180-322: Statement is not valid or it is used out of proper order.

27         			,ext= xlsx
28         			);

29         
30         

Any suggestions?

 

Thank you!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 24 replies
  • 6525 views
  • 2 likes
  • 6 in conversation