BookmarkSubscribeRSS Feed
Dhana18
Obsidian | Level 7

Hi,

I am importing an excel file with text variables but they changed to numeric after i imported into SAS.  What did i do wrong? 

iused this 

Proc import datafile='L:\Bcdp_Std\EPI\DATA\SURRG\Copy of 2017_2018_SURRG_CaseData_Tables_From_Chris_01282019.xlsx' out=SUBMTWO.Casedata dbms=xlsx;
GETNAMES=YES;
run;

5 REPLIES 5
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Excel = poor data source

Proc import = guessing procedure

Outcome = bad import

 

What is happening is proc import scans the data, sees that it looks numeric and applies numeric datatype.  You can search for guessing rows, but if its all numeric looking, that wont work either.  Best bet, save as CSV file, then write a proper datastep to read in the CSV and apply the formats/lengths that you know (from your import agreement) are correct.  Don't leave it to guess.

ballardw
Super User

Proc import has to "guess" what variable types are when connecting with Excel. The way the engines work if the first few rows of data for variables that should be character have values that look numeric then that is what is guessed. Part of this is due to the fact that columns have no actual properties in spreadsheets. You may have better results by saving the spreadsheet to a CSV file and use proc import to read that file. There is an option when reading delimited files to examine more rows before deciding the variable type and length.

 

If you save the CSV to the same folder using the File>Save As menu in Excel the program would look something like:

Proc import datafile='L:\Bcdp_Std\EPI\DATA\SURRG\Copy of 2017_2018_SURRG_CaseData_Tables_From_Chris_01282019.csv' 
   out=SUBMTWO.Casedata dbms=dlm
   ;
   guessingrows=10000;
run;

If you are going to be reading multiple files in the same layout then something to consider is that the above will generate a data step program to read the csv. You can copy that program from the log into you editor and save it. Then you only need to change the infile and data set name to read another file.

 

You can also examine the INFORMAT statements and modify them as needed for future use. Example is if you have a variable that other documentation could contain a character value up to 25 characters long but the code proc import generates only assign 18 because that was the length of the longest value actually encountered in data you could (actually should) modify the informat, and format, statement for that variable to $25 instead of $18.

Also a variable that had no values for some reason in this file would have a $1. informat assigned. You should assign the proper type and length for such variables.

 

That step is fairly important as the next common issue with imported spreadsheet data is issues with combining data sets and mismatched variable lengths resulting in truncated data.

 

Dhana18
Obsidian | Level 7

Hi Thank you for answering my question. I tried saving in CSV and used the import code but this is what happened.

The output

The log

 

NOTE: SUBMTWO.CLINICDATAKA data set was successfully created.

NOTE: The data set SUBMTWO.CLINICDATAKA has 3414 observations and 1 variables.

NOTE: PROCEDURE IMPORT used (Total process time):

      real time           1.28 seconds

      cpu time            1.00 seconds

ballardw
Super User

@Dhana18 wrote:

Hi Thank you for answering my question. I tried saving in CSV and used the import code but this is what happened.

The output

The log

 

NOTE: SUBMTWO.CLINICDATAKA data set was successfully created.

NOTE: The data set SUBMTWO.CLINICDATAKA has 3414 observations and 1 variables.

NOTE: PROCEDURE IMPORT used (Total process time):

      real time           1.28 seconds

      cpu time            1.00 seconds


Show your code and the entire log. Also copy a few lines from the CSV file using Notepad, Wordpad or a similar next editor-not a spreadsheet program. Paste them into a code box so we can see what you are actually attempting to read. Using a spreadsheet to copy the data will reformat the values of a CSV, and sometimes actually change them, so use something that reads the file as text.

The 1 variable note often means that the file being read with a delimited file actually contains no delimiters or a different one than your code uses. Which is why we would like to see the code actually used to read and example data.

 

Here is what one of my csv files that only contains numeric values looks like:

m0,m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12,m13,m14,m15,m16,s0,s1,s2,s3,s4,s5,s6,s7,s8,s9,s10,s11,s12,s13,s14,s15,s16,c,yr,b
0,0,13,13,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
7,0,0,13,13,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,13,1,13
13,7,0,0,13,13,0,0,0,0,0,0,0,0,0,0,0,13,6,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,26,2,26

The first row are column headings or variable names and then rows of data.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Its very hard for us to guess what you are doing.  Show your code, show the file you are importing.  At a guess, did you name the file .csv?  So that the proc import can guess its a csv file?  Did you try supplying the delimiter value, and make sure it matches the delimiter in the datafile?

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 592 views
  • 3 likes
  • 3 in conversation