BookmarkSubscribeRSS Feed
Amalik
Calcite | Level 5

Hi ,

 

I am trying to import a file in SAS, when I use excel verson to import, it imports the variables as characters although they are all numeric and when I import it as a csv file, although it corrects this error but fails to import all the data.. I am unsure how to resolve tis issue.

 

Cheers

Anum

13 REPLIES 13
pau13rown
Lapis Lazuli | Level 10

what data are missing when you use csv? proc import generates code that you can then modify to ensure all variables are imported

Amalik
Calcite | Level 5

Well, I cannot identify as its a huge file but SAS log says import unsuccessful , also I can see it says "invalid data for var5560 in line 278 32767-32767"

Amalik
Calcite | Level 5

 I can see SAS log telling me "invalid data for var6677 in line 326 32767-32767" for various variables 

ChrisNZ
Tourmaline | Level 20

This means that your CSV records are longer than 32767 bytes.

How do you import the CSV data?

You must lengthen LRECL.

Amalik
Calcite | Level 5

Hi,

 

I have tried doing that but it isn't solving my problem. I can see that even though my original file contains data upto 4 digits but when I am importing it in SAS it is converting some values to large decimal places like belowimage.png

 

I guess this is creating problem. Any idea how can I solve this issue?

ChrisNZ
Tourmaline | Level 20

Numerical precision is a whole different issue. 

Fix the record length issue first.

Answer the question and show the code used and the log.

 

Amalik
Calcite | Level 5

I have been using the following code for fixing record length issue,

filename AR 'C:\Users\amalik\Desktop\dailydata\AR\DATEAR.XLSX' lrecl=27000;
proc import datafile=AR dbms=XLSX out=AR replace;
   run;

But it is still importing all variables as characters

ChrisNZ
Tourmaline | Level 20
Amalik
Calcite | Level 5

Sorry, I did replace it with a value greater than 32767

ballardw
Super User

@Amalik wrote:

 I can see SAS log telling me "invalid data for var6677 in line 326 32767-32767" for various variables 


Gads, you have at least 6677 variables, which is very questionable in the first place. The specific message is saying that AFTER proc import determined what type of data var6677 should be (after examining the first few rows of data) that the value encountered on line 326 did not match. Most likely is SAS guessed that the variable is numeric and encountered something in the cell that does not match the layout of previous cells such as special characters like ()*$, or previous values were treated as dates and the value in the cell was not recognizable as a valid date format.

 

If you are using Proc Import with csv add the option Guessingrows=max. This option will have SAS examine more rows before assigning variable types, informats and formats.

 

Proc import for csv will generate the data step code used to read the file. Look in the log above all of the invalid data messages. There you can see the informats assigned for reading the values. Though with over 6000 variables there is going to be a lot of lines of code since the procedure generated one informat and format statement for each variable.

 

If indeed ALL of your variables are numeric something similar to this should work:

data want;
   infile "path and filename goes here.csv" dlm=',' lrecl=50000 truncover;
   informat var1-var7000 best32.; /*replace 7000 with the expected number of columns*/
   input var1-var7000;
run;

though you may still have to increase the lrecl.

 

 

Note there is also a possible issue with CSV created from Excel files of "phantom" columns. In csv you sometimes get rows that end with ,,,,,,,, (the number of , varies) representing something in the history of the Excel file that no longer contains data due to deletion but Excel considers those columns "used" and will export them to csv.

Kurt_Bremser
Super User

DO NOT use proc import. It's guessing will only cause trouble.

Inspect your csv file with a solid text editor like notepad++.

It will show you the file type (DOS, UNIX or Mac), which then helps you in using the correct TERMSTR in the infile statement, which most probably will alleviate your lrecl problem.

 

If you still have problems, post some lines of your csv as an attachment here.

 

Amalik
Calcite | Level 5

I was finally able to find out the issue in the file. Actually the excel and csv version of files both contain * in them.

 The number is shown as a number say 1.12 but when I use ctrl+F and find *, it finds the numbered cell. When I replace * with a blank, it replaces the entire number say 1.12 will replace with a blank.

 

Iam unsure on how to handle this weird issue. Any suggestions please?

ChrisNZ
Tourmaline | Level 20

Do not use Excel.

What value do you see when you open the CSV file with a text editor? 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1557 views
  • 4 likes
  • 5 in conversation