BookmarkSubscribeRSS Feed
joon1
Quartz | Level 8

Hi,

 

I have a difficulty to import the excel file (csv) to window SAS 9.4 version. I tried to find issues in the excel sheet based on sas log (attached file), but I could not find any problem in the data format for the following variables:

firmid, engagementpartnerid, issuercik, latestformapfiling, and formfilingid.

Any advice will be highly appreciated.

11 REPLIES 11
Reeza
Super User

CSV is not Excel, it's a text file. Open your file in a text editor and examine the line indicated in your log - ignoring the header line. 

Check if the type, numeric/character, matches the type expected for the variables indicated in the error messages in the log. 

 

Or re-run proc import and set GUESSINGROWS=MAX, which will take a long time, but likely a better guess at importing your data to avoid these issues in the first place. 

 


@joon1 wrote:

Hi,

 

I have a difficulty to import the excel file (csv) to window SAS 9.4 version. I tried to find issues in the excel sheet based on sas log (attached file), but I could not find any problem in the data format for the following variables:

firmid, engagementpartnerid, issuercik, latestformapfiling, and formfilingid.

Any advice will be highly appreciated.


 

joon1
Quartz | Level 8

Thank you for your quick reply

It is a file with comma separated values. I converted excel files to csv files and import csv files to SAS window because it worked well. The plan excel files are not easily imported to window SAS. I checked whether character variable is included, for instance, the variable firmid(first column) and did not find any problem. Any advice to resolve this import problem will be highly appreciated. 

Reeza
Super User

Are you absolutely sure? 

Post a screenshot of the line in Excel for the record 589 or 590 if you're including headers. 

Specifically the FormID and the last variables.

 

And if the data was confidential you've already released it within the log attached earlier. 

 

Is it possible your data has line feeds/returns in it? And that's pushing part of it to the next line in the CSV file?

 

This is what SAS is seeing as the line for record 590 and why it's generating errors. 

 

",Partner,,,7/10/2017 13:34 27

 

Check the CSV file in a text editor NOT excel to verify this. 


@joon1 wrote:

Thank you for your quick reply

It is a file with comma separated values. I converted excel files to csv files and import csv files to SAS window because it worked well. The plan excel files are not easily imported to window SAS. I checked whether character variable is included, for instance, the variable firmid(first column) and did not find any problem. Any advice to resolve this import problem will be highly appreciated. 


 

Tom
Super User Tom
Super User

If looks like someone put line breaks in one or more cells in your original Excel file.

How many records were in your original Excel file? I suspect that it is fewer than SAS is seeing in the CSV file.

NOTE: 14558 records were read from the infile 'C:\Users\hakjoon\Desktop\FirmFilings01122018.csv'.
The minimum record length was 0.
The maximum record length was 10880.

Also note the Excel would not write any blank lines and your CSV file has at least one line with no characters on it at all.  A blank row in the Excel file would still result in a line with delimiters on it in the CSV file.

 

The easiest way to fix this is to remove them from the Excel file and then re-save it as a CSV file.  If you just have a few of them then remove them by hand.  Otherwise try to use the search-replace feature to remove them.

 

 

 

joon1
Quartz | Level 8

Thank you so much for your suggestion

I selected several necessary variables and created  a new excel file (csv file) after deleting empty lines. Similar error message was produced below. I don't know why the csv file (based on attached excel file) was not imported. Any advice will be highly appreciated.

 

NOTE: Invalid data for IssuerCIK in line 638 1-1.

NOTE: Invalid data for AuditReportDate in line 638 3-7.

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---

638       ","Lori  8

IssuerCIK=. AuditReportDate=. FiscalPeriodEndDate=. EngagementPartnerLast_Name=

EngagementPartnerFirst_Name= EngagementPartnerID=. FilingDate=. _ERROR_=1 _N_=637

NOTE: Invalid data for IssuerCIK in line 639 1-1.

NOTE: Invalid data for AuditReportDate in line 639 3-10.

639       ",55200004,03/17/17 19

 

NOTE: 14195 records were read from the infile 'C:\Users\hakjoon\Desktop\firmfilingshort.csv'.

      The minimum record length was 8.

      The maximum record length was 80.

NOTE: The data set WORK.S1 has 14195 observations and 7 variables.

NOTE: DATA statement used (Total process time):

      real time           0.09 seconds

      cpu time            0.09 seconds

 

 

Errors detected in submitted DATA step. Examine log.

14195 rows created in WORK.S1 from C:\Users\hakjoon\Desktop\firmfilingshort.csv.

 

 

 

ERROR: Import unsuccessful. See SAS Log for details.

Reeza
Super User

Check the line above it in the Excel file (637) and the variable right before IssuerCIK - it likely has a field with the ALT+RETURN entered as well.

joon1
Quartz | Level 8

Thank you so much for your reply

The following is the data in line 637. It seems that error log is related to this dataline. As you suggested, I deleted one column before cik code, but similar error message is produced. I really don't know what is happening here. Could you please open the attached excel file and suggest an idea how to fix this? Thanks again

 

1978002/24/1712/31/16EhlebenLori5520000403/17/17

 

NOTE: Invalid data for IssuerCIK in line 638 1-1.

NOTE: Invalid data for AuditReportDate in line 638 3-7.

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---

638 ","Lori 8

IssuerCIK=. AuditReportDate=. FiscalPeriodEndDate=. EngagementPartnerLast_Name=

EngagementPartnerFirst_Name= EngagementPartnerID=. FilingDate=. _ERROR_=1 _N_=637

NOTE: Invalid data for IssuerCIK in line 639 1-1.

NOTE: Invalid data for AuditReportDate in line 639 3-10.

639 ",55200004,03/17/17 19

IssuerCIK=. AuditReportDate=. FiscalPeriodEndDate=03/17/2017 EngagementPartnerLast_Name=

EngagementPartnerFirst_Name= EngagementPartnerID=. FilingDate=. _ERROR_=1 _N_=638

NOTE: Invalid data for IssuerCIK in line 2025 1-1.

NOTE: Invalid data for AuditReportDate in line 2025 3-7.

Tom
Super User Tom
Super User

You still have linefeeds in some of the data fields in your EXCEL file.

 

You can just use PROC IMPORT to read the original file. 

proc import datafile="&fname" dbms=xls
  out=test replace 
;
run;

Or fix the cells that still have embedded linefeeds.

010  data _null_;
5011    set test ;
5012    array c _character_ ;
5013    do i=1 to dim(c);
5014      badc = indexc(c(i),'090A0D00A0'x) ;
5015      if badc then do;
5016         put _n_= badc= c(i)= $hex80. ;
5017      end;
5018    end;
5019  run;

_N_=636  badc=8 EngagementPartnerLast_Name=45686C6562656E0A2020202020202020202020202020202020202020
_N_=636  badc=6 EngagementPartnerFirst_Name=4C6F7269200A20202020202020202020202020
_N_=2021 badc=8 EngagementPartnerLast_Name=45686C6562656E0A2020202020202020202020202020202020202020
_N_=2021 badc=6 EngagementPartnerFirst_Name=4C6F7269200A20202020202020202020202020
_N_=5502 badc=6 EngagementPartnerLast_Name=5473616E670A20202020202020202020202020202020202020202020
_N_=5502 badc=8 EngagementPartnerFirst_Name=4B6169205461690A2020202020202020202020
_N_=7486 badc=7 EngagementPartnerLast_Name=4B61697365720A202020202020202020202020202020202020202020
_N_=7486 badc=7 EngagementPartnerFirst_Name=4A616D6573200A202020202020202020202020
_N_=7906 badc=8 EngagementPartnerLast_Name=45686C6562656E0A2020202020202020202020202020202020202020
_N_=7906 badc=6 EngagementPartnerFirst_Name=4C6F7269200A20202020202020202020202020
_N_=7907 badc=8 EngagementPartnerLast_Name=45686C6562656E0A2020202020202020202020202020202020202020
_N_=7907 badc=6 EngagementPartnerFirst_Name=4C6F7269200A20202020202020202020202020
_N_=8256 badc=8 EngagementPartnerLast_Name=45686C6562656E0A2020202020202020202020202020202020202020
_N_=8256 badc=6 EngagementPartnerFirst_Name=4C6F7269200A20202020202020202020202020
_N_=8257 badc=8 EngagementPartnerLast_Name=45686C6562656E0A2020202020202020202020202020202020202020
_N_=8257 badc=6 EngagementPartnerFirst_Name=4C6F7269200A20202020202020202020202020
NOTE: There were 14179 observations read from the data set WORK.TEST.

If you save the file to a CSV on a Windows machine that used CRLF as the end of line marker.

image.png

Then you should also be able to read the CSV file if you just make sure to use the TERMSTR=CRLF on the INFILE statement.

 

5072  data test2;
5073    infile "c:\downloads\firmfilingshort.csv" dsd truncover termstr=CRLF ;
5074    length a b c d e f g $100 ;
5075    input a--g ;
5076  run;

NOTE: The infile "c:\downloads\firmfilingshort.csv" is:
      Filename=c:\downloads\firmfilingshort.csv,
      RECFM=V,LRECL=32767,File Size (bytes)=832195,
      Last Modified=24Jan2018:19:35:05,
      Create Time=24Jan2018:19:34:57

NOTE: 14180 records were read from the infile "c:\downloads\firmfilingshort.csv".
      The minimum record length was 46.
      The maximum record length was 131.
NOTE: The data set WORK.TEST2 has 14180 observations and 7 variables.

And still see the embedded linefeeds.

5077  data _null_;
5078    set test2;
5079    array _c _character_ ;
5080    do i=1 to dim(_c);
5081      badc = indexc(_c(i),'090A0D00A0'x) ;
5082      if badc then do;
5083         put _n_= badc= _c(i)= $hex80. ;
5084      end;
5085    end;
5086  run;

_N_=637  badc=8 d=45686C6562656E0A2020202020202020202020202020202020202020202020202020202020202020
_N_=637  badc=6 e=4C6F7269200A20202020202020202020202020202020202020202020202020202020202020202020
_N_=2022 badc=8 d=45686C6562656E0A2020202020202020202020202020202020202020202020202020202020202020
_N_=2022 badc=6 e=4C6F7269200A20202020202020202020202020202020202020202020202020202020202020202020
_N_=5503 badc=6 d=5473616E670A20202020202020202020202020202020202020202020202020202020202020202020
_N_=5503 badc=8 e=4B6169205461690A2020202020202020202020202020202020202020202020202020202020202020
_N_=7487 badc=7 d=4B61697365720A202020202020202020202020202020202020202020202020202020202020202020
_N_=7487 badc=7 e=4A616D6573200A202020202020202020202020202020202020202020202020202020202020202020
_N_=7907 badc=8 d=45686C6562656E0A2020202020202020202020202020202020202020202020202020202020202020
_N_=7907 badc=6 e=4C6F7269200A20202020202020202020202020202020202020202020202020202020202020202020
_N_=7908 badc=8 d=45686C6562656E0A2020202020202020202020202020202020202020202020202020202020202020
_N_=7908 badc=6 e=4C6F7269200A20202020202020202020202020202020202020202020202020202020202020202020
_N_=8257 badc=8 d=45686C6562656E0A2020202020202020202020202020202020202020202020202020202020202020
_N_=8257 badc=6 e=4C6F7269200A20202020202020202020202020202020202020202020202020202020202020202020
_N_=8258 badc=8 d=45686C6562656E0A2020202020202020202020202020202020202020202020202020202020202020
_N_=8258 badc=6 e=4C6F7269200A20202020202020202020202020202020202020202020202020202020202020202020
NOTE: There were 14180 observations read from the data set WORK.TEST2.

Which are then easy to remove.

data want ;
  set have ;
  array _C _CHARACTER_;
  do over _c ;
    _c = compress(_c,'0A'x);
  end;
run;
joon1
Quartz | Level 8

Thank you so much for your kind reply

When I tried the following code, I cannot read the csv file in my folder. The fundamental problem is that I cannot import the csv file to SAS window. What do you mean by 'embedded linefeeder'? Is any way to detect and delete in the excel file? Your help will be very much appreciated.

 

1102 data test1;

1103 infile "C:\Users\hakjoon\Documents\audit parter attributes\data" dsd truncover termstr=CRLF

1103! ;

1104 length a b c d e f g $100 ;

1105 input a--g ;

1106 run;

ERROR: Invalid file, C:\Users\hakjoon\Documents\audit parter attributes\data.

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.TEST1 may be incomplete. When this step was stopped there were 0

observations and 7 variables.

WARNING: Data set WORK.TEST1 was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

Reeza
Super User

This is what an embedded line feed is:

 

delete_alt_new_line.JPG

Tom
Super User Tom
Super User

The error message in your code means that SAS could not find the file using the path you gave it.  Check the actual name of your CSV file and use the full name in the INFILE statement.  Normally the name of a CSV file ends with '.csv'. Perhaps you just need to include that in your code?  Perhaps your explorer windows on your PC is suppressing the file extension and so you don't realize they are there?

 

A linefeed is the hex code '0A'x.  It is clear from your earlier SAS log that line 638 in your CSV has a linefeed character in position 9 that is making the line look like two lines to SAS.

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
638   ","Lori 8
639   ",55200004,03/17/17 19

You can find many pages on the web for ways to deal with them.

 

This page for example shows how to create, find or replace them when using Excel.

http://professor-excel.com/line-breaks-in-excel/

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2357 views
  • 0 likes
  • 3 in conversation