Hi,
I have a difficulty to import the excel file (csv) to window SAS 9.4 version. I tried to find issues in the excel sheet based on sas log (attached file), but I could not find any problem in the data format for the following variables:
firmid, engagementpartnerid, issuercik, latestformapfiling, and formfilingid.
Any advice will be highly appreciated.
CSV is not Excel, it's a text file. Open your file in a text editor and examine the line indicated in your log - ignoring the header line.
Check if the type, numeric/character, matches the type expected for the variables indicated in the error messages in the log.
Or re-run proc import and set GUESSINGROWS=MAX, which will take a long time, but likely a better guess at importing your data to avoid these issues in the first place.
@joon1 wrote:
Hi,
I have a difficulty to import the excel file (csv) to window SAS 9.4 version. I tried to find issues in the excel sheet based on sas log (attached file), but I could not find any problem in the data format for the following variables:
firmid, engagementpartnerid, issuercik, latestformapfiling, and formfilingid.
Any advice will be highly appreciated.
Thank you for your quick reply
It is a file with comma separated values. I converted excel files to csv files and import csv files to SAS window because it worked well. The plan excel files are not easily imported to window SAS. I checked whether character variable is included, for instance, the variable firmid(first column) and did not find any problem. Any advice to resolve this import problem will be highly appreciated.
Are you absolutely sure?
Post a screenshot of the line in Excel for the record 589 or 590 if you're including headers.
Specifically the FormID and the last variables.
And if the data was confidential you've already released it within the log attached earlier.
Is it possible your data has line feeds/returns in it? And that's pushing part of it to the next line in the CSV file?
This is what SAS is seeing as the line for record 590 and why it's generating errors.
",Partner,,,7/10/2017 13:34 27
Check the CSV file in a text editor NOT excel to verify this.
@joon1 wrote:
Thank you for your quick reply
It is a file with comma separated values. I converted excel files to csv files and import csv files to SAS window because it worked well. The plan excel files are not easily imported to window SAS. I checked whether character variable is included, for instance, the variable firmid(first column) and did not find any problem. Any advice to resolve this import problem will be highly appreciated.
If looks like someone put line breaks in one or more cells in your original Excel file.
How many records were in your original Excel file? I suspect that it is fewer than SAS is seeing in the CSV file.
NOTE: 14558 records were read from the infile 'C:\Users\hakjoon\Desktop\FirmFilings01122018.csv'. The minimum record length was 0. The maximum record length was 10880.
Also note the Excel would not write any blank lines and your CSV file has at least one line with no characters on it at all. A blank row in the Excel file would still result in a line with delimiters on it in the CSV file.
The easiest way to fix this is to remove them from the Excel file and then re-save it as a CSV file. If you just have a few of them then remove them by hand. Otherwise try to use the search-replace feature to remove them.
Thank you so much for your suggestion
I selected several necessary variables and created a new excel file (csv file) after deleting empty lines. Similar error message was produced below. I don't know why the csv file (based on attached excel file) was not imported. Any advice will be highly appreciated.
NOTE: Invalid data for IssuerCIK in line 638 1-1.
NOTE: Invalid data for AuditReportDate in line 638 3-7.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
638 ","Lori 8
IssuerCIK=. AuditReportDate=. FiscalPeriodEndDate=. EngagementPartnerLast_Name=
EngagementPartnerFirst_Name= EngagementPartnerID=. FilingDate=. _ERROR_=1 _N_=637
NOTE: Invalid data for IssuerCIK in line 639 1-1.
NOTE: Invalid data for AuditReportDate in line 639 3-10.
639 ",55200004,03/17/17 19
NOTE: 14195 records were read from the infile 'C:\Users\hakjoon\Desktop\firmfilingshort.csv'.
The minimum record length was 8.
The maximum record length was 80.
NOTE: The data set WORK.S1 has 14195 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.09 seconds
cpu time 0.09 seconds
Errors detected in submitted DATA step. Examine log.
14195 rows created in WORK.S1 from C:\Users\hakjoon\Desktop\firmfilingshort.csv.
ERROR: Import unsuccessful. See SAS Log for details.
Check the line above it in the Excel file (637) and the variable right before IssuerCIK - it likely has a field with the ALT+RETURN entered as well.
Thank you so much for your reply
The following is the data in line 637. It seems that error log is related to this dataline. As you suggested, I deleted one column before cik code, but similar error message is produced. I really don't know what is happening here. Could you please open the attached excel file and suggest an idea how to fix this? Thanks again
19780 | 02/24/17 | 12/31/16 | Ehleben | Lori | 55200004 | 03/17/17 |
NOTE: Invalid data for IssuerCIK in line 638 1-1.
NOTE: Invalid data for AuditReportDate in line 638 3-7.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
638 ","Lori 8
IssuerCIK=. AuditReportDate=. FiscalPeriodEndDate=. EngagementPartnerLast_Name=
EngagementPartnerFirst_Name= EngagementPartnerID=. FilingDate=. _ERROR_=1 _N_=637
NOTE: Invalid data for IssuerCIK in line 639 1-1.
NOTE: Invalid data for AuditReportDate in line 639 3-10.
639 ",55200004,03/17/17 19
IssuerCIK=. AuditReportDate=. FiscalPeriodEndDate=03/17/2017 EngagementPartnerLast_Name=
EngagementPartnerFirst_Name= EngagementPartnerID=. FilingDate=. _ERROR_=1 _N_=638
NOTE: Invalid data for IssuerCIK in line 2025 1-1.
NOTE: Invalid data for AuditReportDate in line 2025 3-7.
You still have linefeeds in some of the data fields in your EXCEL file.
You can just use PROC IMPORT to read the original file.
proc import datafile="&fname" dbms=xls
out=test replace
;
run;
Or fix the cells that still have embedded linefeeds.
010 data _null_; 5011 set test ; 5012 array c _character_ ; 5013 do i=1 to dim(c); 5014 badc = indexc(c(i),'090A0D00A0'x) ; 5015 if badc then do; 5016 put _n_= badc= c(i)= $hex80. ; 5017 end; 5018 end; 5019 run; _N_=636 badc=8 EngagementPartnerLast_Name=45686C6562656E0A2020202020202020202020202020202020202020 _N_=636 badc=6 EngagementPartnerFirst_Name=4C6F7269200A20202020202020202020202020 _N_=2021 badc=8 EngagementPartnerLast_Name=45686C6562656E0A2020202020202020202020202020202020202020 _N_=2021 badc=6 EngagementPartnerFirst_Name=4C6F7269200A20202020202020202020202020 _N_=5502 badc=6 EngagementPartnerLast_Name=5473616E670A20202020202020202020202020202020202020202020 _N_=5502 badc=8 EngagementPartnerFirst_Name=4B6169205461690A2020202020202020202020 _N_=7486 badc=7 EngagementPartnerLast_Name=4B61697365720A202020202020202020202020202020202020202020 _N_=7486 badc=7 EngagementPartnerFirst_Name=4A616D6573200A202020202020202020202020 _N_=7906 badc=8 EngagementPartnerLast_Name=45686C6562656E0A2020202020202020202020202020202020202020 _N_=7906 badc=6 EngagementPartnerFirst_Name=4C6F7269200A20202020202020202020202020 _N_=7907 badc=8 EngagementPartnerLast_Name=45686C6562656E0A2020202020202020202020202020202020202020 _N_=7907 badc=6 EngagementPartnerFirst_Name=4C6F7269200A20202020202020202020202020 _N_=8256 badc=8 EngagementPartnerLast_Name=45686C6562656E0A2020202020202020202020202020202020202020 _N_=8256 badc=6 EngagementPartnerFirst_Name=4C6F7269200A20202020202020202020202020 _N_=8257 badc=8 EngagementPartnerLast_Name=45686C6562656E0A2020202020202020202020202020202020202020 _N_=8257 badc=6 EngagementPartnerFirst_Name=4C6F7269200A20202020202020202020202020 NOTE: There were 14179 observations read from the data set WORK.TEST.
If you save the file to a CSV on a Windows machine that used CRLF as the end of line marker.
Then you should also be able to read the CSV file if you just make sure to use the TERMSTR=CRLF on the INFILE statement.
5072 data test2; 5073 infile "c:\downloads\firmfilingshort.csv" dsd truncover termstr=CRLF ; 5074 length a b c d e f g $100 ; 5075 input a--g ; 5076 run; NOTE: The infile "c:\downloads\firmfilingshort.csv" is: Filename=c:\downloads\firmfilingshort.csv, RECFM=V,LRECL=32767,File Size (bytes)=832195, Last Modified=24Jan2018:19:35:05, Create Time=24Jan2018:19:34:57 NOTE: 14180 records were read from the infile "c:\downloads\firmfilingshort.csv". The minimum record length was 46. The maximum record length was 131. NOTE: The data set WORK.TEST2 has 14180 observations and 7 variables.
And still see the embedded linefeeds.
5077 data _null_; 5078 set test2; 5079 array _c _character_ ; 5080 do i=1 to dim(_c); 5081 badc = indexc(_c(i),'090A0D00A0'x) ; 5082 if badc then do; 5083 put _n_= badc= _c(i)= $hex80. ; 5084 end; 5085 end; 5086 run; _N_=637 badc=8 d=45686C6562656E0A2020202020202020202020202020202020202020202020202020202020202020 _N_=637 badc=6 e=4C6F7269200A20202020202020202020202020202020202020202020202020202020202020202020 _N_=2022 badc=8 d=45686C6562656E0A2020202020202020202020202020202020202020202020202020202020202020 _N_=2022 badc=6 e=4C6F7269200A20202020202020202020202020202020202020202020202020202020202020202020 _N_=5503 badc=6 d=5473616E670A20202020202020202020202020202020202020202020202020202020202020202020 _N_=5503 badc=8 e=4B6169205461690A2020202020202020202020202020202020202020202020202020202020202020 _N_=7487 badc=7 d=4B61697365720A202020202020202020202020202020202020202020202020202020202020202020 _N_=7487 badc=7 e=4A616D6573200A202020202020202020202020202020202020202020202020202020202020202020 _N_=7907 badc=8 d=45686C6562656E0A2020202020202020202020202020202020202020202020202020202020202020 _N_=7907 badc=6 e=4C6F7269200A20202020202020202020202020202020202020202020202020202020202020202020 _N_=7908 badc=8 d=45686C6562656E0A2020202020202020202020202020202020202020202020202020202020202020 _N_=7908 badc=6 e=4C6F7269200A20202020202020202020202020202020202020202020202020202020202020202020 _N_=8257 badc=8 d=45686C6562656E0A2020202020202020202020202020202020202020202020202020202020202020 _N_=8257 badc=6 e=4C6F7269200A20202020202020202020202020202020202020202020202020202020202020202020 _N_=8258 badc=8 d=45686C6562656E0A2020202020202020202020202020202020202020202020202020202020202020 _N_=8258 badc=6 e=4C6F7269200A20202020202020202020202020202020202020202020202020202020202020202020 NOTE: There were 14180 observations read from the data set WORK.TEST2.
Which are then easy to remove.
data want ;
set have ;
array _C _CHARACTER_;
do over _c ;
_c = compress(_c,'0A'x);
end;
run;
Thank you so much for your kind reply
When I tried the following code, I cannot read the csv file in my folder. The fundamental problem is that I cannot import the csv file to SAS window. What do you mean by 'embedded linefeeder'? Is any way to detect and delete in the excel file? Your help will be very much appreciated.
1102 data test1;
1103 infile "C:\Users\hakjoon\Documents\audit parter attributes\data" dsd truncover termstr=CRLF
1103! ;
1104 length a b c d e f g $100 ;
1105 input a--g ;
1106 run;
ERROR: Invalid file, C:\Users\hakjoon\Documents\audit parter attributes\data.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST1 may be incomplete. When this step was stopped there were 0
observations and 7 variables.
WARNING: Data set WORK.TEST1 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
This is what an embedded line feed is:
The error message in your code means that SAS could not find the file using the path you gave it. Check the actual name of your CSV file and use the full name in the INFILE statement. Normally the name of a CSV file ends with '.csv'. Perhaps you just need to include that in your code? Perhaps your explorer windows on your PC is suppressing the file extension and so you don't realize they are there?
A linefeed is the hex code '0A'x. It is clear from your earlier SAS log that line 638 in your CSV has a linefeed character in position 9 that is making the line look like two lines to SAS.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--- 638 ","Lori 8 639 ",55200004,03/17/17 19
You can find many pages on the web for ways to deal with them.
This page for example shows how to create, find or replace them when using Excel.
http://professor-excel.com/line-breaks-in-excel/
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.