05-12-2014 07:29 PM
I have a comma delimited file .csv created out of lotus notes db, I have renamed the file to .txt and I have FTPed the file to Mainframes. I am unable to read the file on Mainframes and receiving error invalid data on all the input lines.
Sample of the Input as seen on the Mainframe:
JReF No.,TRUS No.,TRUS Name,Trustees Bank,Stamp Code,Penaltys Number,Type,PEN Amount ,Trust Amount ,Funds Held ,Total Penaltys
57735,15286,ABCDE ABCDEDEFGHI COMPANY (UN),515,,35645311,L,$0.11 ,$0.00,$0.00,$0.11
52715,55325,ABABCDEDFGHKI ADLADJFD SLK ,515,,31954258,L,$0.16 ,$0.00,$0.00,$0.16
52715,55325,ABABCDEDFGHKI ADLADJFD SLK ,515,,31954258,L,$0.17 ,$0.00,$0.00,$0.17
11955,22956,LIONSDENIT INC. PP. LTD.,515,,35644876,L,$0.17 ,$0.00,$0.00,$0.17
INFILE UPLOAD DLM=',' DSD RECFM=VB LRECL=256; /* Please note, I also tried FTPing the file into a fixed 200 lrecl file and tried the same code */
INPUT FLD1-FLD10 :$20.;
PUT 'FLD1= ' FLD1;
PUT 'FLD2= ' FLD2;
PUT 'FLD3= ' FLD3;
NOTE: Invalid data for FLD1 in line 1 1-8.
NOTE: Invalid data for FLD2 in line 1 10-17.
NOTE: Invalid data for FLD3 in line 1 19-27.
2 The SAS System
NOTE: Invalid data for FLD4 in line 1 29-41.
NOTE: Invalid data for FLD5 in line 1 43-52.
NOTE: Invalid data for FLD6 in line 1 54-69.
NOTE: Invalid data for FLD7 in line 1 71-74.
NOTE: Invalid data for FLD8 in line 1 76-85.
NOTE: Invalid data for FLD9 in line 1 88-99.
1 JReF No.,TRUS No.,TRUS Name,Trustees Bank,Stamp Code,Penaltys Number,Type,PEN Amount ,Trust Amount
101 ,Funds Held ,Total Security 128
FLD1=. FLD2=. FLD3=. FLD4=. FLD5=. FLD6=. FLD7=. FLD8=. FLD9=. FLD10=Funds Held
NOTE: Invalid data for FLD3 in line 2 13-42.
NOTE: Invalid data for FLD7 in line 2 58-58.
NOTE: Invalid data for FLD8 in line 2 60-64.
NOTE: Invalid data for FLD9 in line 2 67-71.
2 57735,15286,ABCDE ABCDEDEFGHI COMPANY (UN),515,,35645311,L,$0.11 ,$0.0
FLD1=57735 FLD2=15286 FLD3=. FLD4=515 FLD5=. FLD6=35645311 FLD7=. FLD8=. FLD9=. FL
Can anyone please help where I am doing the mistake? I am using SAS 9 on Mainframes (z/os) .
05-13-2014 04:45 AM
There is nothing special for a mainframe within this.
You uploaded the file successfully and all is still readable. the Ascii/Ebcdic byte translation is workable.
Using a VB (lrecl 32Kb bklsize=27998) is the best one for the record crlf convention translation, a lrecl=255 (common convention in Unix) is also good as long your records are not longer.
No zoned/number lines with not readable characters, you have mistakes in some generic issue. You could doe this also on your pc.
The file is coming in with te first record obviously being column names, You better drop this record and start with the second and after.
use the firstobs=2 to do that SAS(R) 9.3 Statements: Reference
You are reading 10 char variables Fld1-fld10. The default length is 8. This is what the column pointering is telling you. Every time increasing 8 positions, not using de comma as seperator.
Better define variables with formats (attribute) and length you are needing. 2 numeric, 1 character (how long?) hopefully without special chars and without "," , 3? numeric, 1 character (length 1?), 3 numeric with dollar format. You are lucky with the decimal-point. Hopefully no "," used with big ammounts. The "," is not a certain unique symbol. When conflicting you can move to the "tab" or ~ approach.
You are inputting list format. But is functioning as a formatted one. The list input is not that easy with specifying formats. The length of a format and the var is not working by conflicting logic.
SAS(R) 9.3 Statements: Reference (input list). The adjusted approach with fully specifying the variables in advance should solve this.
05-16-2014 01:59 AM
Actually, the way he codes his example, the vars are defined as numeric, because of the :$20.
So he gets errors everytime there is non-numeric data.
The data step should look like:
infile '$HOME/intest.txt' dlm=',' dsd;
format fld1-fld20 $20.;
I tested this on my AIX server, so the behaviour is platform-indepenent.
05-13-2014 09:23 AM
@ksharp, Why truncover?
It did read 9 numeric (8 long) and the 10-th as string (8 long). All according the coding, but as pitty not what was wanted.
The postion/record lengt was 128 short enough to be under the 255.
05-15-2014 02:31 PM
Thank You Jaap and Ksharp for the suggestions!! After much struggling I realize that ',' delim will not just work because of the data challenges. For some records with character, I have "," in between the character string and that data is represented as "AAA, BBB CCC", after exporting it out of the lotus notes db. For other records do not have "," in them. And big amounts are also formatted in commas, so it is making difficult to read and format the data. I had to workaround with importing the data to MS Access and format it properly, convert to text file and upload to Mainframes as a fixed length, then I could read it.
Your suggestions would have really worked if the data were consistent, this is a limitation in SAS I guess. Character with spaces are also not being read properly, they are being set as missing values. If the default length is 8, atleast it should print the first 8 characters of the data ? And yeah, I do not intend to drop the header record during initial read, because I want to read every field as a character variable during initial read.
05-15-2014 08:11 PM
You say "this is a limitation in SAS I guess" but it is really a limitation in the CSV conventions - because there is no standard and thus different implementations publish and resolve embedded characters differently. It would be quite possible, though tedious, to write SAS code to import the kind of data you have, but if you are happy to add the extra Access steps in your process that is a good enough solution (for now!).
05-15-2014 02:44 PM
1/ using a delimiter like ~ (when unique) instead of comma will sole a lot.
2/ Embedded spaces and other characters can be read.
SAS(R) 9.3 Formats and Informats: Reference (informats $quote )
3/ reading all data as character is possible you must define the SAS variables and informats accordingly
Reading them as numeric will result in showing missings. That is what most likely has happened.
4/ You cannot change numeric/character type after having defined them. SAS is using a tabular approach like a RDBMS.
That limitation is quite common with that.
05-16-2014 02:01 AM
Your problem does not stem from the fact you are working on a mainframe.
Your code makes SAS define the variables as numeric.
What you probably want is
format fld1-fld20 $20.; * this causes all vars to be character;
input fld1-fld20; * format not necessary;