Hi,
I got some problem when I am trying to import the .d data. The data contains a very long characters. Although it is covered by open and close quotation mark, SAS is not able to recognise it.
For example: Content of the data (test.d)
"Row1" "A AAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBC
CCCCCCCCCCCCCCCC"
"Row2" "A AAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBC
CCCCCCCCCCCCCCCC"
My SAS code :-
data test;
%let _EFIERR_ = 0;
infile "&ImportLocation.test.d" delimiter=' ' MISSOVER DSD lrecl=32767;
informat
Var1 $ 4.
Var2 $ 999.;
format
Var1 $ 4.
Var2 $ 999.;
input
Var1 $
Var2 $ ;
run;
Output :-
Var1 Var2
Row1 "A
BBBB
CCCC
Row2 "A
BBBB
CCCC
It is not able to read the full content inside Variables2, the open and close quotation mark is not working.
How can I adjust my SAS code to read the data in variables2?
Thz
When reading from text files, the line separator takes precedence over the quotes that denote columns. I solved the problem by writing a small C filter program that replaces linefeeds with a <BR> tag everywhere the count of quotes is uneven.
But I think there is a solution with using recfm=n and reading the instream column by column.
This is the source of the code:
#include <stdio.h> #include <unistd.h> main (argc,argv) int argc; char *argv[]; { FILE *ifp; FILE *ofp; int c; int flag; int iflag = 0; int oflag = 0; char *infile = NULL; char *outfile = NULL; opterr = 0; while ((c = getopt(argc,argv,"i:o:")) != -1) switch (c) { case 'i': iflag = 1; infile = optarg; break; case 'o': oflag = 1; outfile = optarg; break; case '?': if (optopt == 'c') fprintf (stderr, "Option -%c requires an argumen t.\n", optopt); else if (isprint (optopt)) fprintf (stderr, "Unknown option `-%c'.\n", opto pt); else fprintf (stderr, "Unknown option character `\\x%x'.\n", optopt); return 1; } if (iflag == 1) { ifp = fopen(infile,"r"); if (ifp == NULL) { fprintf(stderr, "could not open infile %s\n",infile); return 2; } } else { ifp = stdin; } if (oflag == 1) { ofp = fopen(outfile,"w"); if (ofp == NULL) { fprintf(stderr, "could not open outfile %s\n",outfile); return 3; } else { fprintf(stdout,"converting %s to %s\n",infile,outfile); } } else { ofp = stdout; } flag = 0; while ((c = fgetc(ifp)) != EOF ) { if (c == 39) { flag = abs(flag - 1); } if ((c == 10) && (flag == 1)) { fputc('<',ofp); fputc('B',ofp); fputc('R',ofp); fputc('>',ofp); } else { fputc(c,ofp); } } }
It uses the standard C function getopt() to retrieve the commandline arguments (infile and outfile names); if no name is supplied, stdin and/or stdout are used, respectively.
The main program starts at flag = 0; and just loops until all bytes are read.
You need to have gcc and the basic C libraries installed to compile the code. You could also port the logic to any other programming lagnuage of your choice. The nice thing about C is the sheer speed of the executable.
Just to add, its a good idea to name the file extension to directly reference what the file contains (e.g. Excel files are XLSX, Word is DOCX, SAS programs are SAS etc.). .D is a file association with the D programming language, so naming your file this way can cause unusual effects. If this is a text based delimited file, then .txt or .dlm or .csv (for comma delimited) should be used.
For a SAS based solution (note, updated this slightly from the SAS support provided code from post:
Not tested it, but should work fine.
data _null_; infile "s:\temp\rob\x.csv" recfm=n; file "s:\temp\rob\y.csv" recfm=n; retain flag 0; input a $char1.; if a = '"' then flag=ifn(flag=0,1,0); if not(flag=1 and a in ('0D'x,'0A'x)) then put a $char1.; run;
This reads in the file bit by bit, and drops carriage return characters from the the file.
Unfortunately the file that I download from the Progress system is a .d file
I cannot change it to .csv or .txt >..<
After downloading, you can rename the file on your computer to any extension you want, including .this_is_the_silliest_extension_I_could_come_up_with.
@hinear wrote:
Unfortunately the file that I download from the Progress system is a .d file
I cannot change it to .csv or .txt >..<
data test(keep= Var1 Var2);
length
Var1 $ 4 Var2 $ 999
buffer $ 32000
rx 8
;
retain
buffer ' '
rx
;
if _n_ = 1 then do;
rx = prxparse('/"(.+)".*"(.+)"/');
end;
infile "&ImportLocation.test.d" delimiter=' ' MISSOVER DSD lrecl=32767;
input;
if lengthn(_infile_) = 0 then delete;
if countc(_infile_, '"') = 3 then do;
buffer = _infile_;
end;
else do;
buffer = cats(buffer, _infile_);
end;
if mod(countc(buffer, '"'), 2) = 0 then do;
if prxmatch(rx, buffer) then do;
Var1 = prxposn(rx, 1, buffer);
Var2 = prxposn(rx, 2, buffer);
output;
end;
end;
run;
just another idea ....
Thank you very much, it is working. However, my data also contain some variables(number and date), which does not have open and close quotation mark.
For example: Content of the data (test.d)
"Row1" 123 24/09/2019 "A AAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBC
CCCCCCCCCCCCCCCC"
"Row2" 234 24/09/2019 "A AAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBC
CCCCCCCCCCCCCCCC"
How to apply the same logic on it?
You should look at the file and see what it is actually using as the delimiter and as end of line markers. You can easily use SAS to do this if you do not have a text editor that can do it for you. For example to list the 1,000 bytes of your file you could use code like this.
data _null_;
infile 'myfile.d' lrecl=100 recfm=f obs=10;
input;
list;
run;
If you are lucky your file will have <CR><LF> at the end of a record and just <CR> or just <LF> as the splits in the middle of the long lines. For example like this:
1213 data _null_; 1214 infile test lrecl=50 recfm=f obs=10; 1215 input; 1216 list; 1217 run; NOTE: The infile TEST is: Filename=... RECFM=F,LRECL=50,File Size (bytes)=166, Last Modified=10Jul2017:10:17:30, Create Time=10Jul2017:10:17:30 RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9-- 1 "Row1" 123 24/09/2019 "A AAAAAAAAAAAAAAAAAAAAAAAAA 2 CHAR .BBBBBBBBBBBC.CCCCCCCCCCCCCCCC".."Row2" 234 24/09/ ZONE 04444444444440444444444444444420025673223332332332 NUMR D222222222223D33333333333333332DA22F7220234024F09F 3 CHAR 2019 "A AAAAAAAAAAAAAAAAAAAAAAAAA.BBBBBBBBBBBC.CCC ZONE 33332242444444444444444444444444404444444444440444 NUMR 201902101111111111111111111111111D222222222223D333 4 CHAR CCCCCCCCCCCCC".. 16 ZONE 4444444444444200 NUMR 33333333333332DA NOTE: 4 records were read from the infile TEST.
So you could read it like this.
data want ;
infile test dsd dlm=' ' truncover termstr=crlf ;
length var1 $10 var2 var3 8 var4 $1000 ;
informat var3 ddmmyy.;
format var3 yymmdd10.;
input var1-var4 ;
var4 = translate(var4,'||','0D0A'x);
run;
Unfortunately, my data has <CR><LF> in the end of each row.
"Row1" 123 24/09/2019 "A AAAAAAAAAAAAAAAAAAAAAAAAA<CR><LF>
<CR><LF>
BBBBBBBBBBBC<CR><LF>
<CR><LF>
CCCCCCCCCCCCCCCC" <CR><LF>
"Row2" 234 24/09/2019 "A AAAAAAAAAAAAAAAAAAAAAAAAA<CR><LF>
<CR><LF>
BBBBBBBBBBBC<CR><LF>
<CR><LF>
CCCCCCCCCCCCCCCC" <CR><LF>
I used your SAS code, but the output look like the following:-
var1 | var2 | var3 | var4 |
Row1 | 123 | 2019-09-24 | "A |
BBBBBBBBBB | |||
CCCCCCCCCC | |||
Row2 | 234 | 2019-09-24 | "A |
BBBBBBBBBB | |||
CCCCCCCCCC |
You will first need to pre-process the file to remove the embedded CRLF characters.
The simple quote counting technique that @RW9 posted on this thread should do the trick for this file since the strings with the embedded CRLF are quoted.
Then you can read the fixed file using normal delimited file input syntax.
I finally contact sas support and they give a similar solution.
the method is replace the carriage return and linefeed characters inside the double quotation mark by '@' and '$'
http://support.sas.com/kb/26/065.html
Thank you for all of your help. Otherwise, I don't even notice that there are <CR><LF> inside the file.
Which, you will see from my post above had already been supplied by support and posted above.
Try option recfm=n .
data test;
%let _EFIERR_ = 0;
infile "&ImportLocation.test.d" delimiter=' ' MISSOVER DSD lrecl=3276700 recfm=n ;
informat
Var1 $ 4.
Var2 $ 999.;
format
Var1 $ 4.
Var2 $ 999.;
input
Var1 $
Var2 $ @@;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.