DATA Step, Macro, Functions and more

How to read a .d file with long character across multiple line?

Reply
Occasional Contributor
Posts: 7

How to read a .d file with long character across multiple line?

Hi,

 

I got some problem when I am trying to import the .d data. The data contains a very long characters. Although it is covered by open and close quotation mark, SAS is not able to recognise it.

 

For example: Content of the data (test.d)

"Row1" "A AAAAAAAAAAAAAAAAAAAAAAAAA

BBBBBBBBBBBC

CCCCCCCCCCCCCCCC"

 

"Row2"  "A AAAAAAAAAAAAAAAAAAAAAAAAA

BBBBBBBBBBBC

CCCCCCCCCCCCCCCC"

 

 

 

 

My SAS code :-

 

data test;

%let _EFIERR_ = 0;

infile "&ImportLocation.test.d" delimiter=' ' MISSOVER DSD lrecl=32767;

informat

 Var1 $ 4.

 Var2 $ 999.;

format

Var1 $ 4.

Var2 $ 999.;

input

 Var1 $

Var2 $ ;

run;

 

 

 

Output  :-

Var1            Var2

Row1           "A

BBBB

CCCC

 

Row2          "A

BBBB

CCCC

 

 

It is not able to read the full content inside Variables2, the open and close quotation mark is not working.

 

How can I adjust my SAS code to read the data in variables2?

 

Thz

Super User
Posts: 7,762

Re: How to read a .d file with long character across multiple line?

When reading from text files, the line separator takes precedence over the quotes that denote columns. I solved the problem by writing a small C filter program that replaces linefeeds with a <BR> tag everywhere the count of quotes is uneven.

But I think there is a solution with using recfm=n and reading the instream column by column.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 7

Re: How to read a .d file with long character across multiple line?

Posted in reply to KurtBremser
thz for reply, can you show me how to write a small C filter program please?
Super User
Posts: 7,762

Re: How to read a .d file with long character across multiple line?

This is the source of the code:

#include <stdio.h>
#include <unistd.h>

main (argc,argv)
int argc;
char *argv[];
{
        FILE *ifp;
        FILE *ofp;
        int c;
        int flag;
        int iflag = 0;
        int oflag = 0;
        char *infile = NULL;
        char *outfile = NULL;
        opterr = 0;
        while ((c = getopt(argc,argv,"i:o:")) != -1)
                switch (c) {
                        case 'i':
                                iflag = 1;
                                infile = optarg;
                                break;
                        case 'o':
                                oflag = 1;
                                outfile = optarg;
                                break;
                        case '?':
                                if (optopt == 'c')
                                fprintf (stderr, "Option -%c requires an argumen
t.\n", optopt);
                                else if (isprint (optopt))
                                fprintf (stderr, "Unknown option `-%c'.\n", opto
pt);
                                else
                                fprintf (stderr,
                                "Unknown option character `\\x%x'.\n",
                                optopt);
                                return 1;
                }
        if (iflag == 1) {
                ifp = fopen(infile,"r");
                if (ifp == NULL) {
                        fprintf(stderr, "could not open infile %s\n",infile);
                        return 2;
                }
        }
        else {
                ifp = stdin;
        }
        if (oflag == 1) {
                ofp = fopen(outfile,"w");
                if (ofp == NULL) {
                        fprintf(stderr, "could not open outfile %s\n",outfile);
                        return 3;
                }
                else {
                        fprintf(stdout,"converting %s to %s\n",infile,outfile);
                }
        }
        else {
                ofp = stdout;
        }

        flag = 0;
        while ((c = fgetc(ifp)) != EOF ) {
                if (c == 39) {
                        flag = abs(flag - 1);
                }
                if ((c == 10) && (flag == 1)) {
                        fputc('<',ofp);
                        fputc('B',ofp);
                        fputc('R',ofp);
                        fputc('>',ofp);
                }
                else {
                        fputc(c,ofp);
                }
        }
}

It uses the standard C function getopt() to retrieve the commandline arguments (infile and outfile names); if no name is supplied, stdin and/or stdout are used, respectively.

The main program starts at flag = 0; and just loops until all bytes are read.

You need to have gcc and the basic C libraries installed to compile the code. You could also port the logic to any other programming lagnuage of your choice. The nice thing about C is the sheer speed of the executable.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 7,942

Re: How to read a .d file with long character across multiple line?

[ Edited ]

Just to add, its a good idea to name the file extension to directly reference what the file contains (e.g. Excel files are XLSX, Word is DOCX, SAS programs are SAS etc.).  .D is a file association with the D programming language, so naming your file this way can cause unusual effects.  If this is a text based delimited file, then .txt or .dlm or .csv (for comma delimited) should be used.

 

For a SAS based solution (note, updated this slightly from the SAS support provided code from post:

https://communities.sas.com/t5/General-SAS-Programming/find-carriage-return-in-dataset/m-p/177855/hi...

Not tested it, but should work fine.

 

data _null_;
     infile "s:\temp\rob\x.csv" recfm=n;
     file "s:\temp\rob\y.csv" recfm=n;
     retain flag 0;
     input a $char1.;
     if a = '"' then flag=ifn(flag=0,1,0);
     if not(flag=1 and a in ('0D'x,'0A'x)) then put a $char1.;
run;

This reads in the file bit by bit, and drops carriage return characters from the the file.

 

Occasional Contributor
Posts: 7

Re: How to read a .d file with long character across multiple line?

Unfortunately the file that I download from the Progress system is a .d file

 

I cannot change it to .csv or .txt  >..<

Super User
Posts: 7,762

Re: How to read a .d file with long character across multiple line?

After downloading, you can rename the file on your computer to any extension you want, including .this_is_the_silliest_extension_I_could_come_up_with.


hinear wrote:

Unfortunately the file that I download from the Progress system is a .d file

 

I cannot change it to .csv or .txt  >..<


 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super Contributor
Posts: 345

Re: How to read a .d file with long character across multiple line?

data test(keep= Var1 Var2);
   length 
      Var1 $ 4 Var2 $ 999
      buffer $ 32000
      rx 8
   ;

   retain 
      buffer ' ' 
      rx 
   ;

   if _n_ = 1 then do;
      rx = prxparse('/"(.+)".*"(.+)"/');
   end;
      
   infile "&ImportLocation.test.d" delimiter=' ' MISSOVER DSD lrecl=32767; 
   input;

   if lengthn(_infile_) = 0 then delete;

   if countc(_infile_, '"') = 3 then do;
      buffer = _infile_;
   end;
   else do;
      buffer = cats(buffer, _infile_);
   end;

   if mod(countc(buffer, '"'), 2) = 0 then do;
      if prxmatch(rx, buffer) then do;
         Var1 = prxposn(rx, 1, buffer);
         Var2 = prxposn(rx, 2, buffer);
         output;
      end;
   end;

run;

just another idea ....

Occasional Contributor
Posts: 7

Re: How to read a .d file with long character across multiple line?

Posted in reply to andreas_lds

Thank you very much, it is working. However, my data also contain some variables(number and date), which does not have open and close quotation mark.

 

For example: Content of the data (test.d)

"Row1" 123 24/09/2019 "A AAAAAAAAAAAAAAAAAAAAAAAAA

BBBBBBBBBBBC

CCCCCCCCCCCCCCCC"

 

"Row2" 234 24/09/2019 "A AAAAAAAAAAAAAAAAAAAAAAAAA

BBBBBBBBBBBC

CCCCCCCCCCCCCCCC"

 

How to apply the same logic on it?

Super User
Super User
Posts: 7,039

Re: How to read a .d file with long character across multiple line?

[ Edited ]

You should look at the file and see what it is actually using as the delimiter and as end of line markers.  You can easily use SAS to do this if you do not have a text editor that can do it for you.  For example to list the 1,000 bytes of your file you could use code like this.

data _null_;
   infile 'myfile.d' lrecl=100 recfm=f obs=10;
   input;
   list;
run;

If you are lucky your file will have <CR><LF> at the end of a record and just <CR> or just <LF> as the splits in the middle of the long lines. For example like this:

1213  data _null_;
1214    infile test lrecl=50 recfm=f obs=10;
1215    input;
1216    list;
1217  run;

NOTE: The infile TEST is:

      Filename=...
      RECFM=F,LRECL=50,File Size (bytes)=166,
      Last Modified=10Jul2017:10:17:30,
      Create Time=10Jul2017:10:17:30

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9--
1         "Row1" 123 24/09/2019 "A AAAAAAAAAAAAAAAAAAAAAAAAA

2   CHAR  .BBBBBBBBBBBC.CCCCCCCCCCCCCCCC".."Row2" 234 24/09/
    ZONE  04444444444440444444444444444420025673223332332332
    NUMR  D222222222223D33333333333333332DA22F7220234024F09F

3   CHAR  2019 "A AAAAAAAAAAAAAAAAAAAAAAAAA.BBBBBBBBBBBC.CCC
    ZONE  33332242444444444444444444444444404444444444440444
    NUMR  201902101111111111111111111111111D222222222223D333

4   CHAR  CCCCCCCCCCCCC".. 16
    ZONE  4444444444444200
    NUMR  33333333333332DA
NOTE: 4 records were read from the infile TEST.

So you could read it like this.

data want ;
  infile test dsd dlm=' ' truncover termstr=crlf ;
  length var1 $10 var2 var3 8 var4 $1000 ;
  informat var3 ddmmyy.;
  format var3 yymmdd10.;
  input var1-var4 ;
  var4 = translate(var4,'||','0D0A'x);
run;

Capture.PNG

Occasional Contributor
Posts: 7

Re: How to read a .d file with long character across multiple line?

Unfortunately, my data has <CR><LF> in the end of each row.

 

"Row1" 123 24/09/2019 "A AAAAAAAAAAAAAAAAAAAAAAAAA<CR><LF>
<CR><LF>
BBBBBBBBBBBC<CR><LF>
<CR><LF>
CCCCCCCCCCCCCCCC" <CR><LF>

 

"Row2" 234 24/09/2019 "A AAAAAAAAAAAAAAAAAAAAAAAAA<CR><LF>
<CR><LF>
BBBBBBBBBBBC<CR><LF>
<CR><LF>
CCCCCCCCCCCCCCCC" <CR><LF>

 

 

I used your SAS code, but the output look like the following:-

 

var1

var2

var3

var4

Row1

123

2019-09-24

"A

    

BBBBBBBBBB

   
    

CCCCCCCCCC

   
    
    
    

Row2

234

2019-09-24

"A

    

BBBBBBBBBB

   
    

CCCCCCCCCC

   
Super User
Super User
Posts: 7,039

Re: How to read a .d file with long character across multiple line?

[ Edited ]

You will first need to pre-process the file to remove the embedded CRLF characters.

The simple quote counting technique that @RW9 posted on this thread should do the trick for this file since the strings with the embedded CRLF are quoted.

Then you can read the fixed file using normal delimited file input syntax.

Occasional Contributor
Posts: 7

Re: How to read a .d file with long character across multiple line?

 

 

I finally contact sas support and they give a similar solution.

the method is replace the carriage return and linefeed characters inside the double quotation mark by '@' and '$'

 

http://support.sas.com/kb/26/065.html

 

 

Thank you for all of your help. Otherwise, I don't even notice that there are <CR><LF> inside the file.

Super User
Super User
Posts: 7,942

Re: How to read a .d file with long character across multiple line?

Which, you will see from my post above had already been supplied by support and posted above.

Super User
Posts: 10,020

Re: How to read a .d file with long character across multiple line?

Try option recfm=n .

 

data test;

%let _EFIERR_ = 0;

infile "&ImportLocation.test.d" delimiter=' ' MISSOVER DSD lrecl=3276700 recfm=n ;

informat

 Var1 $ 4.

 Var2 $ 999.;

format

Var1 $ 4.

Var2 $ 999.;

input

 Var1 $

Var2 $ @@;

run;

Ask a Question
Discussion stats
  • 16 replies
  • 243 views
  • 6 likes
  • 6 in conversation