SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
AnnLyn
Fluorite | Level 6

Hi All,

 

I have an csv file with around 9 fields, one of this field (field 7) is an free text column in the source system.

I.E., user can type an message in it, so this field has carriage return value, special char, quotes, alt+enter.

 

Below you can see one record. Most of the records are without CR in the text field and is one row in the csv file. But some are with and appear as below.

 

AnnLyn_0-1731949860203.png

 

how can i import it in a easy way (we are using standard SAS integration studio) in a file reader? or programming? is there an option which can help me? 

 

Thanks in advance.

 

regards,

Ann

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If the lines are ended with CR+LF, which is normal for a DOS/Windows file, and the values have only CR or LF then the file should work fine if you just make sure to use the TERMSTR=CRLF option on the INFILE statement.  If you want to use PROC IMPORT to guess how to read the file than make a fileref that points to the file so you can add the TERMSTR=CRLF option to the FILENAME statement.

 

If the embedded characters are exactly the same as the normal end of line characters then you will need to first preprocess the file to fix that.  When the values are quoted that can be done by just counting how many quotes you have seen.

 

Here is macro that will do that for you:  https://github.com/sasutils/macros/blob/master/replace_crlf.sas

 

 

View solution in original post

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

This works

filename CSV "~/aa.txt";
data WANT;
  infile CSV dsd dlm=';' flowover termstr=crlf;
  informat A B C $8.;
  input A B C;
run;  

for file

"aa";"aa
aa";"zz"
A B C
aa aa aa zz

 

 

ChrisNZ
Tourmaline | Level 20

Ah this in on Linux though. I suppose you're on Windows?

The EOL character is different.

Never mind.

You need to read one character at at a time.

ChrisNZ
Tourmaline | Level 20

Here's an example of how you can process the data as you read it.

filename CSV "~/aa.txt";

data _null_;
  file CSV;
  put '"aa";"bb' '0d0a'x 'bb";"cc"' '0d0a'x '"xx";"yy";"zz"';
  put '"11";"22";"33"';
run;

data WANT;
  infile CSV recfm=n eof=EOF;%* Read the file in stream mode;
  length A B C TMP $8 ;      %* Declare variables;
  retain A B TMP;            %* Retain variables
  input X $1. @@;            %* Save one character of data;
  if X='"' then do;          %* Look for quote in data;
    Q+1;                     %* Count quotes;
    if Q=2 then do;          %* Second quote: End of string; 
      VARNO+1;               %* Count variables ; 
      if VARNO=1 then A=TMP; %* Save value into variable;
      if VARNO=2 then B=TMP; %* Save value into variable;
      if VARNO=3 then do;    %* Last variable: Save record; 
        C=TMP;               %* Save value into variable;
        output;              %* Save record;
        VARNO=-1;            %* Reset variable counter;
      end;  
      else do;               %* Not last variable: skip to next data character;
        do until (X not in (';','"','0d'x,'0a'x));
          input X $1. @@;   
        end;  
        TMP=X;               %* Save first character of data;
      end;
      Q=1;                   %* Second quote processed, reset quote counter;
    end;  
  end;
  else 
    TMP=catt(TMP,X);        %* Character is not a quote: Append it into value;
  return;                   %* Go to top to read next character;
  eof:                      %* EOF found;
  stop;                     %* Stop the data step;
  keep A B C;
run;
proc print noobs;
run;  
  
A B C
aa bb bb cc
xx yy zz
11 22 33

 

Tom
Super User Tom
Super User

If the lines are ended with CR+LF, which is normal for a DOS/Windows file, and the values have only CR or LF then the file should work fine if you just make sure to use the TERMSTR=CRLF option on the INFILE statement.  If you want to use PROC IMPORT to guess how to read the file than make a fileref that points to the file so you can add the TERMSTR=CRLF option to the FILENAME statement.

 

If the embedded characters are exactly the same as the normal end of line characters then you will need to first preprocess the file to fix that.  When the values are quoted that can be done by just counting how many quotes you have seen.

 

Here is macro that will do that for you:  https://github.com/sasutils/macros/blob/master/replace_crlf.sas

 

 

sas-innovate-white.png

Join us for our biggest event of the year!

Four days of inspiring keynotes, product reveals, hands-on learning opportunities, deep-dive demos, and peer-led breakouts. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1114 views
  • 3 likes
  • 4 in conversation