BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dennis_oz
Quartz | Level 8
Hi @Tom @Reeza  , 
I have the attached file in .csv ..
I have some embedded carriage return Line Feed in the CSV so I am using the above code .
Then I noticed my " ' "   in my input file are 
'’' . How can I correct that initially ?

or can i correct it after the data has een read in proc import in another data step.

Appreciate any assistance.
The highlighted colour is something I tried but no luck

the new test file name is _test_1
*****************************************--------------------------*********************;

%let repA=' '; /* replacement character LF */ %let repD=' '; /* replacement character CR */ %let repE="'"; %let dsnnme="*.csv"; /* use full path of CSV file */ data _null_; /* RECFM=N reads the file in binary format. The file consists */ /* of a stream of bytes with no record boundaries. SHAREBUFFERS */ /* specifies that the FILE statement and the INFILE statement */ /* share the same buffer. */ infile &dsnnme recfm=n sharebuffers; file &dsnnme recfm=n; /* OPEN is a flag variable used to determine if the CR/LF is within */ /* double quotes or not. Retain this value. */ retain open 0; input a $char1.; /* If the character is a double quote, set OPEN to its opposite value. */ if a = '"' then open = ^(open); /* If the CR or LF is after an open double quote, replace the byte with */ /* the appropriate value. */ if open then do; if a = '0D'x then put &repD; else if a = '0A'x then put &repA; else if a='’' then put &repE; end; run; /*STEP 2 ; */ proc import /* CSV */ datafile=".csv" out=test dbms=csv replace; delimiter=','; guessingrows=32767; run;

Hi @Tom @Reeza  , I have the attached file in .csv .. I have some embedded carriage return Line Feed in the CSV so I am using the above code . Then I noticed my " ' "   in my input file are '’' . How can I correct that.  Appreciate any assistance.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Try

VARNAME = prxchange("s/’/'/",-1,REASON_FOR_SCORE);

Also, use capitals for a purpose. They increase legibility if used properly.

I use them for user-defined names.

This is easier to read

data A;
  set TEST  ;
  keep SURVEYID REASON_FOR_SCORE VARNAME ;
  VARNAME = translate(REASON_FOR_SCORE,"'","’");
run;

than this

data a;
set test  ;
keep surveyid REASON_FOR_SCORE varname ;
varname = translate(REASON_FOR_SCORE,"'","’");

run;

 

View solution in original post

6 REPLIES 6
ChrisNZ
Tourmaline | Level 20

1. You are reading an Unicode UTF-8 file.

 

2.  The test you are attempting to make will always fail since variable A has a length of 1.

 

3.The best option if you want to create an ASCII data set is translating the multi-byte characters into ASCII characters as they are loaded (or in your case, after the proc import), using a logic similar to the one you are attempting now.  Hoping there's no Chinese or mathematical characters or 20 different quote types in the file.

 

Additional notes:

 

4. You could use option encoding= to pre-process the file properly, but not when using option recfm=n because N means that only one byte is read at a a time.  

 

5. To conserve the data as is when reading the fixed file, you can use

filename extfile 'external-file' encoding="utf-8"; 

to read the data correctly. You should probably use the same option for your data set, as your SAS session is probably wlatin1. Check this. You still won't be able to display the data properly though, since wlatin1 does not allow multi-byte characters. 

 

6. For what it's worth, UTF-8 is here to stay, so my opinion is: your organisation should start using UTF-8 environments and move away from ASCII-derived ones. 

 

7. So now you have 3 steps: pre-clean, proc import, post-clean instead of one data step.

 

 

dennis_oz
Quartz | Level 8

 

@ChrisNZ  ,

Thanks much for the advice . 

 

I am doing what you have mentioned  as Point 7 now :

"So now you have 3 steps: pre-clean, proc import, post-clean instead of one data step."

 

only now there is a slight formatting issue . Can you please give any suggestion on this . I have pasted my code and output

 

 

%let repA=' '; /* replacement character LF */
%let repD=' '; /* replacement character CR */
%let dsnnme="_test_1.csv"; /* use full path of CSV file */

data _null_;
      /* RECFM=N reads the file in binary format. The file consists */
      /* of a stream of bytes with no record boundaries. SHAREBUFFERS */
      /* specifies that the FILE statement and the INFILE statement */
      /* share the same buffer. */
      infile &dsnnme recfm=n sharebuffers;
      file &dsnnme recfm=n;

      /* OPEN is a flag variable used to determine if the CR/LF is within */
      /* double quotes or not. Retain this value. */
      retain open 0;
      input a $char1.;

      /* If the character is a double quote, set OPEN to its opposite value. */
      if a = '"' then
           open = ^(open);

      /* If the CR or LF is after an open double quote, replace the byte with */
      /* the appropriate value. */
      if open then
           do;
                 if a = '0D'x then
                      put &repD;
                 else if a = '0A'x then
                      put &repA;
		  end;
run;


/*STEP 2 ; */

proc import
      /* CSV */
      datafile="_test_1.csv" 
        out=test  dbms=csv replace;
      delimiter=',';
      guessingrows=32767;

run;

/* new steps */
data a;
set test  ;
keep surveyid REASON_FOR_SCORE varname ;
varname = translate(REASON_FOR_SCORE,"'","’");

run;

Below is the output I get  .. There is a space after the " ' " . Is there anything to not have that extra space 

dennis_oz_0-1592265795776.png

 

 

ChrisNZ
Tourmaline | Level 20

Try

VARNAME = prxchange("s/’/'/",-1,REASON_FOR_SCORE);

Also, use capitals for a purpose. They increase legibility if used properly.

I use them for user-defined names.

This is easier to read

data A;
  set TEST  ;
  keep SURVEYID REASON_FOR_SCORE VARNAME ;
  VARNAME = translate(REASON_FOR_SCORE,"'","’");
run;

than this

data a;
set test  ;
keep surveyid REASON_FOR_SCORE varname ;
varname = translate(REASON_FOR_SCORE,"'","’");

run;

 

ChrisNZ
Tourmaline | Level 20

Or if you want to avoid regular expressions, something like this

VARNAME = compress(translate(REASON_FOR_SCORE, "'~~", "’"), '~');

 

dennis_oz
Quartz | Level 8

Thanks Chris

 

VARNAME = compress(translate(REASON_FOR_SCORE, "'~~", "’"), '~');

 

didn't quite understand  the highlighted part. can anyone please explain.

ChrisNZ
Tourmaline | Level 20

You need to replace 3 characters with 3 characters, then you remove the 2 extraneous ones.

Regular expressions (chosen solution)  are more straightforward if you don't mind the more complex syntax.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3037 views
  • 0 likes
  • 2 in conversation