Hi @Tom @Reeza ,
I have the attached file in .csv ..
I have some embedded carriage return Line Feed in the CSV so I am using the above code .
Then I noticed my " ' " in my input file are '’' . How can I correct that initially ?
or can i correct it after the data has een read in proc import in another data step.
Appreciate any assistance.
The highlighted colour is something I tried but no luck
the new test file name is _test_1
*****************************************--------------------------*********************;
%let repA=' '; /* replacement character LF */ %let repD=' '; /* replacement character CR */ %let repE="'"; %let dsnnme="*.csv"; /* use full path of CSV file */ data _null_; /* RECFM=N reads the file in binary format. The file consists */ /* of a stream of bytes with no record boundaries. SHAREBUFFERS */ /* specifies that the FILE statement and the INFILE statement */ /* share the same buffer. */ infile &dsnnme recfm=n sharebuffers; file &dsnnme recfm=n; /* OPEN is a flag variable used to determine if the CR/LF is within */ /* double quotes or not. Retain this value. */ retain open 0; input a $char1.; /* If the character is a double quote, set OPEN to its opposite value. */ if a = '"' then open = ^(open); /* If the CR or LF is after an open double quote, replace the byte with */ /* the appropriate value. */ if open then do; if a = '0D'x then put &repD; else if a = '0A'x then put &repA; else if a='’' then put &repE; end; run; /*STEP 2 ; */ proc import /* CSV */ datafile=".csv" out=test dbms=csv replace; delimiter=','; guessingrows=32767; run;
Hi @Tom @Reeza , I have the attached file in .csv .. I have some embedded carriage return Line Feed in the CSV so I am using the above code . Then I noticed my " ' " in my input file are '’' . How can I correct that. Appreciate any assistance.
Try
VARNAME = prxchange("s/’/'/",-1,REASON_FOR_SCORE);
Also, use capitals for a purpose. They increase legibility if used properly.
I use them for user-defined names.
This is easier to read
data A;
set TEST ;
keep SURVEYID REASON_FOR_SCORE VARNAME ;
VARNAME = translate(REASON_FOR_SCORE,"'","’");
run;
than this
data a;
set test ;
keep surveyid REASON_FOR_SCORE varname ;
varname = translate(REASON_FOR_SCORE,"'","’");
run;
1. You are reading an Unicode UTF-8 file.
2. The test you are attempting to make will always fail since variable A has a length of 1.
3.The best option if you want to create an ASCII data set is translating the multi-byte characters into ASCII characters as they are loaded (or in your case, after the proc import), using a logic similar to the one you are attempting now. Hoping there's no Chinese or mathematical characters or 20 different quote types in the file.
Additional notes:
4. You could use option encoding= to pre-process the file properly, but not when using option recfm=n because N means that only one byte is read at a a time.
5. To conserve the data as is when reading the fixed file, you can use
filename extfile 'external-file' encoding="utf-8";
to read the data correctly. You should probably use the same option for your data set, as your SAS session is probably wlatin1. Check this. You still won't be able to display the data properly though, since wlatin1 does not allow multi-byte characters.
6. For what it's worth, UTF-8 is here to stay, so my opinion is: your organisation should start using UTF-8 environments and move away from ASCII-derived ones.
7. So now you have 3 steps: pre-clean, proc import, post-clean instead of one data step.
@ChrisNZ ,
Thanks much for the advice .
I am doing what you have mentioned as Point 7 now :
"So now you have 3 steps: pre-clean, proc import, post-clean instead of one data step."
only now there is a slight formatting issue . Can you please give any suggestion on this . I have pasted my code and output
%let repA=' '; /* replacement character LF */ %let repD=' '; /* replacement character CR */ %let dsnnme="_test_1.csv"; /* use full path of CSV file */ data _null_; /* RECFM=N reads the file in binary format. The file consists */ /* of a stream of bytes with no record boundaries. SHAREBUFFERS */ /* specifies that the FILE statement and the INFILE statement */ /* share the same buffer. */ infile &dsnnme recfm=n sharebuffers; file &dsnnme recfm=n; /* OPEN is a flag variable used to determine if the CR/LF is within */ /* double quotes or not. Retain this value. */ retain open 0; input a $char1.; /* If the character is a double quote, set OPEN to its opposite value. */ if a = '"' then open = ^(open); /* If the CR or LF is after an open double quote, replace the byte with */ /* the appropriate value. */ if open then do; if a = '0D'x then put &repD; else if a = '0A'x then put &repA; end; run; /*STEP 2 ; */ proc import /* CSV */ datafile="_test_1.csv" out=test dbms=csv replace; delimiter=','; guessingrows=32767; run; /* new steps */ data a; set test ; keep surveyid REASON_FOR_SCORE varname ; varname = translate(REASON_FOR_SCORE,"'","’"); run;
Below is the output I get .. There is a space after the " ' " . Is there anything to not have that extra space
Try
VARNAME = prxchange("s/’/'/",-1,REASON_FOR_SCORE);
Also, use capitals for a purpose. They increase legibility if used properly.
I use them for user-defined names.
This is easier to read
data A;
set TEST ;
keep SURVEYID REASON_FOR_SCORE VARNAME ;
VARNAME = translate(REASON_FOR_SCORE,"'","’");
run;
than this
data a;
set test ;
keep surveyid REASON_FOR_SCORE varname ;
varname = translate(REASON_FOR_SCORE,"'","’");
run;
Or if you want to avoid regular expressions, something like this
VARNAME = compress(translate(REASON_FOR_SCORE, "'~~", "’"), '~');
Thanks Chris
VARNAME = compress(translate(REASON_FOR_SCORE, "'~~", "’"), '~');
didn't quite understand the highlighted part. can anyone please explain.
You need to replace 3 characters with 3 characters, then you remove the 2 extraneous ones.
Regular expressions (chosen solution) are more straightforward if you don't mind the more complex syntax.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.