BookmarkSubscribeRSS Feed
GKI
Fluorite | Level 6 GKI
Fluorite | Level 6

I'm importing a CSV file (see attached file "AKNZBP_test2.CSV" with encoding ansi) into SAS (session encoding UTF-8) using a data step with INFILE.
The ansi CSV file has a hex"FF" delimiter, so I used the options delimiter="~" encoding='ansi'
But the hex"FF" delimiter seems not to be recognized!
but wy? ok, hex"FF" is not a valid character in UTF-8 encoding!

my datastep for the impot is:

data work.AKNZBP_006;
infile "/PATH/AKNZBP_test2.CSV" delimiter="FF"x encoding='ansi';
Informat NSTAT $1.
 FA $1.
 IDVT $17.
 NLFD 2.0
 NTXT1 $77.
 NTXT2 $77.
 NTXT3 $77.
 NTXT4 $77.
 NTXT5 $77.
 NTXT6 $77.
 NTXT7 $77.
 NTXT8 $77.
 NTXT9 $77.
 NTXT10 $77.
 NTXT11 $77.
 NTXT12 $77.
 NTXT13 $77.
 _ANLD 6.0
 _AEND 6.0
 _SB $4.
 
;
Input NSTAT
 
FA
IDVT
NLFD
NTXT1
NTXT2
NTXT3
NTXT4
NTXT5
NTXT6
NTXT7
NTXT8
NTXT9
NTXT10
NTXT11
NTXT12
NTXT13
_ANLD
_AEND
_SB
;

Run;

If I change the delimiter form hex"FF" to "~" and use the infile options delimiter="~" encoding='ansi'
it works fine! I get a correct Import of the CSV File!
but is there a way to inprot the CSV file with the hex"FF" delimiter without changing it to another delimiter like "~"

thx 4 help

Günter

4 REPLIES 4
Tom
Super User Tom
Super User

Seems not. 

But you can change it on the fly by modifying _INFILE_ automatic variable. 

Strangely enough '00'x works.  Not sure why 'FF'x doesn't.

data test;
  dlm='00'x;
  infile 'c:\downloads\AKNZBP_test2.CSV' dsd dlm=dlm obs=3 truncover encoding=any;
  input @;
  put _infile_ $hex4.;
  _infile_=translate(_infile_,dlm,'FF'x);
  put _infile_ $hex4.;
  input (var1-var3) (:$30.);
  put (var:) (=:$quote.);
run;
31FF
3100
var1="1" var2="1" var3="00000000000036277"
31FF
3100
var1="1" var2="1" var3="00000000000036278"
31FF
3100
var1="1" var2="1" var3="00000000000036279"
NOTE: 3 records were read from the infile 'c:\downloads\AKNZBP_test2.CSV'.
      The minimum record length was 74.
      The maximum record length was 76.
NOTE: The data set WORK.TEST has 3 observations and 3 variables.

You can even ask it to figure out what byte it can use as the replacement delimiter so it could use a different one on every line. 

data work.AKNZBP_006;
  length dlm $1;
  infile "c:\downloads\AKNZBP_test2.CSV" delimiter=dlm truncover;
  length NSTAT $1 FA $1 IDVT $17 NLFD 8 NTXT1-NTXT13 $77 _ANLD 8 _AEND 8 _SB $4 ;
  input @;
  dlm=compress(collate(0,254),_infile_);
  _infile_=translate(_infile_,dlm,'FF'x);
  input NSTAT -- _SB;
run;
proc print; run;
SAS 9.4 on WINDOWS

                                                                  N  N  N  N
   N                               N      N  N  N  N  N  N  N  N  T  T  T  T     _       _
   S             I          N      T      T  T  T  T  T  T  T  T  X  X  X  X     A       A
O  T             D          L      X      X  X  X  X  X  X  X  X  T  T  T  T     N       E     _
b  A  F          V          F      T      T  T  T  T  T  T  T  T  1  1  1  1     L       N     S
s  T  A          T          D      1      2  3  4  5  6  7  8  9  0  1  2  3     D       D     B

1  1  1  00000000000036277  0  0 Prov.                                        951011  951011  0001
2  1  1  00000000000036278  0  15% Prov.                                      951011  951011  0001
3  1  1  00000000000036279  0  18% Prov.                                      951011  951011  0001
4  1  1  00000000000036280  0  3% Prov.                                       951011  951011  0001
5  2  1  00000000000036281  0  3% Prov.                                       951011  960122  0001

GKI
Fluorite | Level 6 GKI
Fluorite | Level 6

hej hej Tom,

 

thx, that seams to work well!

I have to test your solution with all our 63 datafiles!

so I use just a simple version for testing:

data work.AKNZBP_006;
dlm='00'x;
infile "/PATH/AKNZBP_test2.CSV" dlm=dlm encoding='ansi';
Informat NSTAT $1.
 FA $1.
 ...
;
Input @;
_infile_=translate(_infile_,dlm,'FF'x);
Input
NSTAT
FA
...;

Run;

BR

Günter

Tom
Super User Tom
Super User

Looks good. 

 

Not sure what value you are getting out of attaching informats to the variables.  I guess as a side effect it is forcing SAS to guess that you wanted to define the variables with lengths that are compatible with the widths you used on the attached informats.

GKI
Fluorite | Level 6 GKI
Fluorite | Level 6

hej hej Tom,

 

it works so far, but just for encoding='ansi'

every other encoding (like encoding='latin2') does not work!

 

also, in our data files we have german "ö , ä , ü ...." (not now in the sample file) and with the using of translate, this characters are not correctly converted to UTF8!

I have to check further, what is happening with this characters (hex bevor translate and after translate)

 

BR Günter

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 3248 views
  • 0 likes
  • 2 in conversation