- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm importing a CSV file (see attached file "AKNZBP_test2.CSV" with encoding ansi) into SAS (session encoding UTF-8) using a data step with INFILE.
The ansi CSV file has a hex"FF" delimiter, so I used the options delimiter="~" encoding='ansi'
But the hex"FF" delimiter seems not to be recognized!
but wy? ok, hex"FF" is not a valid character in UTF-8 encoding!
my datastep for the impot is:
data work.AKNZBP_006;
infile "/PATH/AKNZBP_test2.CSV" delimiter="FF"x encoding='ansi';
Informat NSTAT $1.
FA $1.
IDVT $17.
NLFD 2.0
NTXT1 $77.
NTXT2 $77.
NTXT3 $77.
NTXT4 $77.
NTXT5 $77.
NTXT6 $77.
NTXT7 $77.
NTXT8 $77.
NTXT9 $77.
NTXT10 $77.
NTXT11 $77.
NTXT12 $77.
NTXT13 $77.
_ANLD 6.0
_AEND 6.0
_SB $4.
;
Input NSTAT
FA
IDVT
NLFD
NTXT1
NTXT2
NTXT3
NTXT4
NTXT5
NTXT6
NTXT7
NTXT8
NTXT9
NTXT10
NTXT11
NTXT12
NTXT13
_ANLD
_AEND
_SB
;
Run;
If I change the delimiter form hex"FF" to "~" and use the infile options delimiter="~" encoding='ansi'
it works fine! I get a correct Import of the CSV File!
but is there a way to inprot the CSV file with the hex"FF" delimiter without changing it to another delimiter like "~"
thx 4 help
Günter
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Seems not.
But you can change it on the fly by modifying _INFILE_ automatic variable.
Strangely enough '00'x works. Not sure why 'FF'x doesn't.
data test;
dlm='00'x;
infile 'c:\downloads\AKNZBP_test2.CSV' dsd dlm=dlm obs=3 truncover encoding=any;
input @;
put _infile_ $hex4.;
_infile_=translate(_infile_,dlm,'FF'x);
put _infile_ $hex4.;
input (var1-var3) (:$30.);
put (var:) (=:$quote.);
run;
31FF 3100 var1="1" var2="1" var3="00000000000036277" 31FF 3100 var1="1" var2="1" var3="00000000000036278" 31FF 3100 var1="1" var2="1" var3="00000000000036279" NOTE: 3 records were read from the infile 'c:\downloads\AKNZBP_test2.CSV'. The minimum record length was 74. The maximum record length was 76. NOTE: The data set WORK.TEST has 3 observations and 3 variables.
You can even ask it to figure out what byte it can use as the replacement delimiter so it could use a different one on every line.
data work.AKNZBP_006;
length dlm $1;
infile "c:\downloads\AKNZBP_test2.CSV" delimiter=dlm truncover;
length NSTAT $1 FA $1 IDVT $17 NLFD 8 NTXT1-NTXT13 $77 _ANLD 8 _AEND 8 _SB $4 ;
input @;
dlm=compress(collate(0,254),_infile_);
_infile_=translate(_infile_,dlm,'FF'x);
input NSTAT -- _SB;
run;
proc print; run;
SAS 9.4 on WINDOWS N N N N N N N N N N N N N N T T T T _ _ S I N T T T T T T T T T X X X X A A O T D L X X X X X X X X X T T T T N E _ b A F V F T T T T T T T T T 1 1 1 1 L N S s T A T D 1 2 3 4 5 6 7 8 9 0 1 2 3 D D B 1 1 1 00000000000036277 0 0 Prov. 951011 951011 0001 2 1 1 00000000000036278 0 15% Prov. 951011 951011 0001 3 1 1 00000000000036279 0 18% Prov. 951011 951011 0001 4 1 1 00000000000036280 0 3% Prov. 951011 951011 0001 5 2 1 00000000000036281 0 3% Prov. 951011 960122 0001
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
hej hej Tom,
thx, that seams to work well!
I have to test your solution with all our 63 datafiles!
so I use just a simple version for testing:
data work.AKNZBP_006;
dlm='00'x;
infile "/PATH/AKNZBP_test2.CSV" dlm=dlm encoding='ansi';
Informat NSTAT $1.
FA $1.
...
;
Input @;
_infile_=translate(_infile_,dlm,'FF'x);
Input
NSTAT
FA
...;
Run;
BR
Günter
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Looks good.
Not sure what value you are getting out of attaching informats to the variables. I guess as a side effect it is forcing SAS to guess that you wanted to define the variables with lengths that are compatible with the widths you used on the attached informats.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
hej hej Tom,
it works so far, but just for encoding='ansi'
every other encoding (like encoding='latin2') does not work!
also, in our data files we have german "ö , ä , ü ...." (not now in the sample file) and with the using of translate, this characters are not correctly converted to UTF8!
I have to check further, what is happening with this characters (hex bevor translate and after translate)
BR Günter