Hi,
I need to import a .dat file that contains this information:
FILE DAT
A PER: TEST0 1232156000002024-02-072024-02-07
A PER: TEST2 12345678000002024-02-072024-02-07
A PER: TEST3 XXXXXXXX000002024-02-072024-02-07
A PER: N° Fattura 575000002024-02-072024-02-07
and I use this code:
filename test "path/test.dat";
data test;
length
ONE $40.
TWO $5.
three $10.
four $10.
;
infile test ;
input;
ONE =ksubstr(_infile_,1 , 41 );
TWO =ksubstr(_INFILE_,41 , 5 );
three =ksubstr(_INFILE_,46 , 10 );
four =ksubstr(_INFILE_,56 , 10 );
run;
I have a problem with the last line: because of the character °, SAS truncates the last character of the first column (the final 5 is missing; "Fattura 575" is the correct value).
How can I import the file correctly?
Thanks,
Luca
Hi @luca87,
I think the truncation is just due to the insufficiently defined length of variable ONE:
@luca87 wrote:
data test;
length
ONE $40.
TWO $5.
three $10.
four $10.
;
infile test ;
input;
ONE =ksubstr(_infile_,1 , 41 );
TWO =ksubstr(_INFILE_,41 , 5 );
three =ksubstr(_INFILE_,46 , 10 );
four =ksubstr(_INFILE_,56 , 10 );
run;
With length $41 (no periods needed after length specifications) variable ONE should contain the missing character.
EDIT: This will also append an additional character to the values of ONE in the first three observations, though, causing an overlap with variable TWO. To avoid this overlap, use 40, not 41, in the third argument of the KSUBSTR function:
ONE=ksubstr(_infile_, 1, 40);
EDIT 2: To be on the safe side in case of more or longer multi-byte characters to be stored in variable ONE, just increase the defined length of the variable further (as this length is measured in bytes), but keep the 40 (i.e., 40 characters) in the KSUBSTR argument.
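Putting the two edits together, the corrected step might look like the sketch below. The length of 80 for ONE is an assumed, generous byte count chosen for illustration, not a value from this thread; the fileref and path are the ones from the question.

```sas
filename test "path/test.dat";

data test;
  /* LENGTH is measured in bytes. 80 is an assumed value that leaves */
  /* room for multi-byte UTF-8 characters such as the degree sign.   */
  length ONE $80 TWO $5 three $10 four $10;
  infile test;
  input;
  /* KSUBSTR counts characters, so the column positions stay fixed. */
  ONE   = ksubstr(_infile_,  1, 40);
  TWO   = ksubstr(_infile_, 41,  5);
  three = ksubstr(_infile_, 46, 10);
  four  = ksubstr(_infile_, 56, 10);
run;
```

Because KSUBSTR works in characters rather than bytes, TWO, three and four keep their start positions 41, 46 and 56 even when ONE contains multi-byte characters.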
What encoding is your SAS session running in?
proc options option=ENCODING;
run;
And what is the encoding of your file?
If I copy your input data in my editor and import it using DATALINES, it looks correct (I'm using LATIN9 as encoding).
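A quick way to check whether multi-byte characters are involved is to compare the byte count and the character count of each record. This is only a sketch: it assumes the fileref test from the question is already assigned and that the session encoding matches the file.

```sas
data _null_;
  infile test;
  input;
  bytes = length(_infile_);   /* length in bytes */
  chars = klength(_infile_);  /* length in characters */
  put _n_= bytes= chars=;     /* one line per record in the log */
run;
```

In a UTF-8 session, a record where bytes is larger than chars contains at least one multi-byte character.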
Hi!
The .dat file is in UTF-8.
SAS:
ENCODING=UTF-8 Specifies the default character-set encoding for the SAS session.
NOTE: PROCEDURE OPTIONS used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
I have attached the .dat file to the first post.
Thanks,
Luca