Hi everyone,
I have a text file including several columns are all about customers' comments like this:
FilePath Comments Score
0937bj903.wav I don't like your service at all balabala Negative
rt707bj690.wav Great service Positive
8897b66er.wav No idea Neutral
I have trouble reading this text file since the length of the column 'Comments' are not the same, so I am not able to define which column to start and end, which directly leads in mis-reading the column 'Score' as well. So I have to define the length of each column, but it's not working properly:
data test;
length FilePath $ 13 Comments $ 30 Score$ 10;
infile "data\file.txt" ;
input FilePath $ Comments $ Score $ ;
run;
Could you please give me some suggestions in how to dealing with this kind of text file?
Thank you in advance!
R.B
First is make sure that those spaces between the columns are not actually tab characters.
A simple way to check is to use the LIST statement in a data step. The '09'x hex codes of a TAB character will trigger the list statement to display the hex codes for you to examine. If they are all spaces then LIST will not both listing the hex codes.
data _null_;
infile 'myfile' obs=5;
input;
list;
run;
If they are tabs then use '09'x as the delimiter.
data test;
length FilePath $ 13 Comments $ 30 Score$ 10;
infile "data\file.txt" dlm='09'x ;
input FilePath Comments Score ;
run;
Otherwise if SCORE is always only one word then just read everything into COMMENTS and move the last word into SCORE.
data test;
length FilePath $ 13 Comments $ 30 Score$ 10;
infile "data\file.txt" truncover ;
input FilePath Comments $30. ;
score=scan(comments,-1,' ');
comments=substrn(comments,1,length(comments)-length(score));
run;
First is make sure that those spaces between the columns are not actually tab characters.
A simple way to check is to use the LIST statement in a data step. The '09'x hex codes of a TAB character will trigger the list statement to display the hex codes for you to examine. If they are all spaces then LIST will not both listing the hex codes.
data _null_;
infile 'myfile' obs=5;
input;
list;
run;
If they are tabs then use '09'x as the delimiter.
data test;
length FilePath $ 13 Comments $ 30 Score$ 10;
infile "data\file.txt" dlm='09'x ;
input FilePath Comments Score ;
run;
Otherwise if SCORE is always only one word then just read everything into COMMENTS and move the last word into SCORE.
data test;
length FilePath $ 13 Comments $ 30 Score$ 10;
infile "data\file.txt" truncover ;
input FilePath Comments $30. ;
score=scan(comments,-1,' ');
comments=substrn(comments,1,length(comments)-length(score));
run;
Thank you Tom! I really appreciate your solution.
All the best!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.