Solved: read in text file

runrunbunny · Posted 10-12-2021 07:46 PM

Hi everyone,

I have a text file including several columns are all about customers' comments like this:

FilePath Comments Score

0937bj903.wav I don't like your service at all balabala Negative

rt707bj690.wav Great service Positive

8897b66er.wav No idea Neutral

I have trouble reading this text file since the length of the column 'Comments' are not the same, so I am not able to define which column to start and end, which directly leads in mis-reading the column 'Score' as well. So I have to define the length of each column, but it's not working properly:

data test;

length FilePath $ 13 Comments $ 30 Score$ 10;

infile "data\file.txt" ;

input FilePath $ Comments $ Score $ ;

run;

Could you please give me some suggestions in how to dealing with this kind of text file?

Thank you in advance!

R.B

Tom · Posted 10-12-2021 08:15 PM

First is make sure that those spaces between the columns are not actually tab characters.

A simple way to check is to use the LIST statement in a data step. The '09'x hex codes of a TAB character will trigger the list statement to display the hex codes for you to examine. If they are all spaces then LIST will not both listing the hex codes.

data _null_;
  infile 'myfile' obs=5;
  input;
  list;
run;

If they are tabs then use '09'x as the delimiter.

data test;
  length FilePath $ 13   Comments $ 30    Score$ 10;
  infile "data\file.txt" dlm='09'x ;
  input FilePath  Comments Score ;
run;

Otherwise if SCORE is always only one word then just read everything into COMMENTS and move the last word into SCORE.

data test;
  length FilePath $ 13   Comments $ 30    Score$ 10;
  infile "data\file.txt"  truncover ;
  input FilePath  Comments $30. ;
  score=scan(comments,-1,' ');
  comments=substrn(comments,1,length(comments)-length(score));
run;

View solution in original post

Tom · Posted 10-12-2021 08:15 PM

First is make sure that those spaces between the columns are not actually tab characters.

A simple way to check is to use the LIST statement in a data step. The '09'x hex codes of a TAB character will trigger the list statement to display the hex codes for you to examine. If they are all spaces then LIST will not both listing the hex codes.

data _null_;
  infile 'myfile' obs=5;
  input;
  list;
run;

If they are tabs then use '09'x as the delimiter.

data test;
  length FilePath $ 13   Comments $ 30    Score$ 10;
  infile "data\file.txt" dlm='09'x ;
  input FilePath  Comments Score ;
run;

Otherwise if SCORE is always only one word then just read everything into COMMENTS and move the last word into SCORE.

data test;
  length FilePath $ 13   Comments $ 30    Score$ 10;
  infile "data\file.txt"  truncover ;
  input FilePath  Comments $30. ;
  score=scan(comments,-1,' ');
  comments=substrn(comments,1,length(comments)-length(score));
run;

runrunbunny · Posted 10-12-2021 09:18 PM

Thank you Tom! I really appreciate your solution.

All the best!

read in text file

Re: read in text file

Re: read in text file

Re: read in text file

Registration is open

read in text file

Re: read in text file

Re: read in text file

Re: read in text file

Registration is open

SAS Training: Just a Click Away