BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
runrunbunny
Obsidian | Level 7

Hi everyone,

 

I have a text file including several columns are all about customers' comments like this:

 

 FilePath               Comments              Score

0937bj903.wav       I don't like your service at all balabala    Negative

rt707bj690.wav    Great service         Positive     

8897b66er.wav    No idea         Neutral

 

I have trouble reading this text file since the length of the column 'Comments' are not the same, so I am not able to define which column to start and end, which directly leads in mis-reading the column 'Score' as well. So I have to define the length of each column, but it's not working properly:

 

data test;

length FilePath $ 13   Comments $ 30    Score$ 10;

infile "data\file.txt" ;

input FilePath $  Comments $   Score $ ;

run;

 

Could you please give me some suggestions in how to dealing with this kind of text file?

 

Thank you in advance!

 

R.B

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

First is make sure that those spaces between the columns are not actually tab characters.

A simple way to check is to use the LIST statement in a data step.  The '09'x hex codes of a TAB character will trigger the list statement to display the hex codes for you to examine. If they are all spaces then LIST will not both listing the hex codes.

data _null_;
  infile 'myfile' obs=5;
  input;
  list;
run;

If they are tabs then use '09'x as the delimiter.

data test;
  length FilePath $ 13   Comments $ 30    Score$ 10;
  infile "data\file.txt" dlm='09'x ;
  input FilePath  Comments Score ;
run;

Otherwise if SCORE is always only one word then just read everything into COMMENTS and move the last word into SCORE.

data test;
  length FilePath $ 13   Comments $ 30    Score$ 10;
  infile "data\file.txt"  truncover ;
  input FilePath  Comments $30. ;
  score=scan(comments,-1,' ');
  comments=substrn(comments,1,length(comments)-length(score));
run;

View solution in original post

2 REPLIES 2
Tom
Super User Tom
Super User

First is make sure that those spaces between the columns are not actually tab characters.

A simple way to check is to use the LIST statement in a data step.  The '09'x hex codes of a TAB character will trigger the list statement to display the hex codes for you to examine. If they are all spaces then LIST will not both listing the hex codes.

data _null_;
  infile 'myfile' obs=5;
  input;
  list;
run;

If they are tabs then use '09'x as the delimiter.

data test;
  length FilePath $ 13   Comments $ 30    Score$ 10;
  infile "data\file.txt" dlm='09'x ;
  input FilePath  Comments Score ;
run;

Otherwise if SCORE is always only one word then just read everything into COMMENTS and move the last word into SCORE.

data test;
  length FilePath $ 13   Comments $ 30    Score$ 10;
  infile "data\file.txt"  truncover ;
  input FilePath  Comments $30. ;
  score=scan(comments,-1,' ');
  comments=substrn(comments,1,length(comments)-length(score));
run;
runrunbunny
Obsidian | Level 7

Thank you Tom! I really appreciate your solution.

 

All the best!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1230 views
  • 1 like
  • 2 in conversation