Help using Base SAS procedures

Extract data from a unfriendly text file

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Extract data from a unfriendly text file

[ Edited ]

Hi ,

 

I have a unfriendly text file like below:

 

Math Exam
Student Report
Calculus I
Fall 2016
Student: Eric
ID: 1234567
Date of Birth: 7/1/1990
Grade: 95
Report Date: 12/20/2016
School: 123 Edison
District: 123 Edison

Description : This is an easy exam

 

Math Exam
Student Report
Calculus I
Fall 2016
Student: Amy
ID: 1234519
Date of Birth: 12/18/1990
Grade: 77
Report Date: 12/20/2016
School: 123 Edison
District: 123 Edison

Description : This is a real tough exam


Math Exam
Student Report
Calculus I
Fall 2016
Student: John
ID: 1234569
Date of Birth: 3/1/1991
Grade: 90
Report Date: 12/20/2016
School: 123 Edison
District: 123 Edison

Description : This is an easy exam

 

Math Exam
Student Report
Calculus I
Fall 2016
Student: Ava
ID: 1234523
Date of Birth: 9/13/1992
Grade: 89
Report Date: 12/20/2016
School: 123 Edison
District: 123 Edison

Description : This is a not a hard exam

 

----------------------------

I have read tons of papers to extract data from text file, like https://communities.sas.com/t5/SAS-Procedures/Extract-data-from-text-file/td-p/278012. I tried their ways,  but still have no idea how to extract

1 course name

2 student name

3 student id number 

4 student date of birth 

5 grade

6 report date 

from such text file, can anyone please help. 

 

 -------------------

my final goal is to obtain a dataset like below 

Calculus I   Eric   1234567  7/1/1990       95 12/20/2016

Calculus I   Amy  1234519  12/18/1990   77 12/20/2016

Calculus I   John  1234569  3/1/1991      90 12/20/2016

Calculus I   Ava   1234523   9/13/1992    89 12/20/2016


Accepted Solutions
Solution
‎10-02-2017 09:44 PM
Super User
Super User
Posts: 8,112

Re: Extract data from a unfriendly text file

Define your variables. Don't force SAS to guess what you meant them to be by how they first appear in the code.

Don't bother with all of those nested IF/THEN statements. Let the natural SAS processing do that work for you. Take advantage of the power of the INPUT statement.  

data temp;
  infile 'exam.txt' truncover dsd dlm=':';
  length Course $30 StudentName $30 Grade Reportdate 8 ID $10 SchoolCode $8 School $30 DoB 8 District $30 ;
  format ReportDate DoB mmddyy10. ;
  retain Course -- District; 
  length chk $100 ;
  input chk @; 
  if chk = 'Calculus' then input Course ;
  if chk = 'Student' then input StudentName ;
  if chk = 'ID' then input ID ;
  if chk = 'Grade' then input Grade ;
  if chk = 'Date of Birth' then input Dob mmddyy12. ;
  if chk = 'School' then input School ;
  if chk = 'District' then input District ;
  if chk ='Report Date' then do; 
    input Reportdate mmddyy12. ;
    schoolcode =scan(school,1,' ');
    school = substr(school,length(schoolcode)+2);
    output;
    call missing(of _all_);
  end;
run;

Note that SAS can normally read date values with 2 digit years and make the right choice is to what century is meant, but why would want to print your dates without the century?  You are just inviting trouble later.

View solution in original post


All Replies
Super User
Posts: 23,724

Re: Extract data from a unfriendly text file

Do the rows repeat at all, or are you reading a single record from a file?

I think you need to provide more information.

 

You can read from a multi lines in SAS using an INPUT statement.

Occasional Contributor
Posts: 8

Re: Extract data from a unfriendly text file

Hi, I add more detailed information about this.

 

There are multiple students in the text file. I listed four students just as an example.

 

 

Super User
Posts: 23,724

Re: Extract data from a unfriendly text file

Here's the mechanism for how you'd do this:

https://stats.idre.ucla.edu/sas/code/reading-a-multiple-line-per-subject-data-file/

 

If you've tried something make sure to include it, otherwise that will likely be the first suggestion you get and it's a waste of everyones time. 

Super Contributor
Super Contributor
Posts: 266

Re: Extract data from a unfriendly text file

@Reeza That's an excellent reference.  Saving that in case i need it!

 

@@@Wei2017  That text looks a lot like the structure of an XML document without the tags.  If whereever you are getting that text dump from could give you that as XML instead, it might read easier.

 

<exam_record>

<examtype>Math</examtype>
<reporttype>Student</reporttype>
<subject>Calculus I</subject>
<date>Fall 2016</date>
<studentname>Eric</studentname>
<studentid>1234567</studentid>
<dob>7/1/1990</dob>
<grade>95</grade>

<reportdate>12/20/2016</reportdate>
<school>123 Edison</school>
<district>123 Edison</district>

<description>This is an easy exam</description>

 </exam_record>

 

 

Occasional Contributor
Posts: 8

Re: Extract data from a unfriendly text file

Thank you for your link, but that is not what i am looking for. 

I have tried https://communities.sas.com/t5/SAS-Procedures/Extract-data-from-text-file/td-p/278012, it is a little bit similar to my question.

Regular Contributor
Posts: 213

Re: Extract data from a unfriendly text file

The link posted by @Reeza has exactly what you need to solve the issue. One obersavation in your requested output dataset has the data of multiple lines from your source file. 

Super User
Posts: 23,724

Re: Extract data from a unfriendly text file

If you don't post what you've tried then you're going to get suggestions that may not work. 

And it should work so you're likely doing something wrong, but we can't tell you since you haven't shown us any code. 

Or if you did do it correctly it would give us something to start with rather than starting from scratch. 

 


Wei2017 wrote:

Thank you for your link, but that is not what i am looking for. 

I have tried https://communities.sas.com/t5/SAS-Procedures/Extract-data-from-text-file/td-p/278012, it is a little bit similar to my question.


 

Occasional Contributor
Posts: 8

Re: Extract data from a unfriendly text file

[ Edited ]

Apologise for not posting my code and result part. 

 

@Reeza @HB @Tom @ballardw @error_prone

 

Here is my code and what i get in sas:

 Result.png

 

data temp;
infile 'exam.txt';
retain Course StudentName Grade Reportdate ID School DoB District;
input;
if _infile_ =: 'Calculus' then Course = scan(_infile_, 1, '\\');
  else if _infile_ =: 'Student' then StudentName = scan(_infile_, -1, ':');
   else if _infile_ =: 'ID' then ID = scan(_infile_, -1, ':');
    else if _infile_ =: 'Grade' then Grade = scan(_infile_, -1, ':');
     else if _infile_ =: 'Date of Birth' then Dob = scan(_infile_, -1, ':');
      else if _infile_ =: 'School' then School = scan(_infile_, -1, ':');
       else if _infile_ =: 'District' then District = scan(_infile_, -1, ':');
          else if (_infile_)=:'Report Date' then do 
            Reportdate = scan(_infile_,-1,':');
         output;
         end;
run;

proc print data = temp;
run;

 

I am still get lost in sas.

(1) Why is there something missing in the first row?

(2) How to convert Reportdate format into mmddyy? eg convert 12/20/2016 to 12/20/16

(3) Is there some ways that could split School into School code and School name?  eg "123 Edision" into "123" and "Edison"

 

Thank you for your time reading this. 

Solution
‎10-02-2017 09:44 PM
Super User
Super User
Posts: 8,112

Re: Extract data from a unfriendly text file

Define your variables. Don't force SAS to guess what you meant them to be by how they first appear in the code.

Don't bother with all of those nested IF/THEN statements. Let the natural SAS processing do that work for you. Take advantage of the power of the INPUT statement.  

data temp;
  infile 'exam.txt' truncover dsd dlm=':';
  length Course $30 StudentName $30 Grade Reportdate 8 ID $10 SchoolCode $8 School $30 DoB 8 District $30 ;
  format ReportDate DoB mmddyy10. ;
  retain Course -- District; 
  length chk $100 ;
  input chk @; 
  if chk = 'Calculus' then input Course ;
  if chk = 'Student' then input StudentName ;
  if chk = 'ID' then input ID ;
  if chk = 'Grade' then input Grade ;
  if chk = 'Date of Birth' then input Dob mmddyy12. ;
  if chk = 'School' then input School ;
  if chk = 'District' then input District ;
  if chk ='Report Date' then do; 
    input Reportdate mmddyy12. ;
    schoolcode =scan(school,1,' ');
    school = substr(school,length(schoolcode)+2);
    output;
    call missing(of _all_);
  end;
run;

Note that SAS can normally read date values with 2 digit years and make the right choice is to what century is meant, but why would want to print your dates without the century?  You are just inviting trouble later.

Occasional Contributor
Posts: 8

Re: Extract data from a unfriendly text file

[ Edited ]

Thank you for your code and exmplanation. The code is so awesome. I impleneted in my jupyter notebook, why some part is still missing like SchoolCode for Eric?  

 

Thank you.

 

Result_1.png

Super User
Super User
Posts: 8,112

Re: Extract data from a unfriendly text file


Wei2017 wrote:

Thank you for your code and exmplanation. The code is so awesome. I impleneted in my jupyter notebook, why some part is still missing like SchoolCode for Eric?  

 

Thank you.

 

 


Most likely the line with the SCHOOL on it is formatted slightly differently or does not exist for that block of lines. Watch out for changes in case, extra spaces, or things like tabs and other characters that look like spaces but aren't.

Regular Contributor
Posts: 213

Re: Extract data from a unfriendly text file

Can you explain "The expected results like below :" further? Do you need a dataset, a report in the posted format, the result printed in the log?

Occasional Contributor
Posts: 8

Re: Extract data from a unfriendly text file

Posted in reply to error_prone

Yes. 

my final goal is to obtain a dataset like below 

Calculus I   Eric   1234567  7/1/1990       95 12/20/2016

Calculus I   Amy  1234519  12/18/1990   77 12/20/2016

Calculus I   John  1234569  3/1/1991      90 12/20/2016

Calculus I   Ava   1234523   9/13/1992    89 12/20/2016

 

I have tried example like https://communities.sas.com/t5/SAS-Procedures/Extract-data-from-text-file/td-p/278012, but still have no idea how to deal with this one. Thank you.

Super Contributor
Super Contributor
Posts: 266

Re: Extract data from a unfriendly text file

[ Edited ]

EDIT:  I don't know why this answer formatted this way, with no breaks in the SAS code and a scroll on the last piece of text. 

 

 

Given a text file like:

Math Exam
Student Report
Calculus I
Fall 2016
Student: Eric
ID: 1234567
Date of Birth: 7/1/1990
Grade: 95
Report Date: 12/20/2016
School: 123 Edison
District: 123 Edison
Description : This is an easy exam
Math Exam
Student Report
Calculus I
Fall 2016
Student: Amy
ID: 1234519
Date of Birth: 12/18/1990
Grade: 77
Report Date: 12/20/2016
School: 123 Edison
District: 123 Edison
Description : This is a real tough exam
Math Exam
Student Report
Calculus I
Fall 2016
Student: John
ID: 1234569
Date of Birth: 3/1/1991
Grade: 90
Report Date: 12/20/2016
School: 123 Edison
District: 123 Edison
Description : This is an easy exam
Math Exam
Student Report
Calculus I
Fall 2016
Student: Ava
ID: 1234523
Date of Birth: 9/13/1992
Grade: 89
Report Date: 12/20/2016
School: 123 Edison
District: 123 Edison
Description : This is a not a hard exam

SAS code like

data incoming_text;
infile 'J:\some location where your file is\unfriendly_text_file.txt';
input 
exam_type & $20.

#2 report_type & $20.

#3 subject & $20.

#4 mydate & $20.
#5 @10 student & $20.
#6 @5 id & $20.
#7 @16 dob & $20.
#8 @8 grade & $20.
#9 @14 reportdate & $20.
#10 @9 school & $20.
#11 @11 district & $20.
#12 @15 description & $45
.;
run;

will give you

exam_type report_type subject mydate student id dob grade reportdate school district description
Math Exam Student Report Calculus I Fall 2016 Eric 1234567 7/1/1990 95 12/20/2016 123 Edison 123 Edison This is an easy exam
Math Exam Student Report Calculus I Fall 2016 Amy 1234519 12/18/1990 77 12/20/2016 123 Edison 123 Edison This is a real tough exam
Math Exam Student Report Calculus I Fall 2016 John 1234569 3/1/1991 90 12/20/2016 123 Edison 123 Edison This is an easy exam
Math Exam Student Report Calculus I Fall 2016 Ava 1234523 9/13/1992 89 12/20/2016 123 Edison 123 Edison This is a not a hard exam

 

 

 

You can mess around with the read  instructions and read only the lines you want or just read everything and drop the variables you don't want.  You can also read things properly as dates and numbers and not generic character variables of length 20 as I have.

 

Maybe like

#6 @5 id & 7.
#7 @16 dob:mmddyy10.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 526 views
  • 2 likes
  • 6 in conversation