I'm new to SAS programming, and have been struggling to figure out how to write the code to import a .txt file into a useful format. The .txt file has hundreds of pages and each page reprints the header information. The date of the text file is in the header, and I would like to put that in the data table. I've done some searches and found bits of code that would help with some parts, but haven't been able to put it all together. Could anyone show me some code that could turn something that looks like this:
TITLE
SUBTITLE
DATE: 02/01/2014
NAME RUNNERID TIME
Bob Smith 48739 0300:48
Nick Jones 48372 0289:57
...
into this:
DATE NAME RUNNERID TIME
02/01/2014 Bob Smith 48739 0300:48
02/01/2014 Nick Jones 48372 0289:57
...
Keeping in mind that there are many pages of this with the Title repeated at the top, and I only need the raw data with the date column added?
If your description is correct then you should be able to use something like:
data have;
informat date mmddyy10.;
format date mmddyy10.;
informat name $40.;
informat runnerid 5.;
informat time time7.;
format time time7.;
retain date;
input @;
if notdigit(substr(_infile_,1,1)) then do;
input name & runnerid time;
output;
end;
else if index(_infile_,'DATE:') then do;
input @'DATE: ' date;
end;
else input;
cards;
1 TITLE
5 SUBTITLE
5 DATE: 02/01/2014
1
1NAME RUNNERID TIME
Bob Smith 48739 0300:48
Nick Jones 48372 0289:57
1 TITLE
1 SUBTITLE
1 DATE: 02/01/2014
1
1NAME RUNNERID TIME
John Smith 58739 0300:48
Mary Jones 58372 0289:57
;
I should've also mentioned: Every row that is not a data row has a number in the first character of each row to denote that it's a header row. I figured that piece could be helpful for the logic needed to sort this mess out. I appreciate any ideas you might have!
You may also want to describe that time value a bit, at least I'm not getting exactly how to interpret though from context I am guessing it is an elapsed time of some sort. That information will be helpful to create SAS time values which are often quite useful when it comes to analysis.
If you could paste some complete data lines, including the row codes you mention it would help.
Unfortunately, I can't share the actual data since it's proprietary, but I dummied up the example above with similar field types to be able to answer my question. The time field above is in MMMM:SS format similar to how a runner's time might look if they didn't use an hour field. In case it's relevant, the actual field I'm using is in HHHH:MM format. All other fields that I am using in the input are text except for the date field I want to include.
Do the numbers at the start of the title, subtitle and date stay the same or vary with the page? If they don't vary then the values can be very helpful but are not shown in your example data.
There is some variation, but it's predictable, such as "5" is always the first character in a line that include a header, but "5" is also the first character in the line that includes the date. I think for purposes of this example, assuming that all lines with data have a blank, and all lines without data (including the headers and the date that I want to include in the table) have the character "5" as the first character in their row in the text file.
If your description is correct then you should be able to use something like:
data have;
informat date mmddyy10.;
format date mmddyy10.;
informat name $40.;
informat runnerid 5.;
informat time time7.;
format time time7.;
retain date;
input @;
if notdigit(substr(_infile_,1,1)) then do;
input name & runnerid time;
output;
end;
else if index(_infile_,'DATE:') then do;
input @'DATE: ' date;
end;
else input;
cards;
1 TITLE
5 SUBTITLE
5 DATE: 02/01/2014
1
1NAME RUNNERID TIME
Bob Smith 48739 0300:48
Nick Jones 48372 0289:57
1 TITLE
1 SUBTITLE
1 DATE: 02/01/2014
1
1NAME RUNNERID TIME
John Smith 58739 0300:48
Mary Jones 58372 0289:57
;
Is there a way to do that using the "INFILE" function to pull in a .txt file rather than using cards? The format of the data is very similar to the sample you show in the cards, but it's hundreds of pages long.
Of course! Just insert an infile statement and delete the cards and datalines. I usually put it in the code early, like the 2nd line or so.
The code should work the same way.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.