I don't understand...does your data file contain
and all the white space, or were those separate records that you were trying to indicate???
What about Date 3/12/09 and Date 3/16/09?? Are those IN the file or your descriptions of the data?? Are they separate files? What about the Received 3/14/09??? Again, part of the data or not??
Also, about the data...in your file, is there really a line break between 33182 and 27666 for example, or if you open the file with Notepad, do you see this.
97101, 97440, 67431, 55798, 42905, 22590, 31634, 19139, 20920, 33182, 27666, 27532, 25707, 27494
or is there a line break or carriage return at the end of 33182, as shown below:
You say you want one row per number -- by which I think you mean:
What would you call that variable Date_3_12_09??? or just NUM or maybe NUMVAR???? Would you want each observation to consist of only one variable or should each observation look like this (partial output):
You say that you have a "file" is this an ASCII text file (a file that you can open using Notepad)? Is this data in an Excel spreadsheet??
If you have a truly urgent need for help, then you should open a track with Tech Support. In order to help you here, folks really need to see the actual layout of the file you will be reading (with ------ and =====) if those are in the file, but without those cosmetics if they are not in the file.
Thank you for your quick reply. I do apologize for not being clearer.
This is a text file that can be viewed in notepad. It is generated by someone else. My task is to find the rows that start with a numeric character and then pull that row and all following rows till I run out of data then continue on till I find another row starting with numeric data then pull that data out. I don't need to keep any text or dates.
This is exactly what is in the data file. "Need to update" is the last line of the file.
For diagnostic review, suggest you add a PUTLOG _ALL_; statement to your program, you would see that the _ERROR_ variable is set to 1 when you cannot perform the input. Then you can perform some other/additional task with a DO/END code piece in your DATA step.
Then I use this program to read in 1 line of the file at a time. The INPUT statement uses $VARYING with a length variable in order to read the whole text line and treat each line as a separate entity. (The length variable, LG, comes from using the LENGTH= option on the INFILE statement.) Using $VARYING allows me to read variable length lines -- short lines and long lines without needing to know the exact length of each data line.
I test the first 4 bytes of the line to see whether it is in this list: Date, Need, Rece, Test or ---- or 4 blanks. I also test to see if the line is "empty" -- basically a carriage return without anything else on the line. If the dataline falls into these categories, it is just deleted and control returns to the top of the DATA step to read the next line.
When a dataline doesn't meet any of these conditions, it must be a line with numbers, so the COUNTC function finds out how many commas there are in the line and then the DO loop loops through the line and parses out each number into a character variable. Then a numeric variable is created by using the INPUT function (to turn a character variable into a numeric variable. Each parsed number is output -- so for example, the first data line with numbers contains 10 numbers and so 10 observations are created and the second data line with numbers contains 4 numbers and so 4 observations are created.
Once you understand what the program is doing, I would recommend uncommenting the DROP statement to get rid of unwanted variables. And, of course, you can tweak the program, as needed if you find that there are other stray text lines in the file.
length line $100;
** once you understand what program is doing, then drop unneeded variables;
** DROP line i charvar orig_line_num;
infile 'c:\temp\mixed_data.txt' length=lg;
input @1 line $varying. lg ;
orig_line_num = _n_;
** lg will equal 0 when there is an empty line in the file without any space characters;
if substr(line,1,4) in ('Date', 'Test', 'Rece', 'Need', '----', ' ') or lg=0 then do;
num_comma = countc(line,',');
** add 1 because sometimes there is NOT a comma at the end of a line of data;
** if you do an extra scan and there is nothing found, then charvar will be missing or blank;
** and the IF statement will keep that last value from being output.;
do i = 1 to num_comma + 1;
charvar = scan(line,i,',');
numvar = input(charvar,best12.);
if charvar ne ' ' then output;
proc print data=readmixed;
title 'Output Data Set';
*keep it simple
If a line starts with a number, output each number from that line, until there are no more numbers on the line[pre]data numbers ;
infile datalines4 dsd truncover; *comma delimiter by default ;
*get first data item of record/buffer as number ;
input number ?? @ ; *??=avoid invalid data messages, @=hold this buffer ;
do while( number ne . ); *do not output from record when non-numeric is read ;
input number ?? @ ; * get next number and hold buffer ;
* with a single trailing @, the infile buffer is released as data step iterates ;
If the data had come from some of my former customers, however, they would have been back a week later and said, "Hey, you know that file of numbers you gave us, it would be really swell if you could give us the dates that were in that original text file, too." That's how I got into the habit of coding in anticipation of what their next request was going to be. ;-)
that should require very little more
Before the Datalines statement, insert[pre]
if _infile_ =: 'Date' ;
input @6 set_date : mmddyy. ;
retain set_date ;
attrib set_date format= date9. ;
datalines4 statement should have no 4 Message was edited by: Peter.C