08-05-2015 05:56 PM
Where to start?
Input is a single text file with an unknown number of observations.
Each obs has 6 header cards,
followed by a variable number of data cards.
That block then repeats for many obs.
The format for the header cards (colon delimited) is different from the data cards (comma delimited).
I need to input a card, then see which kind of card it is so I can extract the needed data.
The output should be 1 obs per data card, each also carrying the header data for its block.
Report for: name is here
subject name: name
serial number: 23456789
Algorithm: john or Jane
list of column names, csv style
data (csv style)
data (csv style)
data (csv style) variable number of data cards
REPEAT THE BLOCK
08-05-2015 06:48 PM
Something like this may get you started:
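data want; /* output data set name is a placeholder */
infile <all of your infile information goes here include END=Fileend>; /* TRUNCOVER is worth including so a short line is not continued onto the next one */
length ReportName SubjectName $ 50 SerialNumber $ 9 Algorithm $ 50; /* since serial numbers usually aren't used in computations I treat them as character, also helps when you get letters or - / and such*/
retain ReportName SubjectName SerialNumber Algorithm ; /* so the header information is with each record*/
/* _infile_ below is the input buffer, you can examine it before input*/
input @@; /* load a line and hold it without reading any variables */
do while (index(_infile_,"Report for") = 0);
input; /* release the held line */
input @@; /* load and hold the next one */
end; /* keeps reading until it finds the first line of your record, now you should be able to read the header info NOTE: THE STRINGS FOLLOWING @ BELOW MUST BE EXACTLY AS THEY APPEAR IN THE DATA FOR CASE AND SPACES BETWEEN WORDS*/
input @'Report for: ' ReportName $50. /
@'subject name: ' SubjectName $50. /
@'serial number: ' SerialNumber $9. /
@'Algorithm: ' Algorithm $50. //; /* the // skips the two remaining header cards, so the next input starts on the first data card */
/* your csv data should start here*/
do while (not fileend);
input @@; /* look at the next line before reading it */
if index(_infile_,"Report for:") > 0 then leave; /* next block: the line stays held for the next pass of the data step */
input @1 < your CSV input statement goes here>;
output; /* one observation per data card */
end;
run;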
08-05-2015 09:33 PM
Thanks for your post. It has some very interesting items. I especially like the INPUT @'string' that searches and inputs at the same time.
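As a minimal sketch of that INPUT @'string' behaviour, the step below reads two of the header cards from the sample layout using in-stream data. The data set name demo and the use of DATALINES are assumptions made only so the example is self-contained. TRUNCOVER is included so that the $50. informat stops at the end of a short line instead of continuing onto the next one.

```sas
data demo;                                   /* hypothetical data set name */
   infile datalines truncover;               /* truncover: do not flow onto the next line */
   length SubjectName $ 50 SerialNumber $ 9;
   /* @'string' moves the pointer to just after the matching text;
      the text must match the data exactly, including case and spaces */
   input @'subject name: '  SubjectName  $50. /
         @'serial number: ' SerialNumber $9.;
   put SubjectName= SerialNumber=;           /* write the values to the log */
datalines;
subject name: name
serial number: 23456789
;
run;
```

The / moves to the next line inside a single INPUT statement, so one statement can walk through several header cards in order.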
This is what I was working on in the meantime:
DATA sleep; *** data set name is a placeholder ***;
INFILE 'M:\temp\all.dat' DLM=',' DSD LRECL=256 PAD MISSOVER;
LENGTH SleepAlgorithm $ 12;
INFORMAT InBedDate OutBedDate Onsetdate MMDDYY10.;
FORMAT InBedDate OutBedDate Onsetdate MMDDYY10.;
LENGTH InBedTime OutBedTime OnsetTime $ 10;
RETAIN line line1-line6;
INPUT line $ 1-256 @; *** input card and hold in buffer ***;
*** header lines get re-read here ***;
IF INDEX(line, "Sleep Report for:")>0 THEN INPUT line1 $ 1-256;
IF INDEX(line, "Subject Name:")>0 THEN INPUT line2 $ 1-256;
IF INDEX(line, "Serial Number:")>0 THEN INPUT line3 $ 1-256;
IF INDEX(line, "Sleep Algorithm:")>0 THEN INPUT line4 $ 1-256;
IF line="" THEN INPUT line5 $ 1-256; * blank line*;
IF INDEX(line, "In Bed Date")>0 THEN INPUT line6 $ 1-256; * column names line *;
*** data lines get re-read here ***;
IF INDEX(line, "Cole-Kripke,")>0 OR INDEX(line,"Sadeh,")>0 THEN DO;
INPUT @1 SleepAlgorithm InBedDate InBedTime OutBedDate OutBedTime OnsetDate OnsetTime Latency TotalCounts Efficiency TotalMinutesinBed TotalSleepTime WakeAfterSleepOnset NumberofAwakenings AverageAwakeningLength;
OUTPUT; *** one obs per data card ***;
END;
RUN;
The final hang-up was how SAS holds lines. I finally discovered that the column pointer was at the end of the held line, so my primary data INPUT statement was trying to read starting from the end. That resulted in all missing data (with the correct number of output obs), thanks to my using the PAD and MISSOVER INFILE options.
Using the INFILE COLUMN= and LINE= options and PUT (to the log) helped me track this problem down.
Adding the @1 resets the column pointer in the current buffer line.
The earlier INPUT line(s) used specified columns so they didn't rely on the column pointer.
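The pointer behaviour described above can be reproduced on a small scale. This is only a sketch: the data set name demo, the variable col, and the sample values in the data line are made up for illustration, and only three of the data variables are read. The first INPUT uses column input with a trailing @, which leaves the pointer past the columns it read; the PUT statements log the pointer position before and after the @1 reset.

```sas
data demo;                                        /* hypothetical data set name */
   infile datalines dlm=',' dsd truncover column=col;   /* col tracks the column pointer */
   length line $ 40 SleepAlgorithm $ 12;
   input line $ 1-40 @;                           /* read the whole card and hold it */
   put 'Pointer after the held read: ' col=;
   input @1 @;                                    /* @1 moves the pointer back to column 1 */
   put 'Pointer after the @1 reset: ' col=;
   input SleepAlgorithm Latency Efficiency;       /* list input now starts at column 1 */
   put SleepAlgorithm= Latency= Efficiency=;
   drop line;
datalines;
Sadeh,12,91.5
;
run;
```

Removing the `input @1 @;` statement leaves the list input starting where the held read stopped, which is how the all-missing values arise.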
My method means reading each line twice, so it's probably not as fast, but this file is relatively small.