Hi community. I'm trying to parse an .fdf file that looks something like this:

File.fdf
-----------------------------------------------------------
askdjf;lk fdasfe qweiopqwur <</Contents(variable1)-akljsdfkj Page 1>><<xj/variable2-akljsdfkj Page 1>><</Contents(variable2) -akljsdfkj Page 4>>4564324<</Contents(variable2-akljsdfkj Page 2>><<Contents(ar/variable1)-akljsdfkj Page 3>> <</Contents(variable1) -akljsdfkj Page 4>> hjkasdfkjhsdfe
-----------------------------------------------------------

I want to recover (1) the variable names, which always sit between "Contents(" and the next ")", and (2) the corresponding page number, which always comes after "Page " and before the next ">>".

I have been trying to use a DATA step with an INFILE statement. My problem is reading several variables and pages from a single _INFILE_ line: I can only make it work when there is one variable and one page per input line. This is compounded by the fact that the lines in the .fdf file are very long (> 50,000 characters). Is there a way to split the file into shorter lines directly from SAS?
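A minimal sketch of splitting the file from SAS, assuming every entry ends with `>` and no single entry is longer than 1,000 characters: `RECFM=N` makes INFILE read the file as one byte stream, so the 32,767-character limit of `_INFILE_` no longer applies, and list input with `DLM='>'` returns one `>`-delimited chunk per DATA step iteration. Each chunk is then parsed with FIND and SUBSTRN. The first DATA step only writes the sample from the question to a temporary file; with real data, point INFILE at "file.fdf" instead. The fileref `fdf` and the names `want`, `chunk`, `name_start`, `pos_close` and `pos_page` are illustrative, not taken from the original post.

```sas
/* Demo only: write the sample from the question to a temporary file.   */
/* With real data, use INFILE "file.fdf" in the second step instead.    */
filename fdf temp;
data _null_;
  file fdf lrecl=32767;
  put 'askdjf;lk fdasfe qweiopqwur';
  put '<</Contents(variable1)-akljsdfkj Page 1>><<xj/variable2-akljsdfkj Page 1>>';
  put '<</Contents(variable2) -akljsdfkj Page 4>>4564324<</Contents(variable2-akljsdfkj Page 2>>';
  put '<<Contents(ar/variable1)-akljsdfkj Page 3>> <</Contents(variable1) -akljsdfkj Page 4>>';
  put 'hjkasdfkjhsdfe';
run;

data want(keep=name page);
  /* RECFM=N: read the file as a byte stream (no record-length limit).   */
  /* DLM='>': each INPUT returns the text up to the next '>' character.  */
  infile fdf recfm=n dlm='>';
  length chunk $1000 name $200;
  input chunk :$1000.;

  pos_c = find(chunk, 'Contents(');
  if pos_c > 0 then do;
    name_start = pos_c + 9;                        /* first character after "Contents(" */
    pos_close  = find(chunk, ')', name_start);     /* closing parenthesis of the name   */
    pos_page   = find(chunk, 'Page ', name_start); /* start of the "Page " label        */
    /* keep only entries shaped like: Contents(name) ... Page n */
    if pos_close > name_start and pos_page > pos_close then do;
      name = substrn(chunk, name_start, pos_close - name_start);
      page = input(strip(substrn(chunk, pos_page + 5)), ?? best32.);
      output;
    end;
  end;
run;
```

Entries that break the stated pattern, such as `Contents(variable2-akljsdfkj Page 2` in the sample (no closing parenthesis), are skipped instead of being paired with the wrong page.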
This is more or less what I have tried so far:

SAS_SCRIPT
------------------------------------------------------------
data pairs;
  /* _INFILE_ holds at most 32767 characters, so longer lines are truncated */
  infile "file.fdf" lrecl=32767;
  length var $200 page 8;
  input;
  /* Count occurrences for the loop: COUNT counts substrings, COUNTC would count single characters */
  vars = count(_infile_, "Contents(");
  pags = count(_infile_, "Page ");
  pos = 1;
  /* Loop over all occurrences of variables on this line */
  do i = 1 to min(vars, pags);
    pos_var = find(_infile_, "Contents(", pos) + 9;
    if pos_var > 9 then do;
      pos = pos_var;
      length_var = find(_infile_, ")", pos_var) - pos_var;
      if length_var > 0 then do;
        pos_page = find(_infile_, "Page ", pos_var) + 5;
        if pos_page > 5 then do;
          pos = pos_page;
          length_page = find(_infile_, ">", pos_page) - pos_page;
          if length_page > 0 then do;
            /* Extract the pair and write it out */
            var  = substr(_infile_, pos_var, length_var);
            page = input(substr(_infile_, pos_page, length_page), ?? best32.);
            output; /* one row per pair, not just the last pair on the line */
          end;
        end;
      end;
    end;
  end;
  keep var page;
run;
------------------------------------------------------------

Expected output
-------------------------------------------------------
name        page
variable1   1
variable1   2
variable2   1
...
-------------------------------------------------------

Any ideas?
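A shorter variant, sketched under the assumption that every entry is well-formed (the name is followed directly by `)` and the page number by `>`), replaces the FIND loop above with the INPUT statement's `@'character-string'` column pointer on a `RECFM=N` byte stream. The temporary fileref `fdf2` and the dataset `want_ptr` are hypothetical, and the demo file holds only well-formed entries; on a malformed entry this version can pair a name with the wrong page.

```sas
/* Demo only: a small, well-formed sample written to a temporary file. */
filename fdf2 temp;
data _null_;
  file fdf2 lrecl=32767;
  put 'askdjf;lk fdasfe qweiopqwur <</Contents(variable1)-akljsdfkj Page 1>><</Contents(variable2) -akljsdfkj Page 4>>';
  put '<<Contents(ar/variable1)-akljsdfkj Page 3>> hjkasdfkjhsdfe';
run;

data want_ptr;
  /* Stream the file and let INPUT do the searching:                     */
  /*   @'Contents(' moves the pointer just past the next "Contents(",     */
  /*   the name is read up to the next delimiter (')' or '>'),            */
  /*   @'Page ' moves past the next "Page ", and the page is read up to   */
  /*   the following '>'. Each iteration resumes where the last stopped;  */
  /*   the step ends when the end of the file is reached.                 */
  infile fdf2 recfm=n dlm=')>';
  length name $200;
  input @'Contents(' name :$200. @'Page ' page;
run;
```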