How to read in a dynamicly created file

robwork · Posted 03-21-2012 06:45 PM

I have been given a file that was created dynamically and I am unsure how to read it in. the file looks like:

HEADER RECORD

PERS:PERS1|MIGS ADD:P1A1|5 HIGH ST

END OF FILE

Any help would be gretly appreciated.

Thanks

art297 · Posted 03-21-2012 06:56 PM

Given your sample records, what would you want as a result?

robwork · Posted 03-21-2012 07:02 PM

Ideally it would look like:

Obs PERS1 PERS2 ADD1_1 ADD1_2 ADD2_1 ADD2_2

1 MIGS 5 HIGH ST
2 ROBBIE KYLIE ROSE COTTAGE ROSE COTTAGE SOMEWHERE SOMEWHERE

Tom · Posted 03-21-2012 07:14 PM

How do you distinquish between the multiple values for ADD on the same line? They appear to be separated by spaces, but also have spaces embedded in the values.

HEADER RECORD
PERS:PERS1|MIGS    ADD:P1A1|5 HIGH ST           
PERS:PERS1|ROBBIE  PERS:PERS2|KYLIE   ADD:P1A1|ROSE COTTAGE        ADD:P1A2|ROSE COTTAGE        ADD:P2A1|SOMEWHERE           ADD:P2A2|SOMEWHERE           
END OF FILE

Could you assume that if there are two spaces in a row then that is the delimiter?

Do you have to look for the : or | and then search backwards to find the beginning of the label/variable name that preceeds the vertical bar?

When you say "dynamically" what does that mean? Is it generated by a program? Can you change the program so that the output is delimited more consistently?

robwork · Posted 03-21-2012 07:23 PM

I was given the following code that created the file and I have to write a code to read the file back in

data temp;
length PERS1 PERS2 $8. ADD1_1 ADD1_2 ADD2_1 ADD2_2 $20.;
PERS1="MIGS";PERS2="";ADD1_1="5 HIGH ST";ADD1_2="";ADD2_1="";ADD2_2="";output;
PERS1="ROBBIE";PERS2="KYLIE";ADD1_1="ROSE COTTAGE";ADD1_2="ROSE COTTAGE";ADD2_1="SOMEWHERE";ADD2_2="SOMEWHERE";output;
run;

data _null_;
file "C:\Users\Documents\test.txt";
set temp end=last;
if _n_=1 then put "HEADER RECORD";

if PERS1^="" then put"PERS:PERS1|" PERS1 $8. @;
if PERS2^="" then put"PERS:PERS2|" PERS2 $8. @;
if ADD1_1^="" then put "ADD:P1A1|" ADD1_1 $20. @;
if ADD1_2^="" then put "ADD:P1A2|" ADD1_2 $20. @;
if ADD2_1^="" then put "ADD:P2A1|" ADD2_1 $20. @;
if ADD2_2^="" then put "ADD:P2A2|" ADD2_2 $20. @;
put " " ;
if last then put "END OF FILE";
run;

proc print data=temp;
run;

Tom · Posted 03-21-2012 08:02 PM

So given that you know that person names take 8 and addresses take 20 characters it makes it easier. Try this. You might need to add code to rename the columns.

data _null_;
  file tmpfile1;
  input ;
  put _infile_;
cards;
HEADER RECORD
PERS:PERS1|MIGS    ADD:P1A1|5 HIGH ST
PERS:PERS1|ROBBIE  PERS:PERS2|KYLIE   ADD:P1A1|ROSE COTTAGE        ADD:P1A2|ROSE COTTAGE        ADD:P2A1|SOMEWHERE           ADD:P2A2|SOMEWHERE
END OF FILE
run;
data temp ;
  length type $4 name $8 value $20;
  infile tmpfile1 truncover dlm=':|' ;
  input @;
  if _infile_ in : ( 'HEADER RECORD','END OF FILE') then return;
  row+1;
  do until (type=' ');
    input type name @ ;
    if type='PERS' then do; input value $8. @; output; end;
    else if type='ADD' then do; input value $20. @; output; end;
  end;
run;
proc transpose data=temp out=want (drop=_:);
  by row;
  id name;
  var value;
run;
data _null_;
  set want ;
  put (_all_) (=);
run;

row=1 PERS1=MIGS P1A1=5 HIGH ST PERS2=  P1A2=  P2A1=  P2A2=
row=2 PERS1=ROBBIE P1A1=ROSE COTTAGE PERS2=KYLIE P1A2=ROSE COTTAGE P2A1=SOMEWHERE P2A2=SOMEWHERE
NOTE: There were 2 observations read from the data set WORK.WANT.

Ksharp · Posted 03-22-2012 03:15 AM

That is really not simple.

filename x temp;
data _null_;
file x;
infile cards length=len;
input row $varying400. len;
length list $ 4000;
if left(row) not in : ( 'HEADER RECORD','END OF FILE') then do;
row=prxchange('s/\|/=/',-1,row);
row=prxchange('s/:[a-zA-Z]+|[a-zA-Z]+(?=\d+=)/_/',-1,row); 
put row; 
start=1;stop=length(row);
pid=prxparse('/\w+(?==)/');
call prxnext(pid,start,stop,row,position,length);
do while(position>0); 
 x=substr(row,position,length);
 if not index(list,strip(x)) then do;
                             list=catx(' ',list,x);
                             call symputx('list',list);
                           end;
 call prxnext(pid,start,stop,row,position,length);
end;
end;
cards; 
HEADER RECORD
PERS:PERS1|MIGS    ADD:P1A1|5 HIGH ST           
PERS:PERS1|ROBBIE  PERS:PERS2|KYLIE   ADD:P1A1|ROSE COTTAGE        ADD:P1A2|ROSE COTTAGE        ADD:P2A1|SOMEWHERE           ADD:P2A2|SOMEWHERE           
END OF FILE
;
run;
data want;
 infile x ;
 input (&list) (= $40.);
run;

Ksharp

Message was edited by: xia keshan

robwork · Posted 03-28-2012 07:35 PM

Thanks for all the help I also found the following solution worked

data temp;

infile "" length=lg lrecl=1000 end=last;

input @;

if _n_=1 or last then delete;

start=1;

do while (start<lg-1);

input @start what $3. @;

if what="PER" then do;

input @start+5 type $5. @;

if type="PERS1" then input @start+11 PERS1 $8. @;

else if type="PERS2" then input @start+11 PERS2 $8. @;

start=start+19;

end;

else if what="ADD" then do;

input @start+4 type $4. @;

if type="P1A1" then input @start+9 ADD1_1 $20. @;

else if type="P1A2" then input @start+9 ADD1_2 $20. @;

else if type="P2A1" then input @start+9 ADD2_1 $20. @;

else if type="P2A2" then input @start+9 ADD2_2 $20. @;

start=start+29;

end;

run;

proc print data=temp;

run;

FriedEgg · Posted 03-28-2012 08:24 PM

A little pointer control and you can do it pretty simply:

data foo;
 infile cards missover;
 input @'PERS:PERS1|' pers1    $  8.
       @'ADD:P1A1|'   add1_1   $ 20.
       @'ADD:P1A2|'   add1_2   $ 20.
       @1
       @'PERS:PERS2|' pers2    $  8.
       @'ADD:P2A1|'   add2_1   $ 20.
       @'ADD:P2A2|'   add2_2   $ 20. ;
 cards;
PERS:PERS1|MIGS    ADD:P1A1|5 HIGH ST           
PERS:PERS1|ROBBIE  PERS:PERS2|KYLIE   ADD:P1A1|ROSE COTTAGE        ADD:P1A2|ROSE COTTAGE        ADD:P2A1|SOMEWHERE           ADD:P2A2|SOMEWHERE           
;
run;