BookmarkSubscribeRSS Feed
robwork
Calcite | Level 5

I have been given a file that was created dynamically and I am unsure how  to read it in. the file looks like:

HEADER RECORD

PERS:PERS1|MIGS    ADD:P1A1|5 HIGH ST           

PERS:PERS1|ROBBIE  PERS:PERS2|KYLIE   ADD:P1A1|ROSE COTTAGE        ADD:P1A2|ROSE COTTAGE        ADD:P2A1|SOMEWHERE           ADD:P2A2|SOMEWHERE           

END OF FILE

Any help would be gretly appreciated.

Thanks

8 REPLIES 8
art297
Opal | Level 21

Given your sample records, what would you want as a result?

robwork
Calcite | Level 5

Ideally it would look like:

        Obs    PERS1     PERS2       ADD1_1                ADD1_2                  ADD2_1       ADD2_2 

         1     MIGS                          5 HIGH ST                                            
         2     ROBBIE    KYLIE    ROSE COTTAGE    ROSE COTTAGE    SOMEWHERE    SOMEWHERE

Tom
Super User Tom
Super User

How do you distinquish between the multiple values for ADD on the same line?  They appear to be separated by spaces, but also have spaces embedded in the values.

HEADER RECORD

PERS:PERS1|MIGS    ADD:P1A1|5 HIGH ST          

PERS:PERS1|ROBBIE  PERS:PERS2|KYLIE   ADD:P1A1|ROSE COTTAGE        ADD:P1A2|ROSE COTTAGE        ADD:P2A1|SOMEWHERE           ADD:P2A2|SOMEWHERE          

END OF FILE

Could you assume that if there are two spaces in a row then that is the delimiter?

Do you have to look for the : or | and then search backwards to find the beginning of the label/variable name that preceeds the vertical bar?

When you say "dynamically" what does that mean?  Is it generated by a program?  Can you change the program so that the output is delimited more consistently?

robwork
Calcite | Level 5

I was given the following code that created the file and I have to write a code to read the file back in

data temp;
length PERS1 PERS2 $8. ADD1_1 ADD1_2 ADD2_1 ADD2_2 $20.;
PERS1="MIGS";PERS2="";ADD1_1="5 HIGH ST";ADD1_2="";ADD2_1="";ADD2_2="";output;
PERS1="ROBBIE";PERS2="KYLIE";ADD1_1="ROSE COTTAGE";ADD1_2="ROSE COTTAGE";ADD2_1="SOMEWHERE";ADD2_2="SOMEWHERE";output;
run;

data _null_;
file "C:\Users\Documents\test.txt";
set temp end=last;
if _n_=1 then put "HEADER RECORD";

if PERS1^="" then put"PERS:PERS1|" PERS1 $8. @;
if PERS2^="" then put"PERS:PERS2|" PERS2 $8. @;
if ADD1_1^="" then put "ADD:P1A1|" ADD1_1 $20. @;
if ADD1_2^="" then put "ADD:P1A2|" ADD1_2 $20. @;
if ADD2_1^="" then put "ADD:P2A1|" ADD2_1 $20. @;
if ADD2_2^="" then put "ADD:P2A2|" ADD2_2 $20. @;
put " " ;
if last then put "END OF FILE";
run;


proc print data=temp;
run;

Tom
Super User Tom
Super User

So given that you know that person names take 8 and addresses take 20 characters it makes it easier. Try this.  You might need to add code to rename the columns.

data _null_;

  file tmpfile1;

  input ;

  put _infile_;

cards;

HEADER RECORD

PERS:PERS1|MIGS    ADD:P1A1|5 HIGH ST

PERS:PERS1|ROBBIE  PERS:PERS2|KYLIE   ADD:P1A1|ROSE COTTAGE        ADD:P1A2|ROSE COTTAGE        ADD:P2A1|SOMEWHERE           ADD:P2A2|SOMEWHERE

END OF FILE

run;

data temp ;

  length type $4 name $8 value $20;

  infile tmpfile1 truncover dlm=':|' ;

  input @;

  if _infile_ in : ( 'HEADER RECORD','END OF FILE') then return;

  row+1;

  do until (type=' ');

    input type name @ ;

    if type='PERS' then do; input value $8. @; output; end;

    else if type='ADD' then do; input value $20. @; output; end;

  end;

run;

proc transpose data=temp out=want (drop=_:);

  by row;

  id name;

  var value;

run;

data _null_;

  set want ;

  put (_all_) (=);

run;

row=1 PERS1=MIGS P1A1=5 HIGH ST PERS2=  P1A2=  P2A1=  P2A2=

row=2 PERS1=ROBBIE P1A1=ROSE COTTAGE PERS2=KYLIE P1A2=ROSE COTTAGE P2A1=SOMEWHERE P2A2=SOMEWHERE

NOTE: There were 2 observations read from the data set WORK.WANT.

Ksharp
Super User

That is really not simple.

filename x temp;
data _null_;
file x;
infile cards length=len;
input row $varying400. len;
length list $ 4000;
if left(row) not in : ( 'HEADER RECORD','END OF FILE') then do;
row=prxchange('s/\|/=/',-1,row);
row=prxchange('s/:[a-zA-Z]+|[a-zA-Z]+(?=\d+=)/_/',-1,row); 
put row; 
start=1;stop=length(row);
pid=prxparse('/\w+(?==)/');
call prxnext(pid,start,stop,row,position,length);
do while(position>0); 
 x=substr(row,position,length);
 if not index(list,strip(x)) then do;
                             list=catx(' ',list,x);
                             call symputx('list',list);
                           end;
 call prxnext(pid,start,stop,row,position,length);
end;
end;
cards; 
HEADER RECORD
PERS:PERS1|MIGS    ADD:P1A1|5 HIGH ST           
PERS:PERS1|ROBBIE  PERS:PERS2|KYLIE   ADD:P1A1|ROSE COTTAGE        ADD:P1A2|ROSE COTTAGE        ADD:P2A1|SOMEWHERE           ADD:P2A2|SOMEWHERE           
END OF FILE
;
run;
data want;
 infile x ;
 input (&list) (= $40.);
run;



Ksharp

Message was edited by: xia keshan

robwork
Calcite | Level 5

Thanks for all the help I also found the following solution worked

data temp;

infile "" length=lg lrecl=1000 end=last;

input @;

 

          if _n_=1 or last then delete;

          start=1;

                    do while (start<lg-1);

                                input @start what $3. @;

                              

                               if what="PER" then do;

                               input @start+5 type $5. @;

                        

                         if type="PERS1" then input @start+11 PERS1 $8. @;

                        

                         else if type="PERS2" then input @start+11 PERS2 $8. @;

             

              start=start+19;

           end;

          

           else if what="ADD" then do;

              input @start+4 type $4. @;

                 

                  if type="P1A1" then input @start+9 ADD1_1 $20. @;

                 

                  else if type="P1A2" then input @start+9 ADD1_2 $20. @;

                 

                  else if type="P2A1" then input @start+9 ADD2_1 $20. @;

                 

                  else if type="P2A2" then input @start+9 ADD2_2 $20. @;

             

              start=start+29;

          

           end;

          end;

run;

proc print data=temp;

run;

FriedEgg
SAS Employee

A little pointer control and you can do it pretty simply:

data foo;

infile cards missover;

input @'PERS:PERS1|' pers1    $  8.

       @'ADD:P1A1|'   add1_1   $ 20.

       @'ADD:P1A2|'   add1_2   $ 20.

       @1

       @'PERS:PERS2|' pers2    $  8.

       @'ADD:P2A1|'   add2_1   $ 20.

       @'ADD:P2A2|'   add2_2   $ 20. ;

cards;

PERS:PERS1|MIGS    ADD:P1A1|5 HIGH ST          

PERS:PERS1|ROBBIE  PERS:PERS2|KYLIE   ADD:P1A1|ROSE COTTAGE        ADD:P1A2|ROSE COTTAGE        ADD:P2A1|SOMEWHERE           ADD:P2A2|SOMEWHERE          

;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 923 views
  • 7 likes
  • 5 in conversation