I have been given a file that was created dynamically and I am unsure how to read it in. the file looks like:
HEADER RECORD
PERS:PERS1|MIGS ADD:P1A1|5 HIGH ST
PERS:PERS1|ROBBIE PERS:PERS2|KYLIE ADD:P1A1|ROSE COTTAGE ADD:P1A2|ROSE COTTAGE ADD:P2A1|SOMEWHERE ADD:P2A2|SOMEWHERE
END OF FILE
Any help would be gretly appreciated.
Thanks
Given your sample records, what would you want as a result?
Ideally it would look like:
Obs PERS1 PERS2 ADD1_1 ADD1_2 ADD2_1 ADD2_2
1 MIGS 5 HIGH ST
2 ROBBIE KYLIE ROSE COTTAGE ROSE COTTAGE SOMEWHERE SOMEWHERE
How do you distinquish between the multiple values for ADD on the same line? They appear to be separated by spaces, but also have spaces embedded in the values.
HEADER RECORD
PERS:PERS1|MIGS ADD:P1A1|5 HIGH ST
PERS:PERS1|ROBBIE PERS:PERS2|KYLIE ADD:P1A1|ROSE COTTAGE ADD:P1A2|ROSE COTTAGE ADD:P2A1|SOMEWHERE ADD:P2A2|SOMEWHERE
END OF FILE
Could you assume that if there are two spaces in a row then that is the delimiter?
Do you have to look for the : or | and then search backwards to find the beginning of the label/variable name that preceeds the vertical bar?
When you say "dynamically" what does that mean? Is it generated by a program? Can you change the program so that the output is delimited more consistently?
I was given the following code that created the file and I have to write a code to read the file back in
data temp;
length PERS1 PERS2 $8. ADD1_1 ADD1_2 ADD2_1 ADD2_2 $20.;
PERS1="MIGS";PERS2="";ADD1_1="5 HIGH ST";ADD1_2="";ADD2_1="";ADD2_2="";output;
PERS1="ROBBIE";PERS2="KYLIE";ADD1_1="ROSE COTTAGE";ADD1_2="ROSE COTTAGE";ADD2_1="SOMEWHERE";ADD2_2="SOMEWHERE";output;
run;
data _null_;
file "C:\Users\Documents\test.txt";
set temp end=last;
if _n_=1 then put "HEADER RECORD";
if PERS1^="" then put"PERS:PERS1|" PERS1 $8. @;
if PERS2^="" then put"PERS:PERS2|" PERS2 $8. @;
if ADD1_1^="" then put "ADD:P1A1|" ADD1_1 $20. @;
if ADD1_2^="" then put "ADD:P1A2|" ADD1_2 $20. @;
if ADD2_1^="" then put "ADD:P2A1|" ADD2_1 $20. @;
if ADD2_2^="" then put "ADD:P2A2|" ADD2_2 $20. @;
put " " ;
if last then put "END OF FILE";
run;
proc print data=temp;
run;
So given that you know that person names take 8 and addresses take 20 characters it makes it easier. Try this. You might need to add code to rename the columns.
data _null_;
file tmpfile1;
input ;
put _infile_;
cards;
HEADER RECORD
PERS:PERS1|MIGS ADD:P1A1|5 HIGH ST
PERS:PERS1|ROBBIE PERS:PERS2|KYLIE ADD:P1A1|ROSE COTTAGE ADD:P1A2|ROSE COTTAGE ADD:P2A1|SOMEWHERE ADD:P2A2|SOMEWHERE
END OF FILE
run;
data temp ;
length type $4 name $8 value $20;
infile tmpfile1 truncover dlm=':|' ;
input @;
if _infile_ in : ( 'HEADER RECORD','END OF FILE') then return;
row+1;
do until (type=' ');
input type name @ ;
if type='PERS' then do; input value $8. @; output; end;
else if type='ADD' then do; input value $20. @; output; end;
end;
run;
proc transpose data=temp out=want (drop=_:);
by row;
id name;
var value;
run;
data _null_;
set want ;
put (_all_) (=);
run;
row=1 PERS1=MIGS P1A1=5 HIGH ST PERS2= P1A2= P2A1= P2A2=
row=2 PERS1=ROBBIE P1A1=ROSE COTTAGE PERS2=KYLIE P1A2=ROSE COTTAGE P2A1=SOMEWHERE P2A2=SOMEWHERE
NOTE: There were 2 observations read from the data set WORK.WANT.
That is really not simple.
filename x temp; data _null_; file x; infile cards length=len; input row $varying400. len; length list $ 4000; if left(row) not in : ( 'HEADER RECORD','END OF FILE') then do; row=prxchange('s/\|/=/',-1,row); row=prxchange('s/:[a-zA-Z]+|[a-zA-Z]+(?=\d+=)/_/',-1,row); put row; start=1;stop=length(row); pid=prxparse('/\w+(?==)/'); call prxnext(pid,start,stop,row,position,length); do while(position>0); x=substr(row,position,length); if not index(list,strip(x)) then do; list=catx(' ',list,x); call symputx('list',list); end; call prxnext(pid,start,stop,row,position,length); end; end; cards; HEADER RECORD PERS:PERS1|MIGS ADD:P1A1|5 HIGH ST PERS:PERS1|ROBBIE PERS:PERS2|KYLIE ADD:P1A1|ROSE COTTAGE ADD:P1A2|ROSE COTTAGE ADD:P2A1|SOMEWHERE ADD:P2A2|SOMEWHERE END OF FILE ; run; data want; infile x ; input (&list) (= $40.); run;
Ksharp
Message was edited by: xia keshan
Thanks for all the help I also found the following solution worked
data temp;
infile "" length=lg lrecl=1000 end=last;
input @;
if _n_=1 or last then delete;
start=1;
do while (start<lg-1);
input @start what $3. @;
if what="PER" then do;
input @start+5 type $5. @;
if type="PERS1" then input @start+11 PERS1 $8. @;
else if type="PERS2" then input @start+11 PERS2 $8. @;
start=start+19;
end;
else if what="ADD" then do;
input @start+4 type $4. @;
if type="P1A1" then input @start+9 ADD1_1 $20. @;
else if type="P1A2" then input @start+9 ADD1_2 $20. @;
else if type="P2A1" then input @start+9 ADD2_1 $20. @;
else if type="P2A2" then input @start+9 ADD2_2 $20. @;
start=start+29;
end;
end;
run;
proc print data=temp;
run;
A little pointer control and you can do it pretty simply:
data foo;
infile cards missover;
input @'PERS:PERS1|' pers1 $ 8.
@'ADD:P1A1|' add1_1 $ 20.
@'ADD:P1A2|' add1_2 $ 20.
@1
@'PERS:PERS2|' pers2 $ 8.
@'ADD:P2A1|' add2_1 $ 20.
@'ADD:P2A2|' add2_2 $ 20. ;
cards;
PERS:PERS1|MIGS ADD:P1A1|5 HIGH ST
PERS:PERS1|ROBBIE PERS:PERS2|KYLIE ADD:P1A1|ROSE COTTAGE ADD:P1A2|ROSE COTTAGE ADD:P2A1|SOMEWHERE ADD:P2A2|SOMEWHERE
;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.