how to add text

Occasional Contributor
Posts: 16

Hi, any ideas on how to modify a data set of pure text?

I have a text file that needs to be modified; it was imported into a single column. The file has several sections, and each section ends with a line of the form "end of xxx section data". An example of the file looks like this:

a,b,c,d

end of ppp section data

1,1,a,a,c,c

2,2,w,w,f,g

end of qqq section data

q,w,1,3,4,a,f,1

w,p,1,4,5,j,n,2

end of www section data

I need to add some data whenever I see an "end of" line. For example, when there is "end of ppp section data", I want to add k,l,m,n. When there is "end of qqq section data", I want to add 9,9,q,s,d,v.

Any ideas?

Thank you.

Respected Advisor
Posts: 4,606

Re: how to add text

The most robust and flexible approach is probably PRX string matching:

data test;
length line $100;
input;
line = _infile_;
datalines;
a,b,c,d
end of ppp section data
1,1,a,a,c,c
2,2,w,w,f,g
end of qqq section data
q,w,1,3,4,a,f,1
w,p,1,4,5,j,n,2
end of www section data
;

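data want(drop=tmpLine what add PRXid);
retain PRXid;
length tmpLine $100 what $32 add $64;
     /* Compile the pattern once. Capture group 1 holds the section name. */
if _n_=1 then PRXid = prxparse("/^\s*end\s+of\s+(\w+)\s+section\s+data/i");
set test;
if prxmatch(PRXid, line) then do;
     what = prxposn(PRXid, 1, line);
     /* Choose the text to add for this section */
     select (upcase(what));
          when ("PPP") add = "k,l,m,n";
          when ("QQQ") add = "9,9,q,s,d,v";
          otherwise add = "UNKNOWN SECTION";
     end;
     /* Write the added line first, then restore the original line */
     tmpLine = line;
     line = add;
     output;
     line = tmpLine;
end;
     /* Every input line, including the "end of" line, is written here */
output;
run;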

PG

Contributor
Posts: 36

Re: how to add text

Does PRX refer to Perl Regular Expressions?

If so, more information can be found at: SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition

If not, PGStats, can you tell us what PRX means?

Respected Advisor
Posts: 4,606

Re: how to add text

Yes, PRX is the prefix for SAS functions implementing Perl Regular Expressions. As shown in my code, prxparse compiles a regular expression, prxmatch finds matches, and prxposn extracts the text captured by a parenthesized group in a match. - PG
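
To see the three functions in isolation, here is a minimal sketch that applies the same pattern to a single string. The sample value of line and the variable names re, pos and section are only illustrative; they are not part of the code posted earlier in the thread.

```sas
data _null_;
   length line $100 section $32;
   /* Illustrative sample line, modeled on the data in this thread */
   line = "end of qqq section data";
   /* prxparse compiles the pattern and returns a numeric pattern id */
   re = prxparse("/^\s*end\s+of\s+(\w+)\s+section\s+data/i");
   /* prxmatch returns the position where the match starts, or 0 if none */
   pos = prxmatch(re, line);
   if pos > 0 then do;
      /* prxposn returns the text of capture group 1 (the section name) */
      section = prxposn(re, 1, line);
      put pos= section=;
   end;
   /* Free the resources held by the compiled pattern */
   call prxfree(re);
run;
```

If the pattern matches, the log should show the match position and the captured section name. In a step that reads many rows, compiling once under if _n_=1 with a retained id, as in the earlier code, avoids recompiling the pattern on every iteration.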

Occasional Contributor
Posts: 16

Re: how to add text

This works for me. Thank you.
