Good day,
I want to extract the percentage values from the first column. The second column (highlighted) shows how I would like the values to be output.
PLASMA CELLS= 0.13% OF LEUCOCYTES. | 0.13% |
MALIGNANT PLASMA CELLS = 0.049% | 0.049% |
0.47% | 0.47% |
OF WHICH =91% HAVE A NEOPLASTIC PHENOTYPE | 91% |
Go check out the documentation on Perl Regular Expressions, then try this:
/* Set up test data */
data got ;
infile cards ;
input string $60. ;
cards;
1234567890123456789012345678901234567890
PLASMA CELLS= 0.13% OF LEUCOCYTES.
MALIGNANT PLASMA CELLS = 0.049%
0.47%
OF WHICH =91% HAVE A NEOPLASTIC 14.2% PHENOTYPE
;
run ;
data want ;
  /* Create the regular expression.
     This looks for 1 or 2 digits, optionally followed by a decimal point,
     followed by 0-3 digits and a percent sign, e.g.
     10.001%, 1.1%, 91%
  */
  if _n_=1 then do ;
    retain regExpID ;
    regExpID=prxparse('/([0-9]{1,2}\.{0,1}[0-9]{0,3}%)/') ;
  end ;
  set got ;
  /* See if there's a match in the string */
  position=prxmatch(regExpID,string) ;
  put position= string= ;
  /* Loop while we have a match */
  do while (position);
    /* Extract the value */
    percent=prxposn(regExpID,1,string) ;
    output ;
    put percent= ;
    /* Drop everything up to the end of this match */
    next=length(percent)+position ;
    string=substr(string,next) ;
    /* See if there is another match */
    position=prxmatch(regExpID,string) ;
    put string= ;
  end ;
run ;
Looks like a great job for a regular expression. Is there always only one percentage value in each string?
And what do you expect as the result if there is more than one value to keep?
data got ;
infile cards ;
input string $60. ;
cards;
1234567890123456789012345678901234567890
PLASMA CELLS= 0.13% OF LEUCOCYTES.
MALIGNANT PLASMA CELLS = 0.049%
0.47%
OF WHICH =91% HAVE A NEOPLASTIC 14.2% PHENOTYPE
;
run ;
data want;
  set got;
  pid=prxparse('/[\d\.]+%/');
  s=1;
  e=length(string);
  call prxnext(pid,s,e,string,p,l);
  do while(p>0);
    want=substr(string,p,l);
    output;
    call prxnext(pid,s,e,string,p,l);
  end;
  drop pid s e p l;
run;
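If the values are needed as numbers rather than text, a minimal sketch along the same lines as the CALL PRXNEXT loop might look like the following. It assumes the GOT data set from the test data in this thread. The data set name want_num, the variable name value, the slightly tighter pattern, and the BEST32. informat are my own illustrative choices and not part of the replies above.

```sas
/* Sketch only: want_num, value and the pattern below are assumptions */
data want_num ;
  set got ;
  /* One or more digits, an optional decimal part, then a percent sign */
  pid=prxparse('/\d+(?:\.\d+)?%/') ;
  s=1 ;
  e=length(string) ;
  call prxnext(pid,s,e,string,p,l) ;
  do while (p>0) ;
    /* Read the matched text without its trailing % as a number */
    value=input(substr(string,p,l-1),best32.) ;
    output ;
    call prxnext(pid,s,e,string,p,l) ;
  end ;
  drop pid s e p l ;
run ;
```

As in the character version, this writes one row per percentage found, so a string with two percentages produces two rows. Divide value by 100 if you would rather store the proportion.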