Good day
I want to extract the percentage values from the strings in the first column. The second column shows how I would like the output to look.
PLASMA CELLS= 0.13% OF LEUCOCYTES. | 0.13%
MALIGNANT PLASMA CELLS = 0.049% | 0.049%
0.47% | 0.47%
OF WHICH =91% HAVE A NEOPLASTIC PHENOTYPE | 91%
Go check out the documentation on Perl Regular Expressions, then try this:
/* Set up test data */
data got ;
infile cards ;
input string $60. ;
cards;
1234567890123456789012345678901234567890
PLASMA CELLS= 0.13% OF LEUCOCYTES.
MALIGNANT PLASMA CELLS = 0.049%
0.47%
OF WHICH =91% HAVE A NEOPLASTIC 14.2% PHENOTYPE
;
run ;
data want ;
/* Create the regular expression
This looks for 1 or 2 digits, optionally followed by a decimal point,
followed by 0-3 digits and a percent sign, e.g.
10.001%, 1.1%, 91%
*/
if _n_=1 then do ;
retain regExpID ;
regExpID=prxparse('/([0-9]{1,2}\.{0,1}[0-9]{0,3}%)/') ;
end ;
set got ;
/* see if there's a match in the string*/
position=prxmatch(regExpID,string) ;
put position= string= ;
/* We have a match */
do while (position);
/* extract the value */
percent=prxposn(regExpID,1,string) ;
output ;
put percent= ;
next=length(percent)+position ;
string=substr(string,next) ;
/* see if there is another match */
position=prxmatch(regExpID,string) ;
put string= ;
end ;
run ;
Looks like a great job for a regular expression. Is there always only one percentage value in each string?
And what result do you expect if there is more than one value to keep?
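If each string holds at most one percentage, as in the sample rows of the question, a single PRXMATCH call may be enough. The following is only a minimal sketch: it assumes the `got` dataset with the character variable `string` from the test-data step in this thread, and `want_single`, `pid`, and the pattern itself are illustrative choices rather than anything the original poster specified.

```sas
/* Sketch: keep only the first percentage found in each string. */
/* Assumes dataset GOT with character variable STRING.          */
data want_single ;
  set got ;
  length percent $12 ;   /* not retained, so it starts blank on every row */
  retain pid ;
  /* Compile the pattern once: digits, optional point, optional digits, % */
  if _n_ = 1 then pid = prxparse('/(\d+\.?\d*%)/') ;
  /* PRXMATCH returns 0 when nothing matches, so PERCENT stays blank */
  if prxmatch(pid, string) then percent = prxposn(pid, 1, string) ;
  drop pid ;
run ;
```

For a string with several percentages this keeps only the leftmost one, so it is not a substitute for a loop when multiple values must be kept.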
data got ;
infile cards ;
input string $60. ;
cards;
1234567890123456789012345678901234567890
PLASMA CELLS= 0.13% OF LEUCOCYTES.
MALIGNANT PLASMA CELLS = 0.049%
0.47%
OF WHICH =91% HAVE A NEOPLASTIC 14.2% PHENOTYPE
;
run ;
data want;
set got;
pid=prxparse('/[\d\.]+%/');
s=1;
e=length(string);
call prxnext(pid,s,e,string,p,l);
do while(p>0);
want=substr(string,p,l);
output;
call prxnext(pid,s,e,string,p,l);
end;
drop pid s e p l;
run;
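If one output row per input string is preferred over one row per match, the same PRXNEXT loop could collect the matches into a single variable. This is a hedged sketch, not something requested in the thread: `want_joined` and `all_percents` are hypothetical names, the blank delimiter is an arbitrary choice, and the `got` dataset is assumed to exist as created above.

```sas
/* Sketch: one row per input string, all matches joined by a blank. */
/* Assumes dataset GOT with character variable STRING.              */
data want_joined ;
  set got ;
  length all_percents $60 ;        /* starts blank on every row */
  pid = prxparse('/[\d\.]+%/') ;   /* same pattern as the PRXNEXT code above */
  s = 1 ;
  e = length(string) ;
  call prxnext(pid, s, e, string, p, l) ;
  do while (p > 0) ;
    /* CATX skips blank items, so the first match gets no leading delimiter */
    all_percents = catx(' ', all_percents, substr(string, p, l)) ;
    call prxnext(pid, s, e, string, p, l) ;
  end ;
  drop pid s e p l ;
run ;
```

With the test data, the row containing two percentages should end up with `91% 14.2%` in `all_percents`, and rows without a match keep a blank value instead of being dropped.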