Good day,
I want to extract the percentage values from the first column. The second column shows how I would like the values to be output.
PLASMA CELLS= 0.13% OF LEUCOCYTES. | 0.13%
MALIGNANT PLASMA CELLS = 0.049% | 0.049%
0.47% | 0.47%
OF WHICH =91% HAVE A NEOPLASTIC PHENOTYPE | 91%
Go check out the documentation on Perl Regular Expressions, then try this:
/* Set up test data */
data got ;
infile cards ;
input string $60. ;
cards;
1234567890123456789012345678901234567890
PLASMA CELLS= 0.13% OF LEUCOCYTES.
MALIGNANT PLASMA CELLS = 0.049%
0.47%
OF WHICH =91% HAVE A NEOPLASTIC 14.2% PHENOTYPE
;
run ;
data want ;
  /* Create the regular expression.
     This looks for 1 or 2 digits, an optional decimal point,
     then 0-3 more digits and a percent sign, e.g.
     10.001%, 1.1%, 91%
  */
  if _n_=1 then do ;
    retain regExpID ;
    regExpID=prxparse('/([0-9]{1,2}\.{0,1}[0-9]{0,3}%)/') ;
  end ;
  set got ;
  /* see if there's a match in the string */
  position=prxmatch(regExpID,string) ;
  put position= string= ;
  /* We have a match */
  do while (position) ;
    /* extract the value */
    percent=prxposn(regExpID,1,string) ;
    output ;
    put percent= ;
    /* drop everything up to and including the match, then search the remainder */
    next=length(percent)+position ;
    string=substr(string,next) ;
    /* see if there is another match */
    position=prxmatch(regExpID,string) ;
    put string= ;
  end ;
run ;
Looks like a great job for a regular expression. Is there always only one percentage value in each string?
And what do you expect as result, if there is more than one value to keep?
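If the answer turns out to be "keep every value, but on one row per string", one possible approach is sketched below. It is only an illustration, not something the original poster asked for: it assumes the GOT test data set used elsewhere in this thread, and the names WANT_WIDE, ALL_PCT, START_POS, STOP_POS, MATCH_POS and MATCH_LEN are made up for the example. The pattern is also a slightly stricter variant (digits, then an optional decimal part, then a percent sign).

```sas
/* Sketch only: one output row per input row, all percentages joined by commas. */
data want_wide ;
  set got ;
  /* ALL_PCT is a hypothetical variable name; it is reset to blank on every row */
  length all_pct $200 ;
  /* the pattern is a constant, so SAS compiles it only once */
  pid = prxparse('/\d+(\.\d+)?%/') ;
  start_pos = 1 ;
  stop_pos  = length(string) ;
  /* find the first match, if any */
  call prxnext(pid, start_pos, stop_pos, string, match_pos, match_len) ;
  do while (match_pos > 0) ;
    /* CATX skips blank arguments, so the first value gets no leading comma */
    all_pct = catx(',', all_pct, substr(string, match_pos, match_len)) ;
    call prxnext(pid, start_pos, stop_pos, string, match_pos, match_len) ;
  end ;
  drop pid start_pos stop_pos match_pos match_len ;
run ;
```

With the test data above, the last row contains two percentages, so ALL_PCT for that row should hold both values separated by a comma, while rows without any percentage keep a blank ALL_PCT.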
data got ;
infile cards ;
input string $60. ;
cards;
1234567890123456789012345678901234567890
PLASMA CELLS= 0.13% OF LEUCOCYTES.
MALIGNANT PLASMA CELLS = 0.049%
0.47%
OF WHICH =91% HAVE A NEOPLASTIC 14.2% PHENOTYPE
;
run ;
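data want;
  set got;
  /* match one or more digits or dots followed by a percent sign */
  pid=prxparse('/[\d\.]+%/');
  s=1;
  e=length(string);
  /* find the first match; p = position, l = length (p is 0 when there is none) */
  call prxnext(pid,s,e,string,p,l);
  do while(p>0);
    /* write one row per match */
    want=substr(string,p,l);
    output;
    call prxnext(pid,s,e,string,p,l);
  end;
  drop pid s e p l;
run;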