Hello SAS experts,
How to extract numbers before a certain type of strings? I am working on a project to extract the numbers in front of the word UNITS.
My variable, unit, has all types of variations. Here is the example,
| Unit |
| 10 UNITS |
| 4 UNITS FOR ACCU=259,VERIFY BY ATHENA R |
| F.S 199 30 UNITS PER ORDERS |
| 0 UNITS/FSBS 144 |
| 0.25ML/125UNITS |
| 0.35ML/175 UNITS |
| 1.5 UNITS DOUBLE CHECKED |
| 10 U8NITS |
| 10UNITS-BS=306 |
10 U8NITS
10UNITS-BS=306
What I want to get (variable named num_wanted) is
| Unit | num_wanted |
| 10 UNITS | 10 |
| 4 UNITS FOR ACCU=259,VERIFY BY ATHENA R | 4 |
| F.S 199 30 UNITS PER ORDERS | 30 |
| 0 UNITS/FSBS 144 | 0 |
| 0.25ML/125UNITS | 125 |
| 0.35ML/175 UNITS | 175 |
| 1.5 UNITS DOUBLE CHECKED | 1.5 |
| 10 U8NITS | 10 |
| 10UNITS-BS=306 | 10 |
I have tried multiple ways but I couldn't figure how to extract the number before a certain type of string (UNITS have lots of variations such as U, UNIT, UN, UMIT, etc).
Would you help me out? THank you.
I would use:
data want;
if not prxId then prxId + prxParse("/(\d*(\.\d*)?)\s?U/");
set have;
if prxMatch(prxId, Unit) then
num = input(prxPosn(prxId, 1, Unit), best.);
format num best.;
drop prxId;
run;
Note :
The pattern (UNIT|UN|U) will match anything that starts with U, so you might as well match only U. However, listing all alternatives explicitly would be useful if you needed to extract the unit word in question with something like
unitWord = prxPosn(prxId, 2, Unit);
There are several ways to do this. I think this accommodates your needs.
data have;
input Unit:$100.;
infile datalines dlm='|';
datalines;
10 UNITS
4 UNITS FOR ACCU=259,VERIFY BY ATHENA R
F.S 199 30 UNITS PER ORDERS
0 UNITS/FSBS 144
0.25ML/125UNITS
0.35ML/175 UNITS
1.5 UNITS DOUBLE CHECKED
10 U8NITS
10UNITS-BS=306
10 U8NITS
10UNITS-BS=306
;
data want(keep=Unit num);
set have;
RegExID=prxparse('/(\d\.*\d*)\s*(UNITS|U8NITS|UNIT|UMIT|UN|U)/i');
position=prxmatch(RegExID, Unit);
num=input(prxposn(RegExID, 1, Unit), best8.);
run;
This code gives you the below. Let me know if it works on your real data 🙂
Thank you for your solution. It was wonderful. I never knew how to use the Perl Syntax before and thanks for showing me.
I would use:
data want;
if not prxId then prxId + prxParse("/(\d*(\.\d*)?)\s?U/");
set have;
if prxMatch(prxId, Unit) then
num = input(prxPosn(prxId, 1, Unit), best.);
format num best.;
drop prxId;
run;
Note :
The pattern (UNIT|UN|U) will match anything that starts with U, so you might as well match only U. However, listing all alternatives explicitly would be useful if you needed to extract the unit word in question with something like
unitWord = prxPosn(prxId, 2, Unit);
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.