BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BochengJing
Fluorite | Level 6

Hello SAS experts, 

 

How to extract numbers before a certain type of strings? I am working on a project to extract the numbers in front of the word UNITS. 

My variable, unit, has all types of variations. Here is the example, 

Unit                                                                                                             
10 UNITS
4 UNITS FOR ACCU=259,VERIFY BY ATHENA R
F.S 199 30 UNITS PER ORDERS
0 UNITS/FSBS 144
0.25ML/125UNITS
0.35ML/175 UNITS
1.5 UNITS DOUBLE CHECKED
10 U8NITS
10UNITS-BS=306

10 U8NITS

10UNITS-BS=306

 

What I want to get (variable named num_wanted) is 

Unit                                                                                                             num_wanted
10 UNITS10
4 UNITS FOR ACCU=259,VERIFY BY ATHENA R4
F.S 199 30 UNITS PER ORDERS30
0 UNITS/FSBS 1440
0.25ML/125UNITS125
0.35ML/175 UNITS175
1.5 UNITS DOUBLE CHECKED1.5
10 U8NITS10
10UNITS-BS=30610

 

I have tried multiple ways but I couldn't figure how to extract the number before a certain type of string (UNITS have lots of variations such as U, UNIT, UN, UMIT, etc). 

Would you help me out? THank you. 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I would use:

 

data want;
if not prxId then prxId + prxParse("/(\d*(\.\d*)?)\s?U/");
set have;
if prxMatch(prxId, Unit) then
    num = input(prxPosn(prxId, 1, Unit), best.);
format num best.;
drop prxId;
run;

Note :

The pattern (UNIT|UN|U) will match anything that starts with U, so you might as well match only U. However, listing all alternatives explicitly would be useful if you needed to extract the unit word in question with something like

 

unitWord = prxPosn(prxId, 2, Unit);

PG

View solution in original post

3 REPLIES 3
PeterClemmensen
Tourmaline | Level 20

There are several ways to do this. I think this accommodates your needs. 

 

data have;
input Unit:$100.;
infile datalines dlm='|';
datalines; 
10 UNITS
4 UNITS FOR ACCU=259,VERIFY BY ATHENA R
F.S 199 30 UNITS PER ORDERS
0 UNITS/FSBS 144
0.25ML/125UNITS
0.35ML/175 UNITS
1.5 UNITS DOUBLE CHECKED
10 U8NITS
10UNITS-BS=306
10 U8NITS
10UNITS-BS=306
;

data want(keep=Unit num);
   set have;
   RegExID=prxparse('/(\d\.*\d*)\s*(UNITS|U8NITS|UNIT|UMIT|UN|U)/i');
   position=prxmatch(RegExID, Unit);
   num=input(prxposn(RegExID, 1, Unit), best8.);
run;

This code gives you the below. Let me know if it works on your real data 🙂

 

 

Capture.PNG

BochengJing
Fluorite | Level 6

Thank you for your solution. It was wonderful. I never knew how to use the Perl Syntax before and thanks for showing me. 

PGStats
Opal | Level 21

I would use:

 

data want;
if not prxId then prxId + prxParse("/(\d*(\.\d*)?)\s?U/");
set have;
if prxMatch(prxId, Unit) then
    num = input(prxPosn(prxId, 1, Unit), best.);
format num best.;
drop prxId;
run;

Note :

The pattern (UNIT|UN|U) will match anything that starts with U, so you might as well match only U. However, listing all alternatives explicitly would be useful if you needed to extract the unit word in question with something like

 

unitWord = prxPosn(prxId, 2, Unit);

PG

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1437 views
  • 2 likes
  • 3 in conversation