BookmarkSubscribeRSS Feed
SAS49
Obsidian | Level 7

Hi all,

 

Very new to SAS.  I have a data set with a variable/column called sample.  I have read in this dataset, but I would like to keep only specific rows that match a sample name pattern.  All of the ones that I would like to keep start with (A-H) and are then followed by a - and 3 numbers 

for example:

A-095

How do I keep only rows with samples in this format and drop other rows for example:

IQP17-UM-1997 

 

Thanks!

 

 

3 REPLIES 3
Astounding
PROC Star

I'm sure people who know parsing functions can give you a shorter expression to do the trick.  In the meantime, this should work:

 

data want;
   set have;
   if ("A" <=: sample <= "H") and substr(sample, 2, 1) = "-" and
   length (sample) = 5 and input(substr(sample, 3), ?? 3.) > .;
run;
Ksharp
Super User
data have;
input have $20.;
if prxmatch('/^[A-H]\-\d\d\d$/i',strip(have)) then matched=1;
cards;
A-095
IQP17-UM-1997 
;
FreelanceReinh
Jade | Level 19

Hi @SAS49 and welcome to the SAS Support Communities!

 

With PRXMATCH, one of the parsing functions mentioned by Astounding -- and already suggested by Ksharp, as I've just seen -- you can specify a Perl regular expression to describe the search pattern.

 

Example:

/^[A-H]-\d{3}$/

Explanation (see Tables of Perl Regular Expression (PRX) Metacharacters for the documentation):

  • ^ means "beginning of the string"
  • [A-H] means one of the characters in the range "A", "B", ..., "H"
  • - is the hyphen
  • \d means "digit," i.e. one of the characters in the range "0", "1", ..., "9"
  • \d{3} is a shorthand for \d\d\d, i.e., three digits in a row
  • $ means "end of the string"

This would be a suitable pattern for strings of length 5. However, your character variable must be longer to accommodate a string like "IQP17-UM-1997," which means that shorter values would be padded with trailing blanks. So we need to insert code for "zero or more blanks" (after \d{3}) into the regular expression or, simpler, trim the trailing blanks from the string (e.g., using the TRIM function) before starting the search, as shown below.

/* Create sample data for demonstration */

data have;
input sample $char13.;
cards;
a-095
A-095
A-0951
 A-044
B-40
H-123
IQP17-UM-1997
C-1.3
D-1E3
E-.99
F--28
G-+10
CF-123
;

/* Select observations matching the desired pattern */

data want;
set have;
if prxmatch('/^[A-H]-\d{3}$/', trim(sample));
run;

The PRXMATCH function returns the character position where the search pattern is found in the string -- necessarily 1 in your example -- and 0 if it is not found. Hence, the returned value is a suitable condition in a subsetting IF statement. The above code results in two observations in dataset WANT, as only "A-095" and "H-123" match the Perl regular expression.

 

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 852 views
  • 2 likes
  • 4 in conversation