Hi all,
Very new to SAS. I have a data set with a variable/column called sample. I have read in this dataset, but I would like to keep only specific rows that match a sample name pattern. All of the ones that I would like to keep start with (A-H) and are then followed by a - and 3 numbers
for example:
A-095
How do I keep only rows with samples in this format and drop other rows for example:
IQP17-UM-1997
Thanks!
I'm sure people who know parsing functions can give you a shorter expression to do the trick. In the meantime, this should work:
data want;
set have;
if ("A" <=: sample <= "H") and substr(sample, 2, 1) = "-" and
length (sample) = 5 and input(substr(sample, 3), ?? 3.) > .;
run;
data have;
input have $20.;
if prxmatch('/^[A-H]\-\d\d\d$/i',strip(have)) then matched=1;
cards;
A-095
IQP17-UM-1997
;
Hi @SAS49 and welcome to the SAS Support Communities!
With PRXMATCH, one of the parsing functions mentioned by Astounding -- and already suggested by Ksharp, as I've just seen -- you can specify a Perl regular expression to describe the search pattern.
Example:
/^[A-H]-\d{3}$/
Explanation (see Tables of Perl Regular Expression (PRX) Metacharacters for the documentation):
This would be a suitable pattern for strings of length 5. However, your character variable must be longer to accommodate a string like "IQP17-UM-1997," which means that shorter values would be padded with trailing blanks. So we need to insert code for "zero or more blanks" (after \d{3}) into the regular expression or, simpler, trim the trailing blanks from the string (e.g., using the TRIM function) before starting the search, as shown below.
/* Create sample data for demonstration */
data have;
input sample $char13.;
cards;
a-095
A-095
A-0951
A-044
B-40
H-123
IQP17-UM-1997
C-1.3
D-1E3
E-.99
F--28
G-+10
CF-123
;
/* Select observations matching the desired pattern */
data want;
set have;
if prxmatch('/^[A-H]-\d{3}$/', trim(sample));
run;
The PRXMATCH function returns the character position where the search pattern is found in the string -- necessarily 1 in your example -- and 0 if it is not found. Hence, the returned value is a suitable condition in a subsetting IF statement. The above code results in two observations in dataset WANT, as only "A-095" and "H-123" match the Perl regular expression.
Nearly 200 sessions are now available on demand in the Innovate Hub.
Watch Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.