BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ACLAN
Obsidian | Level 7

Hi there, I'm trying to create a new column with a given pattern found in another column. 

 

My data looks like this: 

 

data have;
  infile datalines dlm="¬";
  length SERIAID $20;
  input SERIAID $;
datalines;
L1CU 
L1UL1C 
L2C 
L2CL3U 
L2CU 
L3UC 
L3CZ 
L4C
L4CL5U 
L6U
;
run;

I want to add a new column called "Level" where I add the existing pattern "L", "X" ( a number") and a "C". If this pattern does not exist (for example in the last row), that row should be removed from the dataset. 

 

data have2;
set have;
level = scan(SERIALID, 'C');
RUN;

 But I get a column with only blanks. 

 

Does anyone knows how to do this in a fast way? I'm new to SAS so I'm not sure which SAS functions is better for stringing.

 

Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Then do:

 

data want;
if not id then id + prxParse("/L\d+C/");
set have;
call prxSubstr(id, SERIAID, pos, len);
if pos then newColumn = substr(SERIAID, pos, len);
else delete;
drop id pos len;
run;

PGStats_0-1615832847131.png

 

PG

View solution in original post

5 REPLIES 5
ballardw
Super User

SCAN looks for a delimited "word" by specified count position:  Scan('this is a list of words', 4) returns the fourth word: "list".

So I am pretty sure your log would show some variation of ;

NOTE: Character values have been converted to numeric values at the places given by:
      (Line):(Column).
      29:13
NOTE: Invalid numeric data, 'c' , at line 29 column 13.
NOTE: Argument 2 to function SCAN('abc',.) at line 29 column 6 is invalid.

which tells pretty clearly that SCAN is not what you want.

 

 

It is not clear exactly what you are searching for since L and C occur multiple times in multiple positions in some of those values.

You should provide examples of what you expect the result for LEVEL should be for your example data.

 

ACLAN
Obsidian | Level 7

Thanks @ballardw I actually forgot to give an example for the output. I want the new column named "level" to show the pattern "LXC", where X is a number, present in SERIAID: 

L1C
L1C
L2C
L2C
L2C
NA (not existing)
L3C
L4C
L4C
NA (not existing)

 

The rows where the pattern "LXC" does not exist (NA (not existing)) should also be removed from the dataset.  

PGStats
Opal | Level 21

Pattern matching is best done with PERL regular expressions:

 

data want;
set have;
if prxMatch("/L\d+C/", SERIAID);
run;

PGStats_0-1615825674613.png

 

PG
ACLAN
Obsidian | Level 7
Thanks. This code is good to identify which rows have "/L\d+C/" but how can I sub string this pattern into a new column? My goal was to get a new column with L1C, L1C, L2C, L2C, L2C, L3C...
PGStats
Opal | Level 21

Then do:

 

data want;
if not id then id + prxParse("/L\d+C/");
set have;
call prxSubstr(id, SERIAID, pos, len);
if pos then newColumn = substr(SERIAID, pos, len);
else delete;
drop id pos len;
run;

PGStats_0-1615832847131.png

 

PG

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 836 views
  • 3 likes
  • 3 in conversation