BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
smilbuta
Fluorite | Level 6

Hello Community,

I am working on data cleanup project and i have come across a item i am not sure how to tackle.

I have a data set with 5 columns, I need to retain 4 of them as is, but one of them I need to extract the largest string from.

In most cases this string is a 16 digit record number. 

 

The field will look like this, (the value i want is 'X')

 

Column Cell [AAAAAA XXXXXXXXXXXXXXXX BBBBBB] 

Or [XXXXXXXXXXXXXXXX AAAAAA BBBBBB]

Or [AAAAAA BBBBBB XXXXXXXXXXXXXXXX]

The 16 digit number can be in any of the three positions... (dont ask, legacy bookeeping)

 

And some times the same field will be dirty data with some random value entered as a place holder.

 

What is the best way to extract that 16 digit X  field from my data column and place it in a new column?

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star
data want;
  string = 'AAAAAA XXXXXXXXXXXXXXXX BBBBBB';
  do i = 1 to countw(string);
    if length(scan(string, i)) = 16 then wanted_word = scan(string, i);
  end;
  put _all_;
run;

View solution in original post

8 REPLIES 8
SASKiwi
PROC Star
data want;
  string = 'AAAAAA XXXXXXXXXXXXXXXX BBBBBB';
  do i = 1 to countw(string);
    if length(scan(string, i)) = 16 then wanted_word = scan(string, i);
  end;
  put _all_;
run;
smilbuta
Fluorite | Level 6
I implemented this option and this worked perfectly!!!
Thanks @SASKiwi
ChrisNZ
Tourmaline | Level 20

Are  A and B numbers too or is X the only digit?  Are the brackets part of the string?

smilbuta
Fluorite | Level 6

A and B are alpha numeric, X is a numeric value.

ChrisNZ
Tourmaline | Level 20

Like this?

 

data WANT;
  input STRING :&$32.;
  DIGITS=prxchange('s/(?:\w{6} (\d{16}) \w{6}|\w{6} \w{6} (\d{16})|(\d{16}) \w{6} \w{6}|.*)/\1\2\3/',1,STRING);
  cards;
ssssss 6676543456545678 llllll
llllll ssssss 6676543456545678
6676543456545678 ssssss llllll
ssssss 643456545678 llllll    
run;
STRING DIGITS
ssssss 6676543456545678 llllll 6676543456545678
llllll ssssss 6676543456545678 6676543456545678
6676543456545678 ssssss llllll 6676543456545678
ssssss 643456545678 llllll  

 

 

Ksharp
Super User
data WANT;
  input STRING :&$32.;
  p=prxmatch('/\d{16}/',string);
  if p then DIGITS=substr(string,p,16);
  drop p ;
  cards;
ssssss 6676543456545678 llllll
llllll ssssss 6676543456545678
6676543456545678 ssssss llllll
ssssss 643456545678 llllll    
run;
andreas_lds
Jade | Level 19

@smilbuta : you wrote: "In most cases this string is a 16 digit record number." - can you explain what to extract in the other cases?

 

 

smilbuta
Fluorite | Level 6

Yes,

In the historical data, sometimes a record number was manually entered that does not conform to the norm. It will be a singular alphanumeric digit of variable length. The size of all of those is under 16 digits. They are few and far between and I would be able to reconcile those after the fact. 

 

By extracting the 16 digit value i am seeking and populating the new column, any blanks remaining in the new column would be non conforming data from the original source.

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1481 views
  • 3 likes
  • 5 in conversation