BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
smilbuta
Fluorite | Level 6

Hello Community,

I am working on data cleanup project and i have come across a item i am not sure how to tackle.

I have a data set with 5 columns, I need to retain 4 of them as is, but one of them I need to extract the largest string from.

In most cases this string is a 16 digit record number. 

 

The field will look like this, (the value i want is 'X')

 

Column Cell [AAAAAA XXXXXXXXXXXXXXXX BBBBBB] 

Or [XXXXXXXXXXXXXXXX AAAAAA BBBBBB]

Or [AAAAAA BBBBBB XXXXXXXXXXXXXXXX]

The 16 digit number can be in any of the three positions... (dont ask, legacy bookeeping)

 

And some times the same field will be dirty data with some random value entered as a place holder.

 

What is the best way to extract that 16 digit X  field from my data column and place it in a new column?

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star
data want;
  string = 'AAAAAA XXXXXXXXXXXXXXXX BBBBBB';
  do i = 1 to countw(string);
    if length(scan(string, i)) = 16 then wanted_word = scan(string, i);
  end;
  put _all_;
run;

View solution in original post

8 REPLIES 8
SASKiwi
PROC Star
data want;
  string = 'AAAAAA XXXXXXXXXXXXXXXX BBBBBB';
  do i = 1 to countw(string);
    if length(scan(string, i)) = 16 then wanted_word = scan(string, i);
  end;
  put _all_;
run;
smilbuta
Fluorite | Level 6
I implemented this option and this worked perfectly!!!
Thanks @SASKiwi
ChrisNZ
Tourmaline | Level 20

Are  A and B numbers too or is X the only digit?  Are the brackets part of the string?

smilbuta
Fluorite | Level 6

A and B are alpha numeric, X is a numeric value.

ChrisNZ
Tourmaline | Level 20

Like this?

 

data WANT;
  input STRING :&$32.;
  DIGITS=prxchange('s/(?:\w{6} (\d{16}) \w{6}|\w{6} \w{6} (\d{16})|(\d{16}) \w{6} \w{6}|.*)/\1\2\3/',1,STRING);
  cards;
ssssss 6676543456545678 llllll
llllll ssssss 6676543456545678
6676543456545678 ssssss llllll
ssssss 643456545678 llllll    
run;
STRING DIGITS
ssssss 6676543456545678 llllll 6676543456545678
llllll ssssss 6676543456545678 6676543456545678
6676543456545678 ssssss llllll 6676543456545678
ssssss 643456545678 llllll  

 

 

Ksharp
Super User
data WANT;
  input STRING :&$32.;
  p=prxmatch('/\d{16}/',string);
  if p then DIGITS=substr(string,p,16);
  drop p ;
  cards;
ssssss 6676543456545678 llllll
llllll ssssss 6676543456545678
6676543456545678 ssssss llllll
ssssss 643456545678 llllll    
run;
andreas_lds
Jade | Level 19

@smilbuta : you wrote: "In most cases this string is a 16 digit record number." - can you explain what to extract in the other cases?

 

 

smilbuta
Fluorite | Level 6

Yes,

In the historical data, sometimes a record number was manually entered that does not conform to the norm. It will be a singular alphanumeric digit of variable length. The size of all of those is under 16 digits. They are few and far between and I would be able to reconcile those after the fact. 

 

By extracting the 16 digit value i am seeking and populating the new column, any blanks remaining in the new column would be non conforming data from the original source.

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1370 views
  • 3 likes
  • 5 in conversation