SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ybz12003
Rhodochrosite | Level 12

Hello,

 

I would like to extract 4 continuous numbers (not including 000)  from the Name in the dataset.  I list the result I am looking for in the dataset too.  


data datain1;
      infile datalines dsd;
  input Name : $300.  Result : $100. ;
datalines;
	__5648_, 5648
	0009463, 9463
	000_4721, 4721 
	4721__, 4721
	__0065_, 0065
	9463__1, 9463
	__5648__000, 5648
	_5648_77, 5648
;

run;

Is there a way to appoach this?   Thank you.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data datain1;
      infile datalines dsd;
  input Name : $300.  Result : $100. ;
p=prxmatch('/[1-9]{4}/',name);
if p then want=substr(name,p,4);
else do;
p=prxmatch('/\d{4}/',name);
if p then want=substr(name,p,4);
end;
datalines;
	__5648_, 5648
	0009463, 9463
	000_4721, 4721 
	4721__, 4721
	__0065_, 0065
	9463__1, 9463
	__5648__000, 5648
	_5648_77, 5648
;



View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26

You can use the COMPRESS option to get rid of special characters, such as underscores (and any other special characters you want to get rid of. You can also use TRANWRD to eliminate the 000 that you don't want, and what you are left with is the desired result.

 

result=tranwrd(compress(name,'_'),'000','');
--
Paige Miller
novinosrin
Tourmaline | Level 20
data datain1;
      infile datalines dsd;
  input Name : $300.  Result : $100. ;
datalines;
	__5648_, 5648
	0009463, 9463
	000_4721, 4721 
	4721__, 4721
	__0065_, 0065
	9463__1, 9463
	__5648__000, 5648
	_5648_77, 5648
;

run;
data want;
 set datain1;
 length want $4;
 want=prxchange('s/(^000*_|_*)(\d{4})/$2/',-1, name);
run;
ybz12003
Rhodochrosite | Level 12

Your code is not working, the result is not I like.

novinosrin
Tourmaline | Level 20

Please take the code provided by @mkeintz  , much more intuitive 

novinosrin
Tourmaline | Level 20

Hi @ybz12003  And I made a slight change. For what it's worth, please see if this works

 

 want=prxchange('s/(^000_*|_*)(\d{4})/$2/',-1, name);
ybz12003
Rhodochrosite | Level 12

I am always curious this fancy Peri commands. 

 

^000 -- Remove triple 0 ?

*_|_* -- Remove underscores front and back ?

\d{4} -- Four numbers in a row?

 

 

But what is about $2?  and what is -1?  what are those frontslashs '/' for?

novinosrin
Tourmaline | Level 20

Hi @ybz12003  More than getting bits and pieces answers, I highly recommend to rely on

Image result for learning regular expressions by ben forta
Ben Forta's Learning Regular Expressions teaches you the regular expressions that you really need to know, starting with simple text matches and working up to more complex topics, including the use of backreferences, conditional evaluation, and look-ahead processing.
 
Trust me, within 5 days you will start contributing . Ben forta is almighty's blessing to RDBMS world of data manipulation/other programming languages.
mkeintz
PROC Star

I've just never been much of a fan of regex functions in SAS, so here's a non-regex suggestion:

 

data datain1;
      infile datalines dsd;
  input Name : $300.  Result : $100. ;
datalines;
	__5648_, 5648
	0009463, 9463
	000_4721, 4721 
	4721__, 4721
	__0065_, 0065
	9463__1, 9463
	__5648__000, 5648
	_5648_77, 5648
;

data want (drop=I);
  set datain1;
  length rslt $8;
  /* scan backward in NAME until finding a substring of length 4 or more */
  do i=1 to countw(name,'_') until (length(rslt)>=4);
    rslt=scan(name,-i,'_');
  end;
  /* In case the substring is too long, take the rightmost 4 characters*/
  rslt=substr(rslt,length(rslt)-3);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data datain1;
      infile datalines dsd;
  input Name : $300.  Result : $100. ;
p=prxmatch('/[1-9]{4}/',name);
if p then want=substr(name,p,4);
else do;
p=prxmatch('/\d{4}/',name);
if p then want=substr(name,p,4);
end;
datalines;
	__5648_, 5648
	0009463, 9463
	000_4721, 4721 
	4721__, 4721
	__0065_, 0065
	9463__1, 9463
	__5648__000, 5648
	_5648_77, 5648
;



ybz12003
Rhodochrosite | Level 12

That is so cool!

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1423 views
  • 2 likes
  • 5 in conversation