Hello,
I would like to extract four consecutive digits (ignoring any run of 000) from Name in the dataset below. The Result column lists the output I am looking for.
data datain1;
infile datalines dsd;
input Name : $300. Result : $100. ;
datalines;
__5648_, 5648
0009463, 9463
000_4721, 4721
4721__, 4721
__0065_, 0065
9463__1, 9463
__5648__000, 5648
_5648_77, 5648
;
run;
Is there a way to approach this? Thank you.
data datain1;
infile datalines dsd;
input Name : $300. Result : $100. ;
p=prxmatch('/[1-9]{4}/',name);
if p then want=substr(name,p,4);
else do;
p=prxmatch('/\d{4}/',name);
if p then want=substr(name,p,4);
end;
datalines;
__5648_, 5648
0009463, 9463
000_4721, 4721
4721__, 4721
__0065_, 0065
9463__1, 9463
__5648__000, 5648
_5648_77, 5648
;
You can use the COMPRESS function to get rid of special characters, such as underscores (and any other special characters you want to get rid of). You can also use TRANWRD to eliminate the 000 that you don't want, and what you are left with is the desired result.
result=tranwrd(compress(name,'_'),'000','');
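To see how that one-liner behaves on the sample data, it can be dropped into a DATA step and compared with the expected Result column. This is only a sketch: the dataset name CHECK and the variables ATTEMPT and MISMATCH are made up for illustration, and it assumes DATAIN1 has already been created as in the question. Names that carry extra digits are worth watching. For example, 9463__1 compresses to 94631, which is longer than the four digits wanted.

```sas
/* Sketch only: CHECK, ATTEMPT and MISMATCH are illustrative names, not from the thread */
data check;
  set datain1;
  length attempt $300;
  /* remove underscores, then replace every '000' substring */
  attempt = tranwrd(compress(name, '_'), '000', '');
  /* 1 when the attempt differs from the expected Result, 0 otherwise */
  mismatch = (strip(attempt) ne strip(result));
run;

proc print data=check;
  var name result attempt mismatch;
run;
```

Rows with MISMATCH=1 are the ones where removing underscores and 000 is not enough on its own.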
data datain1;
infile datalines dsd;
input Name : $300. Result : $100. ;
datalines;
__5648_, 5648
0009463, 9463
000_4721, 4721
4721__, 4721
__0065_, 0065
9463__1, 9463
__5648__000, 5648
_5648_77, 5648
;
run;
data want;
set datain1;
length want $4;
want=prxchange('s/(^000*_|_*)(\d{4})/$2/',-1, name);
run;
Your code does not work; the result is not what I want.
Please use the code provided by @mkeintz; it is much more intuitive.
Hi @ybz12003, I made a slight change. For what it's worth, please see if this works:
want=prxchange('s/(^000_*|_*)(\d{4})/$2/',-1, name);
I am always curious about these fancy Perl commands.
^000 -- Remove the triple 0?
*_|_* -- Remove underscores front and back?
\d{4} -- Four digits in a row?
But what about $2? What is -1? And what are the forward slashes '/' for?
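For anyone with the same questions, here is a small sketch of how the pieces fit together. The call has the form PRXCHANGE(pattern, times, source). The forward slashes delimit the parts of a substitution written as s/find/replace/. $1 and $2 in the replacement refer to the text captured by the first and second pair of parentheses in the pattern. The second argument is the number of times to apply the substitution, and -1 means keep replacing until no more matches are found. The dataset DEMO and the variables S, SWAPPED, ONCE and ALL are made up for illustration and are not part of the original code.

```sas
/* Sketch only: DEMO, S, SWAPPED, ONCE and ALL are illustrative names */
data demo;
  length s swapped once all $20;
  s = 'ab_cd_ef';

  /* $1 = 'ab' (first group), $2 = 'cd' (second group); the replacement swaps them */
  swapped = prxchange('s/^([a-z]{2})_([a-z]{2})/$2-$1/', 1, s);

  /* times = 1: only the first underscore is replaced */
  once = prxchange('s/_/-/', 1, s);

  /* times = -1: every underscore is replaced */
  all = prxchange('s/_/-/', -1, s);

  /* writes swapped=cd-ab_ef once=ab-cd_ef all=ab-cd-ef to the log */
  put swapped= once= all=;
run;
```

In the PRXCHANGE call above, $2 therefore stands for the four digits matched by (\d{4}), so each match is replaced by just those digits.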
Hi @ybz12003 More than getting bits and pieces answers, I highly recommend to rely on
I've just never been much of a fan of regex functions in SAS, so here's a non-regex suggestion:
data datain1;
infile datalines dsd;
input Name : $300. Result : $100. ;
datalines;
__5648_, 5648
0009463, 9463
000_4721, 4721
4721__, 4721
__0065_, 0065
9463__1, 9463
__5648__000, 5648
_5648_77, 5648
;
data want (drop=I);
set datain1;
length rslt $8;
/* scan backward in NAME until finding a substring of length 4 or more */
do i=1 to countw(name,'_') until (length(rslt)>=4);
rslt=scan(name,-i,'_');
end;
/* In case the substring is too long, take the rightmost 4 characters*/
rslt=substr(rslt,length(rslt)-3);
run;
That is so cool!