Hello,
I would like to extract four consecutive digits (not counting 000) from Name in the dataset below. The Result column lists the output I am looking for.
data datain1;
infile datalines dsd;
input Name : $300. Result : $100. ;
datalines;
__5648_, 5648
0009463, 9463
000_4721, 4721
4721__, 4721
__0065_, 0065
9463__1, 9463
__5648__000, 5648
_5648_77, 5648
;
run;
Is there a way to approach this? Thank you.
Accepted Solutions
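data datain1;
infile datalines dsd;
input Name : $300. Result : $100. ;
length want $4;
/* First look for four consecutive non-zero digits, which skips any 000 padding */
p=prxmatch('/[1-9]{4}/',name);
if p then want=substr(name,p,4);
else do;
/* Otherwise fall back to any four consecutive digits, e.g. 0065 */
p=prxmatch('/\d{4}/',name);
if p then want=substr(name,p,4);
end;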
datalines;
__5648_, 5648
0009463, 9463
000_4721, 4721
4721__, 4721
__0065_, 0065
9463__1, 9463
__5648__000, 5648
_5648_77, 5648
;
You can use the COMPRESS function to get rid of special characters, such as underscores (and any other special characters you want to get rid of). You can also use TRANWRD to eliminate the 000 that you don't want, and what you are left with is the desired result.
result=tranwrd(compress(name,'_'),'000','');
Paige Miller
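A minimal sketch of how the COMPRESS/TRANWRD suggestion above could be tried against datain1. The dataset name try_compress and the variables stripped and cleaned are hypothetical names chosen for illustration, and the STRIP call is an added assumption to drop any blank left behind where 000 was replaced.

```sas
/* Sketch only: try_compress, stripped and cleaned are illustrative names */
data try_compress;
  set datain1;
  length stripped cleaned $300;
  /* Remove every underscore from Name */
  stripped = compress(name, '_');
  /* Replace each 000; TRANWRD may leave a blank in its place, */
  /* so STRIP removes leading and trailing blanks afterwards   */
  cleaned = strip(tranwrd(stripped, '000', ''));
run;

/* Compare the cleaned value with the expected Result column */
proc print data=try_compress;
  var name result cleaned;
run;
```

Names that hold a second group of digits, such as 9463__1 or _5648_77, keep those extra digits once the underscores are removed (94631 and 564877), so those rows would need an extra step with this approach.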
data datain1;
infile datalines dsd;
input Name : $300. Result : $100. ;
datalines;
__5648_, 5648
0009463, 9463
000_4721, 4721
4721__, 4721
__0065_, 0065
9463__1, 9463
__5648__000, 5648
_5648_77, 5648
;
run;
data want;
set datain1;
length want $4;
want=prxchange('s/(^000*_|_*)(\d{4})/$2/',-1, name);
run;
Your code is not working; the result is not what I am looking for.
Please use the code provided by @mkeintz; it is much more intuitive.
Hi @ybz12003, I made a slight change. For what it's worth, please see if this works:
want=prxchange('s/(^000_*|_*)(\d{4})/$2/',-1, name);
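To see what the slight change does, here is a small sketch that runs both patterns from this thread on one of the sample names. The variable names old and new are hypothetical, used only for this comparison.

```sas
/* Sketch only: old and new are illustrative variable names */
data _null_;
  length old new $4;
  name = '0009463';
  /* Earlier pattern: ^000*_ requires an underscore after the zeros,  */
  /* so on this name the second alternative (_*) matches nothing and  */
  /* \d{4} picks up the first four digits instead                     */
  old = prxchange('s/(^000*_|_*)(\d{4})/$2/', -1, name);
  /* Revised pattern: ^000_* makes the underscores optional, so the   */
  /* leading 000 is consumed before the four digits are captured      */
  new = prxchange('s/(^000_*|_*)(\d{4})/$2/', -1, name);
  put name= old= new=;
run;
```

With the earlier pattern the leading zeros stay in the string for this name, while the revised pattern strips them before the four digits are kept. In both cases the $4 length truncates whatever follows the first four characters of the result.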
I am always curious about these fancy Perl commands.
^000 -- Remove a triple 0?
*_|_* -- Remove underscores front and back?
\d{4} -- Four digits in a row?
But what about $2? And what is -1? What are those forward slashes '/' for?
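For reference, the pieces asked about here can be annotated on the PRXCHANGE call itself. This is only a sketch on a single sample name, run in a DATA _NULL_ step so it does not depend on any dataset.

```sas
data _null_;
  length want $4;
  name = '000_4721';
  /* s/pattern/replacement/ : s means substitute, and the forward     */
  /*                slashes separate the pattern from the replacement */
  /* (^000_*|_*)  : group 1, either 000 at the start of the string    */
  /*                followed by optional underscores, or underscores  */
  /* (\d{4})      : group 2, four digits in a row                     */
  /* $2           : replace the whole match with what group 2 captured*/
  /* -1           : second argument, repeat the substitution until    */
  /*                no further match is found                         */
  want = prxchange('s/(^000_*|_*)(\d{4})/$2/', -1, name);
  put name= want=;   /* writes name=000_4721 want=4721 to the log */
run;
```

Note that PRXCHANGE returns the whole string with the matched part replaced, so any text after the four digits is still there. It only disappears in the thread's code because want is declared with a length of 4.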
Hi @ybz12003, rather than getting answers in bits and pieces, I highly recommend relying on
I've just never been much of a fan of regex functions in SAS, so here's a non-regex suggestion:
data datain1;
infile datalines dsd;
input Name : $300. Result : $100. ;
datalines;
__5648_, 5648
0009463, 9463
000_4721, 4721
4721__, 4721
__0065_, 0065
9463__1, 9463
__5648__000, 5648
_5648_77, 5648
;
data want (drop=I);
set datain1;
length rslt $8;
/* scan backward in NAME until finding a substring of length 4 or more */
do i=1 to countw(name,'_') until (length(rslt)>=4);
rslt=scan(name,-i,'_');
end;
/* In case the substring is too long, take the rightmost 4 characters*/
rslt=substr(rslt,length(rslt)-3);
run;
That is so cool!