- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a variable 'description' that has something like 'bob walked 5 km' or 'joe swam for 3.2 km yesterday night'
I'm trying to extract the values of '5' and '3.2' from this variable into a new variable, however, I'm stuck trying to figure out a way after i find the position of the "km" string. I haven't been able to find a way to read left of the indexed position. As a note, there is always a space before the "km"
I have this
Person | Description |
Bob | Bob walked 5 km at 10PM |
Joe | Joe swan for 3.2 km yesterday |
I ideally want to create a table that looks like this
Person | Description | Distance |
Bob | Bob walked 5 km at 10PM | 5 |
Joe | Joe swan for 3.2 km yesterday | 3.2 |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Use the ANYDIGIT function to find the first digit. Then use SCAN to find the next "word".
where = anydigit(description);
number = scan(substr(description,where),1);
Of course, this and just about any other manipulation requires an assumption or two, which you need to make sure is valid for the larger data set.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Use the ANYDIGIT function to find the first digit. Then use SCAN to find the next "word".
where = anydigit(description);
number = scan(substr(description,where),1);
Of course, this and just about any other manipulation requires an assumption or two, which you need to make sure is valid for the larger data set.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much! I've never seen this function before - it looks super useful!
Sorry - would the ANYDIGIT also work if I have other digits in the cell too? Thank you again, I really appreciate your advice.
i.e.,
Person | Description |
Bob | Bob walked 5 km at 10PM |
Joe | Joe swan for 3.2 km yesterday |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
ANYDIGIT finds the location of the first digit in the string.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you are specifically looking for digits preceding "km" and want to guard against extracting digits that appear way before "km", here is one way to do it. Not very elegant but seems to do the job. If you want functions to search backwards, use negative integers for the position.
data want ; set have ; distance = input(scan(substr(description,1,index(description,"km")-1),-1," "),best.) ; run ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is the perfect solution thank you! What does the 'best' function do here?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Person $ Description $30.;
cards;
Bob Bob walked 5 km at 10PM
Joe Joe swan for 3.2 km yesterday
;
data want;
set have;
distance =scan(Description,findw(Description,'km',' ','ie')-1,' ');
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Person $ Description $30.;
cards;
Bob Bob walked 5 km at 10PM
Joe Joe swan for 3.2 km yesterday
;
data want;
set have;
distance=prxchange('s/(.*?)(\d+?\.?\d*?)(\skm)(.*)/$2/', -1, Description);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Person $ Description $30.;
cards;
Bob Bob walked 5 km at 10PM
Joe Joe swan for 3.2 km yesterday
;
data want;
set have;
pid=prxparse('/[\d\. ]+(?=km)/i');
if pid then do;
call prxsubstr(pid,Description,p,l);
want=substr(Description,p,l);
end;
drop pid p l;
run;
proc print;run;