I have a variable 'description' that has something like 'bob walked 5 km' or 'joe swam for 3.2 km yesterday night'
I'm trying to extract the values of '5' and '3.2' from this variable into a new variable, however, I'm stuck trying to figure out a way after i find the position of the "km" string. I haven't been able to find a way to read left of the indexed position. As a note, there is always a space before the "km"
I have this
Person | Description |
Bob | Bob walked 5 km at 10PM |
Joe | Joe swan for 3.2 km yesterday |
I ideally want to create a table that looks like this
Person | Description | Distance |
Bob | Bob walked 5 km at 10PM | 5 |
Joe | Joe swan for 3.2 km yesterday | 3.2 |
Use the ANYDIGIT function to find the first digit. Then use SCAN to find the next "word".
where = anydigit(description);
number = scan(substr(description,where),1);
Of course, this and just about any other manipulation requires an assumption or two, which you need to make sure is valid for the larger data set.
Use the ANYDIGIT function to find the first digit. Then use SCAN to find the next "word".
where = anydigit(description);
number = scan(substr(description,where),1);
Of course, this and just about any other manipulation requires an assumption or two, which you need to make sure is valid for the larger data set.
Thank you so much! I've never seen this function before - it looks super useful!
Sorry - would the ANYDIGIT also work if I have other digits in the cell too? Thank you again, I really appreciate your advice.
i.e.,
Person | Description |
Bob | Bob walked 5 km at 10PM |
Joe | Joe swan for 3.2 km yesterday |
ANYDIGIT finds the location of the first digit in the string.
If you are specifically looking for digits preceding "km" and want to guard against extracting digits that appear way before "km", here is one way to do it. Not very elegant but seems to do the job. If you want functions to search backwards, use negative integers for the position.
data want ; set have ; distance = input(scan(substr(description,1,index(description,"km")-1),-1," "),best.) ; run ;
This is the perfect solution thank you! What does the 'best' function do here?
data have;
input Person $ Description $30.;
cards;
Bob Bob walked 5 km at 10PM
Joe Joe swan for 3.2 km yesterday
;
data want;
set have;
distance =scan(Description,findw(Description,'km',' ','ie')-1,' ');
run;
data have;
input Person $ Description $30.;
cards;
Bob Bob walked 5 km at 10PM
Joe Joe swan for 3.2 km yesterday
;
data want;
set have;
distance=prxchange('s/(.*?)(\d+?\.?\d*?)(\skm)(.*)/$2/', -1, Description);
run;
data have;
input Person $ Description $30.;
cards;
Bob Bob walked 5 km at 10PM
Joe Joe swan for 3.2 km yesterday
;
data want;
set have;
pid=prxparse('/[\d\. ]+(?=km)/i');
if pid then do;
call prxsubstr(pid,Description,p,l);
want=substr(Description,p,l);
end;
drop pid p l;
run;
proc print;run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.