Hi SAS coders!
I am trying to extract the county from a string variable that is meshed in with the estimate and CI.
So the variable (called Regional) looks like this:
LARIMER County/Regional Estimate 14% (95% C.I.: 11.9 - 16.2)
LAS ANIMASCounty/Regional Estimate 16.7% (95% C.I.: 9.6 - 23.8)
I am trying to extract the LARIMER and LAS ANIMAS into their own variable and call it "COUNTY"
I've used SCAN, SUBSTR, COMPRESS, all the extracting functions and they always come out not ideal. My end result should be
LARIMER
LAS ANIMAS
The code I was using is:
The end result of this code is
LARIMER
LASANIMAS
I am trying to produce the output where LASANIMAS has a space so it should look like LAS ANIMAS.
Also some of the county names that I am working with start with C like "Cheyenne" and because I put the delimiter as "C" the county name cuts off so it produces an output of "heyenne"
I don't know what I am doing wrong.
Find the location of the word COUNTY. On your first row, that word begins at position 9. So then take the text in positions 1 through 8 (and optionally trim the blank off the end)
position=find(regional,'county','i');
county_name=substr(regional,1,position-1);
Don't forget to set a length for your county name long enough to hold the longest expected value.
It appears that REGIONAL has the county name, followed by the string "County", followed by a forward slash, and then other information irrelevant to extracting the county name. And the string "County" may or may not be preceded by a blank.
Then
(1) extract the text prior to the "/": = scan(regional,1,'/')
(2) then eliminate the last 6 non-blank characters: =substr(of above,1,length(of above)-6).
data want;
set have;
length county $25;
county=substr(scan(regional,1,'/')
,1
,length(scan(regional,1,'/'))-6
);
run;
Now if the word "County" was always spelled in lowercase except the first letter, you could avoid using the substr function in (2). Instead, you could apply a somewhat simpler expression:
data want;
set have;
length county $25;
county=tranwrd(scan(regional,1,'/'),'County','');
run;
This, of course, assumes that the string 'County' does not occur twice (once before the slash, and once as part of the county's proper name) in the REGION variable.
If you wanted the keep the spaces why did you remove them? Remove the COMPRESS() function call.
Are you sure none of the county names have an uppercase letter C? If they do your SCAN() function call is also not what you want.
And below yet another option using a regular expression.
data have;
infile datalines truncover;
input have_str $100.;
datalines;
LARIMER County/Regional Estimate 14% (95% C.I.: 11.9 - 16.2)
LAS ANIMASCounty/Regional Estimate 16.7% (95% C.I.: 9.6 - 23.8)
xxx xxx/Regional Estimate 16.7% (95% C.I.: 9.6 - 23.8)
;
data want(drop=_:);
set have;
length county $40.;
_prxid = prxparse('/^.*(?=county\/)/oi');
call prxsubstr(_prxid, have_str, _pos, _len);
if _len>0 then county = substr(have_str, _pos, _len);
run;
proc print data=want;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.