Using SAS 9.3, I am trying to parse a variable, Vehicle, of auto names written (mostly) in camel case. For example,
JeepGrand Cherokee
JeepRenegade
JeepWrangler
...other records not in camel case
Jeep Renegade
I'd like to define a new variable, Model, containing the model names of the vehicles. For the records separating make from model with a space, I have a solution using the findw function:
if findw(Vehicle, "Renegade") then Model = "Renegade";
This does not work for the camel case records. One solution I have worked out is to use a combination of the scan and tranwrd functions.
if findw(strip(tranwrd(scan(Vehicle, 1, " "), "Jeep", " ")), "Wrangler") then Model = "Wrangler";
This works because I have a small number of makes in the data set. But how could I generalize this parsing to a larger list of automakers? In general, can we specify "case" as a delimiter in any character functions?
I appreciate any feedback anyone can provide.
Dhrumil Patel
There's probably a PRX (Perl regular expression) function that would do this more efficiently, but you can use the ANYUPPER() and SUBSTR() functions.
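data want;
    input details $32.;
    /* Position of the first uppercase letter at or after character 2 */
    x = anyupper(details, 2);
    /* Everything before that position is the make; the rest is the model */
    make = substr(details, 1, x-1);
    model = substr(details, x);
    cards;
JeepGrand Cherokee
JeepRenegade
JeepWrangler
;
run;

proc print data=want;
run;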
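For comparison, here is a rough sketch of what a PRX version might look like. It is only one possible approach, not necessarily the one hinted at above. It assumes the make ends in a lowercase letter that is immediately followed by the uppercase first letter of the model, so makes such as "BMW" or "McLaren" would need extra handling. The variable names vehicle and spaced are made up for the illustration.

```sas
data want_prx;
    length vehicle $32 spaced $40 make model $32;
    input vehicle $32.;
    /* Insert a blank at the first lowercase-to-uppercase boundary, if any */
    spaced = prxchange('s/([a-z])([A-Z])/$1 $2/', 1, trim(vehicle));
    /* The first blank-delimited word is taken as the make (an assumption) */
    make = scan(spaced, 1, ' ');
    /* Whatever follows the make is treated as the model */
    model = left(substr(spaced, length(make) + 1));
    datalines;
JeepGrand Cherokee
JeepRenegade
Jeep Renegade
;
run;

proc print data=want_prx;
run;
```

Records that already separate make and model with a space pass through the PRXCHANGE call unchanged, so both styles end up being split the same way.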
Using PRX functions looks inviting but this can also be done simply with:
model =substr(vehicle, findc(vehicle, , 2, 'SU'));
@JerryLeBreton wrote:
Using PRX functions looks inviting but this can also be done simply with:
model =substr(vehicle, findc(vehicle, , 2, 'SU'));
Good idea. One could apply the LEFT function to the result in order to avoid discrepancies like 'Renegade' vs. ' Renegade'.
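Putting the two suggestions together, a minimal sketch might look like the step below. The data set name want_findc, the variable pos, and the guard for pos = 0 are additions for illustration and were not part of the original one-liner.

```sas
data want_findc;
    length vehicle make model $32;
    input vehicle $32.;
    /* First blank or uppercase letter at or after position 2 */
    pos = findc(vehicle, , 2, 'SU');
    if pos > 0 then do;
        make  = substr(vehicle, 1, pos - 1);
        /* LEFT removes the leading blank left behind by 'Jeep Renegade' */
        model = left(substr(vehicle, pos));
    end;
    else make = vehicle;  /* nothing found: keep the whole value as the make */
    datalines;
JeepGrand Cherokee
JeepRenegade
Jeep Renegade
;
run;

proc print data=want_findc;
run;
```

With LEFT in place, 'JeepRenegade' and 'Jeep Renegade' should both yield the same model value rather than 'Renegade' and ' Renegade'.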
Thanks Jerry. This also works and is compact!
A couple questions:
(1) Why do you have to start at the second character position?
(2) Does a blank argument for the charlist include all alphanumeric characters? (Couldn't find this answer in the SAS Documentation of the findc function.)
Thank you again.
@dhrumil_patel wrote:
A couple questions:
(1) Why do you have to start at the second character position?
(2) Does a blank argument for the charlist include all alphanumeric characters? (Couldn't find this answer in the SAS Documentation of the findc function.)
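If you start at the first character, the search matches the capital letter that begins the make (the 'J' in 'Jeep'), so the answer will include the make of the vehicle.
And leaving the charlist argument blank means no characters are listed explicitly; the U and S modifiers then add uppercase letters and space characters to the list of characters to search for. The doco for the FINDC function is quite good.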
Oh, and use @FreelanceReinhard's suggestion to wrap a LEFT function around the result to finish the job properly.
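A quick way to see the effect of the start position and of each modifier is a throwaway step like the one below. The variable names are arbitrary, and the values in the comments are what the description above implies for this one test string.

```sas
data _null_;
    vehicle = 'Jeep Renegade';
    from1  = findc(vehicle, , 1, 'SU');  /* 1: the leading J is uppercase */
    from2  = findc(vehicle, , 2, 'SU');  /* 5: the blank after Jeep */
    upper2 = findc(vehicle, , 2, 'U');   /* 6: the R, since blanks are no longer searched for */
    put from1= from2= upper2=;
run;
```

Dropping the S modifier is another way to skip the blank, but it would cut a make such as 'Land Rover' in a different place than the blank-based split, so which set of modifiers is right depends on the data.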