Using SAS 9.3, I am trying to parse a variable, Vehicle, of auto names written (mostly) in camel case. For example,
JeepGrand Cherokee
JeepRenegade
JeepWrangler
...other records not in camel case
Jeep Renegade
I'd like to define a new variable, Model, containing the model names of the vehicles. For the records separating make from model with a space, I have a solution using the findw function:
if findw(Vehicle, "Renegade") then Model = "Renegade";
This does not work for the camel case records. One solution I have worked out is to use a combination of the scan and tranwrd functions:
if findw(strip(tranwrd(scan(Vehicle, 1, " "), "Jeep", " ")), "Wrangler") then Model = "Wrangler";
This works because I have a small number of makes in the data set. But how could I generalize this parsing to a larger list of automakers? In general, can we specify "case" as a delimiter in any character functions?
I appreciate any feedback anyone can provide.
Dhrumil Patel
There's probably a PRX (regular expression) solution as well, but you can use the ANYUPPER() and SUBSTR() functions:
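data want;
  input details $32.;
  /* position of the first uppercase letter at or after character 2 */
  x = anyupper(details, 2);
  /* split only when such a letter exists; ANYUPPER returns 0 otherwise */
  if x > 0 then do;
    make  = substr(details, 1, x-1);
    model = substr(details, x);
  end;
cards;
JeepGrand Cherokee
JeepRenegade
JeepWrangler
;
run;
proc print data=want; run;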
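For comparison, here is a minimal sketch of what a PRX version might look like. It is not taken from any post in this thread, and it assumes every make is a single capitalized word with no internal capitals or spaces (so a make such as "Land Rover" would be split in the wrong place). The data set name want_prx and the variable pos are made up for illustration.

```sas
data want_prx;
  input details $32.;
  length make model $32;
  /* pos = position of the lowercase letter that ends the make:       */
  /* a lowercase letter, optional whitespace, then an uppercase letter */
  pos = prxmatch('/[a-z]\s*[A-Z]/', details);
  if pos > 0 then do;
    make  = substr(details, 1, pos);
    /* LEFT removes the leading blank left behind for "Jeep Renegade" */
    model = left(substr(details, pos + 1));
  end;
cards;
JeepGrand Cherokee
JeepRenegade
Jeep Renegade
;
run;
proc print data=want_prx; run;
```

Because the pattern allows optional whitespace between the two words, the same expression handles both the camel case records and the records that already separate make and model with a space.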
Using PRX functions looks inviting but this can also be done simply with:
model = substr(vehicle, findc(vehicle, , 2, 'SU'));
@JerryLeBreton wrote:
Using PRX functions looks inviting but this can also be done simply with:
model = substr(vehicle, findc(vehicle, , 2, 'SU'));
Good idea. One could apply the LEFT function to the result in order to avoid discrepancies like 'Renegade' vs. ' Renegade'.
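A minimal sketch of that combination, using the FINDC call exactly as posted above and the sample values from the question. The explicit LENGTH statement is an addition for illustration only.

```sas
data want;
  input vehicle $32.;
  length model $32;
  /* FINDC stops at the first uppercase letter (U) or space (S)   */
  /* at or after position 2; for "Jeep Renegade" that is the      */
  /* blank, so SUBSTR returns " Renegade" and LEFT realigns it.   */
  model = left(substr(vehicle, findc(vehicle, , 2, 'SU')));
cards;
JeepGrand Cherokee
JeepRenegade
Jeep Renegade
;
run;
proc print data=want; run;
```

With LEFT in place, the space-separated record and the camel case record should both give the same value, Renegade, for model.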
Thanks Jerry. This also works and is compact!
A couple questions:
(1) Why do you have to start at the second character position?
(2) Does a blank argument for the charlist include all alphanumeric characters? (Couldn't find this answer in the SAS Documentation of the findc function.)
Thank you again.
@dhrumil_patel wrote:
A couple questions:
(1) Why do you have to start at the second character position?
(2) Does a blank argument for the charlist include all alphanumeric characters? (Couldn't find this answer in the SAS Documentation of the findc function.)
If you start at the first character, the search stops immediately at the capital letter that begins the make, so the result would include the make of the vehicle.
And leaving the charlist argument blank means that no characters are listed explicitly; the modifiers then stipulate what to search for. U adds uppercase letters to the search list and S adds space characters. The documentation for the FINDC function is quite good.
Oh, and use @FreelanceReinhard's suggestion to wrap a LEFT function around the result to finish the job properly.
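To see the effect of the start position, a small check along these lines can be run. The variable names from1 and from2 are hypothetical and only serve the illustration.

```sas
data _null_;
  vehicle = 'JeepWrangler';
  /* starting at 1, the search matches the leading capital J */
  from1 = findc(vehicle, , 1, 'SU');
  /* starting at 2, the search skips J and matches the W */
  from2 = findc(vehicle, , 2, 'SU');
  put from1= from2=;
run;
```

The log should show from1=1 and from2=5, which is why SUBSTR returns the whole string in the first case and only the model in the second.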