Parsing CamelCase

Contributor
Posts: 22
Using SAS 9.3, I am trying to parse a variable, Vehicle, containing auto names written (mostly) in camel case. For example:

 

JeepGrand Cherokee

JeepRenegade

JeepWrangler

...other records not in camel case

Jeep Renegade

 

I'd like to define a new variable, Model, containing the model names of the vehicles. For the records separating make from model with a space, I have a solution using the findw function:

 

if findw(Vehicle, "Renegade") then Model = "Renegade";

 

This does not work for the camel case records. One solution I have worked out is to use a combination of the scan and tranwrd functions.

 

if findw(strip(tranwrd(scan(Vehicle, 1, " "), "Jeep", " ")), "Wrangler") then Model = "Wrangler";

 

This works because I have a small number of makes in the data set. But how could I generalize this parsing to a larger list of automakers? In general, can we specify "case" as a delimiter in any character functions? 

 

I appreciate any feedback anyone can provide.

 

Dhrumil Patel

Accepted Solutions
Solution
‎05-24-2016 02:34 PM
Super User
Posts: 17,824

Re: Parsing CamelCase

There's probably a PRX (regular expression) function that handles this more concisely, but you can also use the ANYUPPER() and SUBSTR() functions.

 

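data want;
  input details $32.;
  /* Position of the first uppercase letter at or after character 2 */
  x = anyupper(details, 2);
  /* Split only when a second capital exists; otherwise make and model stay blank */
  if x > 0 then do;
    make  = substr(details, 1, x-1);
    model = substr(details, x);
  end;
  cards;
JeepGrand Cherokee
JeepRenegade
JeepWrangler
;
run;

proc print data=want;
run;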
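
For comparison, here is a rough sketch of the PRX approach mentioned above. It is not from the original post: the data set name want_prx and the variable re are made up for illustration, and the pattern assumes each make is a single capitalized word followed only by lowercase letters (so it would not handle makes like BMW or Mercedes-Benz).

```sas
data want_prx;
  input details $32.;
  length make model $32;
  retain re;
  /* Compile the pattern once: a capitalized make, optional spaces, then the model */
  if _n_ = 1 then re = prxparse('/^([A-Z][a-z]+)\s*([A-Z].*?)\s*$/');
  /* Fill make and model only for records that fit the pattern */
  if prxmatch(re, details) then do;
    make  = prxposn(re, 1, details);
    model = prxposn(re, 2, details);
  end;
  drop re;
  datalines;
JeepGrand Cherokee
JeepRenegade
JeepWrangler
Jeep Renegade
;
run;

proc print data=want_prx;
run;
```

Because the pattern allows optional whitespace between the two capture groups, records such as "Jeep Renegade" and "JeepRenegade" should yield the same model without an extra LEFT or STRIP call. Records that do not match are left with blank make and model.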



All Replies
Frequent Contributor
Posts: 85

Re: Parsing CamelCase

Using PRX functions looks inviting but this can also be done simply with:

 

model =substr(vehicle, findc(vehicle, , 2, 'SU'));

 

 

Trusted Advisor
Posts: 1,115

Re: Parsing CamelCase


JerryLeBreton wrote:

Using PRX functions looks inviting but this can also be done simply with:

 

model =substr(vehicle, findc(vehicle, , 2, 'SU')); 


Good idea. One could apply the LEFT function to the result in order to avoid discrepancies like 'Renegade' vs. ' Renegade'.
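
Putting the two suggestions together, a minimal sketch of a complete step might look like the following. The data set name want_findc and the variable pos are hypothetical; the FINDC call is the one quoted above, with LEFT applied to the result.

```sas
data want_findc;
  input vehicle $32.;
  length model $32;
  /* First space or uppercase letter at or after character 2 */
  pos = findc(vehicle, , 2, 'SU');
  /* LEFT shifts away the leading blank left behind by records like "Jeep Renegade" */
  if pos > 0 then model = left(substr(vehicle, pos));
  datalines;
JeepGrand Cherokee
JeepRenegade
Jeep Renegade
;
run;

proc print data=want_findc;
run;
```

With LEFT in place, "JeepRenegade" and "Jeep Renegade" should both produce the model Renegade, so the values compare equal in later steps.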

Contributor
Posts: 22

Re: Parsing CamelCase

Thanks Jerry. This also works and is compact!

 

A couple of questions:

(1) Why do you have to start at the second character position?

(2) Does a blank argument for the charlist include all alphanumeric characters? (I couldn't find this answer in the SAS documentation for the FINDC function.)

 

Thank you again.

 

 

Frequent Contributor
Posts: 85

Re: Parsing CamelCase

dhrumil_patel wrote:

 

A couple of questions:

(1) Why do you have to start at the second character position?

(2) Does a blank argument for the charlist include all alphanumeric characters? (I couldn't find this answer in the SAS documentation for the FINDC function.)


If you start at the first character, the search matches immediately on the uppercase first letter of the make (position 1), so the result would include the make of the vehicle.

 

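And leaving the charlist argument blank means the list of characters to search for starts out empty. The modifiers then stipulate what to search for: U adds uppercase letters to the list and S adds space characters, so FINDC returns the position of the first uppercase letter or space at or after the start position. The documentation for the FINDC function is quite good.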

 

Oh, and use @FreelanceReinhard's suggestion to wrap a LEFT function around the result to finish the job properly.
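
To see the effect of the start position and the modifiers, a small experiment along these lines may help. It is only an illustrative sketch; the variable names p1, p2 and p3 are made up.

```sas
data _null_;
  vehicle = 'Jeep Renegade';
  /* Starting at 1, the uppercase J matches, so the position is 1 */
  p1 = findc(vehicle, , 1, 'SU');
  /* Starting at 2, the first space or uppercase letter is the blank at position 5 */
  p2 = findc(vehicle, , 2, 'SU');
  /* With U alone, the blank is skipped and the R at position 6 is found */
  p3 = findc(vehicle, , 2, 'U');
  put p1= p2= p3=;
run;
```

The PUT statement writes the three positions to the log, which makes it easy to compare the variants before using one inside SUBSTR.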

 

 

☑ This topic is SOLVED.
