BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MB_Analyst
Obsidian | Level 7

I have to roll-up a lot of locations but I'm able to look for keywords to classify them. I can accomplish this with multiple IF statements but would prefer to use IN and a wildcard or FINDW operator. I'm wondering if I can combine these. Here is the original method:

 

DATA want;
    SET have;
    if location = "STATE1A, CITY1" then new = "Group 1";
    if location = "STATE1A, CITY2" then new = "Group 1";
if location = "STATE2B, CITY3" then new = "Group 1";
if location = "STATE2B, CITY4" then new = "Group 1"; if location = "STATE3A, CITY1" then new = "Group 2"; if location = "STATE3A, CITY2" then new = "Group 2";
if location = "STATE4B, CITY3" then new = "Group 2";
if location = "STATE4B, CITY4" then new = "Group 2"; RUN;

You can see that I'm only concerned about STATE1, STATE2, etc, regardless of the city. I need to search each observation for the correct State name (there can be 5-6 different ones), and I'm not concerned about the city name. The state name is varying lengths in the string. If I wanted to look for just one state name I would do the below, but I need to include a few states within the IN() command.

IF FINDW(lowcase(location),"state1") > 0 then new = Group a;

I would like to try:

IF location IN(%STATE1A%, %STATE2B%) then new = "Group 1";
IF location  IN(%STATE3A%, %STATE4B%) then new = "Group 2";

 

I'm unsure if I should be using the pattern matching with %, or the FINDW command within the IN()?

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

One questions why state and city have been comingled this way to begin with but you may consider:

 

/*
If scan(location,1) in ("State1A" "STATE2B") then ..
*/
data example; 
  input location  $ 1-20;
  if scan(location,1,',') in ("STATE1A" "STATE2B") then new = "Group 1";
  else if scan(location,1) in ("STATE3A" "STATE4B") then new = "Group 2";
datalines;
STATE1A, CITY1
STATE2B, CITY2
STATE3A, CITY1
STATE4B, CITY2
;

If you must use if then.

You could consider moving the logic to a custom format.

Proc format libarary=work;
value $loc_state
"STATE1A", 
"STATE2B"="Group 1"
"STATE3A",
"STATE4B"="Group 2"
;
run;

data example2; 
  input location  $ 1-20;
  new = put( scan(location,1,','),$loc_state.);
datalines;
STATE1A, CITY1
STATE2B, CITY2
STATE3A, CITY1
STATE4B, CITY2
;

This has an advantage that if you get new "state" values you only have to change the format, no additional if then else coding.

If the addition is to an existing group it is only adding one value to the current list, or create a new group.

 

I would also strongly recommend why you are creating a value like "Group 1". If you have enough values, i.e. "Group 10" then display and sort order will be out of numeric order. It might be better to create an actual numeric value and a name or label like "Group" instead.

 

With a custom informat that is also easy, as well as incorporating a warning that an unexpected value has been encountered:

Proc format libarary=work;
invalue loc_state (upcase)
"STATE1A", 
"STATE2B"=1
"STATE3A",
"STATE4B"=2
' '=.
other= _error_
;
run;

data example3; 
  input location  $ 1-20;
  group = input(scan(location,1,','),loc_state.);
datalines;
STATE1A, CITY1
STATE2B, CITY2
STATE3A, CITY1
STATE4B, CITY2
garbage, CITY2

;

The invalue option (UPCASE) indicates that the value is to be converted to upper case for the value assignment. Which may help if your data actually contains values like "State1A"  state1a" "sTate1A" where case may vary.

 

View solution in original post

2 REPLIES 2
ballardw
Super User

One questions why state and city have been comingled this way to begin with but you may consider:

 

/*
If scan(location,1) in ("State1A" "STATE2B") then ..
*/
data example; 
  input location  $ 1-20;
  if scan(location,1,',') in ("STATE1A" "STATE2B") then new = "Group 1";
  else if scan(location,1) in ("STATE3A" "STATE4B") then new = "Group 2";
datalines;
STATE1A, CITY1
STATE2B, CITY2
STATE3A, CITY1
STATE4B, CITY2
;

If you must use if then.

You could consider moving the logic to a custom format.

Proc format libarary=work;
value $loc_state
"STATE1A", 
"STATE2B"="Group 1"
"STATE3A",
"STATE4B"="Group 2"
;
run;

data example2; 
  input location  $ 1-20;
  new = put( scan(location,1,','),$loc_state.);
datalines;
STATE1A, CITY1
STATE2B, CITY2
STATE3A, CITY1
STATE4B, CITY2
;

This has an advantage that if you get new "state" values you only have to change the format, no additional if then else coding.

If the addition is to an existing group it is only adding one value to the current list, or create a new group.

 

I would also strongly recommend why you are creating a value like "Group 1". If you have enough values, i.e. "Group 10" then display and sort order will be out of numeric order. It might be better to create an actual numeric value and a name or label like "Group" instead.

 

With a custom informat that is also easy, as well as incorporating a warning that an unexpected value has been encountered:

Proc format libarary=work;
invalue loc_state (upcase)
"STATE1A", 
"STATE2B"=1
"STATE3A",
"STATE4B"=2
' '=.
other= _error_
;
run;

data example3; 
  input location  $ 1-20;
  group = input(scan(location,1,','),loc_state.);
datalines;
STATE1A, CITY1
STATE2B, CITY2
STATE3A, CITY1
STATE4B, CITY2
garbage, CITY2

;

The invalue option (UPCASE) indicates that the value is to be converted to upper case for the value assignment. Which may help if your data actually contains values like "State1A"  state1a" "sTate1A" where case may vary.

 

Jagadishkatam
Amethyst | Level 16

Please try

 

DATA want;
    SET have;
    if location in ( "STATE1A, CITY1", "STATE1A, CITY2" "STATE2B, CITY3" "STATE2B, CITY4" ) then new = "Group 1";  
    if location in ( "STATE3A, CITY1"  "STATE3A, CITY2" "STATE4B, CITY3"  "STATE4B, CITY4" ) then new = "Group 2";
RUN;


DATA want;
    SET have;
    if findw( location, "STATE1A" ) or findw(location, "STATE2B") then new = "Group 1";  
    if findw(location, "STATE3A" ) or findw(location, "STATE4B") then new = "Group 2";
RUN;
Thanks,
Jag

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 706 views
  • 1 like
  • 3 in conversation