BookmarkSubscribeRSS Feed
Etoo12121
Obsidian | Level 7

 

So I have something that looks easy but I can't seem to figure out how to do this easily. I'm trying to write an if-then-else statement to create the variable "Grouping". 

Essentially, what I would like is something like this:

 

if name1 or name2 or name3 or name4 or name5 in ('Sheila','Chris','Roman') then Grouping = 'East'
else if name1 or name2 or name3 or name4 or name5 in (Tyler, Josh, Rebecca) then Grouping = West;
else Grouping = missing

 

I know it can't be written exactly as I have it above. Is there a better way (perhaps a special function) to write this in a data step without listing name1, name2, name3 etc separately? I would have to iterate this 25 times with the actual data I have.

 

IDName1Name2Name3Name4Name5Grouping
1SheilaCleo   East
2TristanLucasTimothyMikeRomanEast
3CyrusLeoTyler  West
4JohnSylvesterRobertTyler West
5DavidChrisLeo  East
6LilyLizChristineRebecca West
7AndrewPeterMaxAnthonyJoshWest
3 REPLIES 3
ballardw
Super User

@Etoo12121 wrote:

 

So I have something that looks easy but I can't seem to figure out how to do this easily. I'm trying to write an if-then-else statement to create the variable "Grouping". 

Essentially, what I would like is something like this:

 

if name1 or name2 or name3 or name4 or name5 in ('Sheila','Chris','Roman') then Grouping = 'East'
else if name1 or name2 or name3 or name4 or name5 in (Tyler, Josh, Rebecca) then Grouping = West;
else Grouping = missing

 

I know it can't be written exactly as I have it above. Is there a better way (perhaps a special function) to write this in a data step without listing name1, name2, name3 etc separately? I would have to iterate this 25 times with the actual data I have.

 

ID Name1 Name2 Name3 Name4 Name5 Grouping
1 Sheila Cleo       East
2 Tristan Lucas Timothy Mike Roman East
3 Cyrus Leo Tyler     West
4 John Sylvester Robert Tyler   West
5 David Chris Leo     East
6 Lily Liz Christine Rebecca   West
7 Andrew Peter Max Anthony Josh West

How sure are you that none of the id values have a "name" from more than one group? If this might happen you have a lot of additional rules to provide about how to resolve those situations.

 

Which is shorter, the list of name values or the number of variables involved? The WHICHC function might be more efficient if you have smaller number of names than variables but you would be testing for each of the names to be present in the list of variables.

 

 

Etoo12121
Obsidian | Level 7
The list of name values is shorter. I only have 7 options in my actual dataset but I have 25 variables to go through.
WHICHC seems to work in a situation where I have one variable with many values. My case is the opposite. I have 25 variables with a handful of values.
Kurt_Bremser
Super User

Put your group assignments into a dataset, and use it as a lookup. This works best with a long dataset layout, something you should always strive for, as it makes coding easier:

data group;
input name $ grouping $;
datalines;
Sheila East
Chris East
Roman East
Tyler West
Josh West
Rebecca West
;

data have;
infile datalines dlm='09'x dsd truncover;
input ID $ Name1 $ Name2 $ Name3 $ Name4 $ Name5 $;
datalines;
1	Sheila	Cleo	 	 	 	
2	Tristan	Lucas	Timothy	Mike	Roman	
3	Cyrus	Leo	Tyler	 	 	
4	John	Sylvester	Robert	Tyler	 	
5	David	Chris	Leo	 	 	
6	Lily	Liz	Christine	Rebecca	 	
7	Andrew	Peter	Max	Anthony	Josh	
;

proc transpose
  data=have
  out=long (
    rename=(col1=name)
    where=(name ne "")
  )
;
by id;
var name:;
run;

data found;
set long;
by id;
if _n_ = 1
then do;
  length grouping $8;
  declare hash g (dataset:"group");
  g.definekey("name");
  g.definedata("grouping");
  g.definedone();
end;
if g.find() = 0 then output;
keep id grouping;
run;

proc sort data=found nodupkey;
by id grouping;
run;

data want;
merge
  have
  found
;
by id;
run;

Instead of the hash object, you can also do a data step MERGE or a join in SQL, if you prefer that.

The bottom line is that you do not have your group relationship anywhere in the code, only in the lookup dataset. This makes your code static and keeps dynamic values in data, where they should be.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 856 views
  • 0 likes
  • 3 in conversation