BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
vlapoutre
Calcite | Level 5

I have a table with observations and I want to assign those to specific categories based on rules.

My sample data is e.g. the following:

	create table work.observations
	(id char(2),
	can_fly num,
	furry num,
	levitates num)
	;
	
	insert into work.observations
	values('AB', 0, 0, 0)
	values('AC', 1, 0, 0)
	values('AD', 0, 1, 0)
	values('BC', ., 1, 0)
	values('BD', ., 0, 0)
	;

Now I want to assign the species.

For this example, assume there are three possibilities, a penguin, sparrow, or a cat. If the animal is furry, I want to assign "cat". If it doesn't, it's a bird, which one is determined by whether or not it can fly. If I already know it's a cat, I don't care about the value of the "can_fly" column.

I could do this for one species at a time in the following way:

%let species = cat;
%let furry = (1);
%let can_fly = (0, 1, .);
%let levitates = (0);

proc sql;
	create table work.assigned_observations as
	select distinct
		o.id,
		o.can_fly,
		o.furry,
		o.levitates,
		case when o.can_fly in &can_fly.
				and o.furry in &furry.
				and o.levitates in &levitates.
			then "&species." end as species
	from work.observations o 
	;
quit;run;

However, I want to do this for all at once. I was thinking of writing the possible settings to a file (e.g. excel), then loading this into a table and perform a join. However, I haven't been able to get the possibilities as lists as in the above example, so the best I could get working includes string comparisons, which is inefficient. See the following:

proc sql;
	create table work.species
	(species char(10),
	can_fly char(10),
	furry char(10),
	levitates char(10))
	;
	
	insert into work.species
	values('penguin', '(0)', '(0)', '(0)')
	values('cat', '(0, 1, .)', '(1)', '(0)')
	values('sparrow', '(1)', '(0)', '(0)')
	;
	
	title 'Different species'
	
	select * from work.species
proc printto; run;

proc sql;
	create table work.assigned_observations as
	select distinct 
		o.id,
		o.can_fly,
		o.furry,
		o.levitates,
		s.species
	from work.observations o
	left join work.species s
		on s.can_fly like catx('', '%', put(o.can_fly, 11.), '%')
		and s.furry like catx('', '%', put(o.furry, 11.), '%')
		and s.levitates like catx('', '%', put(o.levitates, 11.), '%')
	;
quit;run;

How can I improve this, while keeping the flexibility of having multiple options and keeping the input file (with the settings for the different species) readable?

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

A simpler way may be to keep a column in SPECIES missing, if you do not care about it for this species, e.g.:

	create table work.species
	(species char(10),
	can_fly num,
	furry num,
	levitates num)
	;
	
	insert into work.species
	values('penguin', 0,0,0)
	values('cat', .,1,0)
	values('sparrow', 1,0,0)
	;

You should then be able to match like this:

	create table assigned_observations as
	select observations.*,species.species
	from observations left join species
	on (observations.furry=species.furry or species.furry is null) and
	   (observations.can_fly=species.can_fly or species.can_fly is null) and
	   (observations.levitates=species.levitates or species.levitates is null)
	   ;

If you only want the observations that can actually be assigned to a species, just leave out the "left" in the final SQL.

View solution in original post

1 REPLY 1
s_lassen
Meteorite | Level 14

A simpler way may be to keep a column in SPECIES missing, if you do not care about it for this species, e.g.:

	create table work.species
	(species char(10),
	can_fly num,
	furry num,
	levitates num)
	;
	
	insert into work.species
	values('penguin', 0,0,0)
	values('cat', .,1,0)
	values('sparrow', 1,0,0)
	;

You should then be able to match like this:

	create table assigned_observations as
	select observations.*,species.species
	from observations left join species
	on (observations.furry=species.furry or species.furry is null) and
	   (observations.can_fly=species.can_fly or species.can_fly is null) and
	   (observations.levitates=species.levitates or species.levitates is null)
	   ;

If you only want the observations that can actually be assigned to a species, just leave out the "left" in the final SQL.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 514 views
  • 0 likes
  • 2 in conversation