BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
vlapoutre
Calcite | Level 5

I have a table with observations and I want to assign those to specific categories based on rules.

My sample data is e.g. the following:

	create table work.observations
	(id char(2),
	can_fly num,
	furry num,
	levitates num)
	;
	
	insert into work.observations
	values('AB', 0, 0, 0)
	values('AC', 1, 0, 0)
	values('AD', 0, 1, 0)
	values('BC', ., 1, 0)
	values('BD', ., 0, 0)
	;

Now I want to assign the species.

For this example, assume there are three possibilities, a penguin, sparrow, or a cat. If the animal is furry, I want to assign "cat". If it doesn't, it's a bird, which one is determined by whether or not it can fly. If I already know it's a cat, I don't care about the value of the "can_fly" column.

I could do this for one species at a time in the following way:

%let species = cat;
%let furry = (1);
%let can_fly = (0, 1, .);
%let levitates = (0);

proc sql;
	create table work.assigned_observations as
	select distinct
		o.id,
		o.can_fly,
		o.furry,
		o.levitates,
		case when o.can_fly in &can_fly.
				and o.furry in &furry.
				and o.levitates in &levitates.
			then "&species." end as species
	from work.observations o 
	;
quit;run;

However, I want to do this for all at once. I was thinking of writing the possible settings to a file (e.g. excel), then loading this into a table and perform a join. However, I haven't been able to get the possibilities as lists as in the above example, so the best I could get working includes string comparisons, which is inefficient. See the following:

proc sql;
	create table work.species
	(species char(10),
	can_fly char(10),
	furry char(10),
	levitates char(10))
	;
	
	insert into work.species
	values('penguin', '(0)', '(0)', '(0)')
	values('cat', '(0, 1, .)', '(1)', '(0)')
	values('sparrow', '(1)', '(0)', '(0)')
	;
	
	title 'Different species'
	
	select * from work.species
proc printto; run;

proc sql;
	create table work.assigned_observations as
	select distinct 
		o.id,
		o.can_fly,
		o.furry,
		o.levitates,
		s.species
	from work.observations o
	left join work.species s
		on s.can_fly like catx('', '%', put(o.can_fly, 11.), '%')
		and s.furry like catx('', '%', put(o.furry, 11.), '%')
		and s.levitates like catx('', '%', put(o.levitates, 11.), '%')
	;
quit;run;

How can I improve this, while keeping the flexibility of having multiple options and keeping the input file (with the settings for the different species) readable?

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

A simpler way may be to keep a column in SPECIES missing, if you do not care about it for this species, e.g.:

	create table work.species
	(species char(10),
	can_fly num,
	furry num,
	levitates num)
	;
	
	insert into work.species
	values('penguin', 0,0,0)
	values('cat', .,1,0)
	values('sparrow', 1,0,0)
	;

You should then be able to match like this:

	create table assigned_observations as
	select observations.*,species.species
	from observations left join species
	on (observations.furry=species.furry or species.furry is null) and
	   (observations.can_fly=species.can_fly or species.can_fly is null) and
	   (observations.levitates=species.levitates or species.levitates is null)
	   ;

If you only want the observations that can actually be assigned to a species, just leave out the "left" in the final SQL.

View solution in original post

1 REPLY 1
s_lassen
Meteorite | Level 14

A simpler way may be to keep a column in SPECIES missing, if you do not care about it for this species, e.g.:

	create table work.species
	(species char(10),
	can_fly num,
	furry num,
	levitates num)
	;
	
	insert into work.species
	values('penguin', 0,0,0)
	values('cat', .,1,0)
	values('sparrow', 1,0,0)
	;

You should then be able to match like this:

	create table assigned_observations as
	select observations.*,species.species
	from observations left join species
	on (observations.furry=species.furry or species.furry is null) and
	   (observations.can_fly=species.can_fly or species.can_fly is null) and
	   (observations.levitates=species.levitates or species.levitates is null)
	   ;

If you only want the observations that can actually be assigned to a species, just leave out the "left" in the final SQL.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 416 views
  • 0 likes
  • 2 in conversation