I have a table with observations and I want to assign those to specific categories based on rules.
My sample data is e.g. the following:
create table work.observations
(id char(2),
can_fly num,
furry num,
levitates num)
;
insert into work.observations
values('AB', 0, 0, 0)
values('AC', 1, 0, 0)
values('AD', 0, 1, 0)
values('BC', ., 1, 0)
values('BD', ., 0, 0)
;
Now I want to assign the species.
For this example, assume there are three possibilities, a penguin, sparrow, or a cat. If the animal is furry, I want to assign "cat". If it doesn't, it's a bird, which one is determined by whether or not it can fly. If I already know it's a cat, I don't care about the value of the "can_fly" column.
I could do this for one species at a time in the following way:
%let species = cat;
%let furry = (1);
%let can_fly = (0, 1, .);
%let levitates = (0);
proc sql;
create table work.assigned_observations as
select distinct
o.id,
o.can_fly,
o.furry,
o.levitates,
case when o.can_fly in &can_fly.
and o.furry in &furry.
and o.levitates in &levitates.
then "&species." end as species
from work.observations o
;
quit;run;
However, I want to do this for all at once. I was thinking of writing the possible settings to a file (e.g. excel), then loading this into a table and perform a join. However, I haven't been able to get the possibilities as lists as in the above example, so the best I could get working includes string comparisons, which is inefficient. See the following:
proc sql;
create table work.species
(species char(10),
can_fly char(10),
furry char(10),
levitates char(10))
;
insert into work.species
values('penguin', '(0)', '(0)', '(0)')
values('cat', '(0, 1, .)', '(1)', '(0)')
values('sparrow', '(1)', '(0)', '(0)')
;
title 'Different species'
select * from work.species
proc printto; run;
proc sql;
create table work.assigned_observations as
select distinct
o.id,
o.can_fly,
o.furry,
o.levitates,
s.species
from work.observations o
left join work.species s
on s.can_fly like catx('', '%', put(o.can_fly, 11.), '%')
and s.furry like catx('', '%', put(o.furry, 11.), '%')
and s.levitates like catx('', '%', put(o.levitates, 11.), '%')
;
quit;run;
How can I improve this, while keeping the flexibility of having multiple options and keeping the input file (with the settings for the different species) readable?
A simpler way may be to keep a column in SPECIES missing, if you do not care about it for this species, e.g.:
create table work.species
(species char(10),
can_fly num,
furry num,
levitates num)
;
insert into work.species
values('penguin', 0,0,0)
values('cat', .,1,0)
values('sparrow', 1,0,0)
;
You should then be able to match like this:
create table assigned_observations as
select observations.*,species.species
from observations left join species
on (observations.furry=species.furry or species.furry is null) and
(observations.can_fly=species.can_fly or species.can_fly is null) and
(observations.levitates=species.levitates or species.levitates is null)
;
If you only want the observations that can actually be assigned to a species, just leave out the "left" in the final SQL.
A simpler way may be to keep a column in SPECIES missing, if you do not care about it for this species, e.g.:
create table work.species
(species char(10),
can_fly num,
furry num,
levitates num)
;
insert into work.species
values('penguin', 0,0,0)
values('cat', .,1,0)
values('sparrow', 1,0,0)
;
You should then be able to match like this:
create table assigned_observations as
select observations.*,species.species
from observations left join species
on (observations.furry=species.furry or species.furry is null) and
(observations.can_fly=species.can_fly or species.can_fly is null) and
(observations.levitates=species.levitates or species.levitates is null)
;
If you only want the observations that can actually be assigned to a species, just leave out the "left" in the final SQL.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.