Hello,
I would like to compare each year's observation with prior 3-year observations and find the unique value
This is what I have
DATA HAVE;
INPUT (FIRM YEAR VAR) (:$8.);
CARDS;
A 2000 dog
A 2001 dog
A 2001 cat
A 2002 dog
A 2002 bird
A 2003 dog
A 2003 chicken
A 2004 chicken
A 2004 mice
B 2000 fries
B 2001 burger
B 2001 steak
B 2002 burger
B 2002 fries
B 2003 steak
B 2003 pasta
B 2004 pasta
B 2004 cheese
;
run;So I compare each firm's each year VAR with prior three-year VAR, and find the unique value of VAR, and this is what I want
DATA WANT;
INPUT (FIRM YEAR VAR) (:$8.);
CARDS;
A 2003 chicken
A 2004 mice
B 2003 pasta
B 2004 cheese
;
run;I tried something like this:
proc sql;
select
b.*
from have a
RIGHT JOIN
have b
ON a.firm = b.firm and a.year = b.year-1 and a.VAR = b.VAR
WHERE a.VAR is null;
quit;
But this only compares the observation with the prior 1 year. How can I compare with all the prior three year observations?
Thank you
I don't fully understand. Why should the obs
A 2003 chicken
Be there. Please be specific 🙂
Yes,
The value 'Chicken' is the new one compared with the prior three-year values.
In 2003, VAR = dog, chicken
In the prior three years (2000-2002), VAR = dog, cat, bird
Compared with 2000-2002, VAR in 2003 has a new value 'chicken'.
Is last year the same for all farms? (2004 in the test data).
if positive then the you can run:
proc sql;
create table want as
select distinct * from have
where year between <last_year> - 2 and <last_year>;
quit;
If negative you need on first step to select the last year of each farm, add it to each observation and output only those which fits the requirement:
proc sql;
create table temp as
select farm, max(year) as last_year
from have
group by farm;
create table want as
select distinct h.* , t.last_year
from have as h
left join temp as t
on h.farm = t.farm and
h.year between t.last_year - 2 and t.last_year;
quit;
How to do this with correlated subqueries:
DATA HAVE;
INPUT FIRM :$8. YEAR VAR :$8.;
CARDS;
A 2000 dog
A 2001 dog
A 2001 cat
A 2002 dog
A 2002 bird
A 2003 dog
A 2003 chicken
A 2004 chicken
A 2004 mice
B 2000 fries
B 2001 burger
B 2001 steak
B 2002 burger
B 2002 fries
B 2003 steak
B 2003 pasta
B 2004 pasta
B 2004 cheese
;
proc sql;
select *
from have as a
where
not exists (
select * from have
where
firm=a.firm and
year between a.year-3 and a.year-1 and
var=a.var)
and
(select count(distinct year) from have
where
firm=a.firm and
year between a.year-3 and a.year-1) = 3
;
quit;
Thank you! I figured it out!
Your code is very instructive. But it takes a lot of time still cannot run the result
I modify my code based on your instruction
proc sql;
create table want as select
distinct b.firm, b.year, b.var
from have as a
right join
have as b
on a.firm = b.firm and a.var=b.var and a.year between b.year-3 and b.year-1
WHERE a.var is null;
QUIT;At least it gives me the result I look for.
Do you think my code is anywhere wrong?
Running your code, I get
i.e. it returns cases that don't have data in some of the previous 3 years. That's not the result you asked for in your original post. If that's ok then what you did is fine.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.