Hello,
I would like to compare each year's observation with prior 3-year observations and find the unique value
This is what I have
DATA HAVE;
INPUT (FIRM YEAR VAR) (:$8.);
CARDS;
A 2000 dog
A 2001 dog
A 2001 cat
A 2002 dog
A 2002 bird
A 2003 dog
A 2003 chicken
A 2004 chicken
A 2004 mice
B 2000 fries
B 2001 burger
B 2001 steak
B 2002 burger
B 2002 fries
B 2003 steak
B 2003 pasta
B 2004 pasta
B 2004 cheese
;
run;
So I compare each firm's each year VAR with prior three-year VAR, and find the unique value of VAR, and this is what I want
DATA WANT;
INPUT (FIRM YEAR VAR) (:$8.);
CARDS;
A 2003 chicken
A 2004 mice
B 2003 pasta
B 2004 cheese
;
run;
I tried something like this:
proc sql;
select
b.*
from have a
RIGHT JOIN
have b
ON a.firm = b.firm and a.year = b.year-1 and a.VAR = b.VAR
WHERE a.VAR is null;
quit;
But this only compares the observation with the prior 1 year. How can I compare with all the prior three year observations?
Thank you
I don't fully understand. Why should the obs
A 2003 chicken
Be there. Please be specific 🙂
Yes,
The value 'Chicken' is the new one compared with the prior three-year values.
In 2003, VAR = dog, chicken
In the prior three years (2000-2002), VAR = dog, cat, bird
Compared with 2000-2002, VAR in 2003 has a new value 'chicken'.
Is last year the same for all farms? (2004 in the test data).
if positive then the you can run:
proc sql;
create table want as
select distinct * from have
where year between <last_year> - 2 and <last_year>;
quit;
If negative you need on first step to select the last year of each farm, add it to each observation and output only those which fits the requirement:
proc sql;
create table temp as
select farm, max(year) as last_year
from have
group by farm;
create table want as
select distinct h.* , t.last_year
from have as h
left join temp as t
on h.farm = t.farm and
h.year between t.last_year - 2 and t.last_year;
quit;
How to do this with correlated subqueries:
DATA HAVE;
INPUT FIRM :$8. YEAR VAR :$8.;
CARDS;
A 2000 dog
A 2001 dog
A 2001 cat
A 2002 dog
A 2002 bird
A 2003 dog
A 2003 chicken
A 2004 chicken
A 2004 mice
B 2000 fries
B 2001 burger
B 2001 steak
B 2002 burger
B 2002 fries
B 2003 steak
B 2003 pasta
B 2004 pasta
B 2004 cheese
;
proc sql;
select *
from have as a
where
not exists (
select * from have
where
firm=a.firm and
year between a.year-3 and a.year-1 and
var=a.var)
and
(select count(distinct year) from have
where
firm=a.firm and
year between a.year-3 and a.year-1) = 3
;
quit;
Thank you! I figured it out!
Your code is very instructive. But it takes a lot of time still cannot run the result
I modify my code based on your instruction
proc sql;
create table want as select
distinct b.firm, b.year, b.var
from have as a
right join
have as b
on a.firm = b.firm and a.var=b.var and a.year between b.year-3 and b.year-1
WHERE a.var is null;
QUIT;
At least it gives me the result I look for.
Do you think my code is anywhere wrong?
Running your code, I get
i.e. it returns cases that don't have data in some of the previous 3 years. That's not the result you asked for in your original post. If that's ok then what you did is fine.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.