I have a data which has the following information:
Company year ADD OUT Change
A 2001 0 0 1
A 2002 0 0 1
A 2003 0 0 1
A 2004 1 0 1
A 2005 0 0 0
A 2006 0 1 0
A 2007 0 0 0
A 2008 0 0 1
B 2001 0 0 0
B 2001 0 0 1
B 2002 0 0 0
B 2003 1 0 1
B 2004 0 0 0
B 2005 0 0 0
B 2006 0 1 0
B 2007 0 0 0
B 2008 0 0 0
B 2009 0 0 1
I am trying to create two new variable NEW1 and NEW2. If ADD =1 and CHANGE =1 in the third year prior to ADD =1 then all the three years prior to ADD =1 will be 1 for NEW1. If CHANGE =1 in the second year prior year to ADD =1 then NEW1 = 1 for two years prior to ADD =1. If CHANGE =1 in the year prior to ADD = 1 then NEW1 = 1 for the year prior to ADD =1. Similarly, if OUT =1 and CHANGE =1 in the third year following OUT =1, then NEW2 =1 for all the three years following OUT=1. In other words, I am trying to create a dataset which would look like the following:
Company year ADD OUT Change NEW1 NEW2
A 2001 0 0 1 1 0
A 2002 0 0 1 1 0
A 2003 0 0 1 1 0
A 2004 1 0 1 0 0
A 2005 0 0 0 0 0
A 2006 0 1 0 0 0
A 2007 0 0 0 0 1
A 2008 0 0 1 0 1
A 2009 0 0 0 0 0
B 2000 0 0 0 0 0
B 2001 0 0 1 1 0
B 2002 0 0 0 1 0
B 2003 1 0 1 0 0
B 2004 0 0 0 0 0
B 2005 0 0 0 0 0
B 2006 0 1 0 0 0
B 2007 0 0 0 0 1
B 2008 0 0 0 0 1
B 2009 0 0 1 0 1
Any suggessions will be helpful.
SQL solution :
data have;
input Company $ year ADD OUT Change;
datalines;
A 2001 0 0 1
A 2002 0 0 1
A 2003 0 0 1
A 2004 1 0 1
A 2005 0 0 0
A 2006 0 1 0
A 2007 0 0 0
A 2008 0 0 1
B 2000 0 0 0
B 2001 0 0 1
B 2002 0 0 0
B 2003 1 0 1
B 2004 0 0 0
B 2005 0 0 0
B 2006 0 1 0
B 2007 0 0 0
B 2008 0 0 0
B 2009 0 0 1
;
proc sql;
create table ADD as
select a.company, a.year-1 as ADDyear, b.year as changeYear
from have as a inner join have as b
on a.company=b.company and a.ADD and b.Change and
b.year between a.year-3 and a.year-1;
create table OUT as
select a.company, a.year+1 as OUTyear, b.year as changeYear
from have as a inner join have as b
on a.company=b.company and a.OUT and b.Change and
a.year between b.year-3 and b.year-1;
create table want as
select H.*,
exists (select * from ADD where company=H.company and
H.year between changeYear and ADDyear) as NEW1,
exists (select * from OUT where company=H.company and
H.year between OUTyear and changeYear) as NEW2
from have as H;
drop table ADD, OUT;
select * from want;
quit;
PG
SQL solution :
data have;
input Company $ year ADD OUT Change;
datalines;
A 2001 0 0 1
A 2002 0 0 1
A 2003 0 0 1
A 2004 1 0 1
A 2005 0 0 0
A 2006 0 1 0
A 2007 0 0 0
A 2008 0 0 1
B 2000 0 0 0
B 2001 0 0 1
B 2002 0 0 0
B 2003 1 0 1
B 2004 0 0 0
B 2005 0 0 0
B 2006 0 1 0
B 2007 0 0 0
B 2008 0 0 0
B 2009 0 0 1
;
proc sql;
create table ADD as
select a.company, a.year-1 as ADDyear, b.year as changeYear
from have as a inner join have as b
on a.company=b.company and a.ADD and b.Change and
b.year between a.year-3 and a.year-1;
create table OUT as
select a.company, a.year+1 as OUTyear, b.year as changeYear
from have as a inner join have as b
on a.company=b.company and a.OUT and b.Change and
a.year between b.year-3 and b.year-1;
create table want as
select H.*,
exists (select * from ADD where company=H.company and
H.year between changeYear and ADDyear) as NEW1,
exists (select * from OUT where company=H.company and
H.year between OUTyear and changeYear) as NEW2
from have as H;
drop table ADD, OUT;
select * from want;
quit;
PG
Thank you PGstats for the solution.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.