Hi guys,
I am having a dataset containing firm id, year and audiit_year, and I want to create a dummy (new column) where year is equal to audit_year.
FirmID Year Audit_Year Dummy(what I want)
11 1990 . 0
11 1991 . 1
11 1992 1991 0
11 1993 . 1
11 1994 1993 0
11 1995 . 0
11 1996 . 0
12 2001 . 1
12 2002 2001 0
12 2003 . 0
12 2004 . 0
12 2005 . 0
12 2006 . 0
12 2007 . 0
12 2008 . 0
12 2009 . 0
Basically, I want to create a new dummy variable, where dummy=1 for the year when it is equal to the audit year (by firm ID).
Regards,
Amanjot
data have;
input FirmID year Audit_Year;
cards;
11 1990 . 0
11 1991 . 1
11 1992 1991 0
11 1993 . 1
11 1994 1993 0
11 1995 . 0
11 1996 . 0
12 2001 . 1
12 2002 2001 0
12 2003 . 0
12 2004 . 0
12 2005 . 0
12 2006 . 0
12 2007 . 0
12 2008 . 0
12 2009 . 0
;
run;
data want;
do _n_=1 by 1 until(last.FirmID | ^ missing(Audit_Year));
set have;
by firmid notsorted;
end;
year_=Audit_Year;
do _n_=1 to _n_;
set have;
by firmid;
if year=year_ then Dummy=1;
else Dummy=0;
output;
end;
drop year_;
run;
Hi @amanjot_42 Plain and straight forward SQL
data have;
input FirmID Year Audit_Year;
cards;
11 1990 . 0
11 1991 . 1
11 1992 1991 0
11 1993 . 1
11 1994 1993 0
11 1995 . 0
11 1996 . 0
12 2001 . 1
12 2002 2001 0
12 2003 . 0
12 2004 . 0
12 2005 . 0
12 2006 . 0
12 2007 . 0
12 2008 . 0
12 2009 . 0
;
proc sql;
create table want as
select a.* ,a.year=b.audit_year as dummy
from have a left join have b
on a.firmid=b.firmid and a.year=b.audit_year
order by a.firmid, a.year;
quit;
/*Or a simple HASH table look up*/
data want ;
if _n_=1 then do;
dcl hash H (dataset:'have(where=(audit_year^=.))') ;
h.definekey ("firmid","audit_year") ;
h.definedone () ;
end;
set have;
dummy=h.check()=0;
run;
/*Or a simple merge*/
data want;
merge have have(in=b drop= year rename=audit_year=year where=(year^=.));
by firmid year;
dummy=b;
run;
This problem is a version of mixing two time series (the series of all records, and the (sub)series of non-missing audit_year records).
data want;
merge have
have (keep=firmid audit_year rename=(audit_year=year) where=(year^=.) in=inaudit);
by firmid year;
dummy=inaudit;
run;
And you can strike out everything I said earlier below - overcomplicating the problem.
In this particular mix you want to
data have; input FirmID Year Audit_Year; cards; 11 1990 . 11 1991 . 11 1992 1991 11 1993 . 11 1994 1993 11 1995 . 11 1996 . 12 2001 . 12 2002 2001 12 2003 . 12 2004 . 12 2005 . 12 2006 . 12 2007 . 12 2008 . 12 2009 . ; data want (drop=_:); if eoh1=0 then set have (where=(audit_year^=.) rename=(firmid=_firmid)) end=eoh1; else call missing(audit_year); _yr=audit_year; do until (end_of_have); set have end=end_of_have; dummy=ifn(year=_yr and firmid=_firmid,1,0); output; if dummy=1 then leave; end; run;
The "if eoh1 then SET …" statement at the beginning is done to avoid premature stopping of the first SET (and therefore the DATA step), before the second SET has processed all the data.
data have;
input FirmID year Audit_Year;
cards;
11 1990 . 0
11 1991 . 1
11 1992 1991 0
11 1993 . 1
11 1994 1993 0
11 1995 . 0
11 1996 . 0
12 2001 . 1
12 2002 2001 0
12 2003 . 0
12 2004 . 0
12 2005 . 0
12 2006 . 0
12 2007 . 0
12 2008 . 0
12 2009 . 0
;
run;
data want;
do _n_=1 by 1 until(last.FirmID | ^ missing(Audit_Year));
set have;
by firmid notsorted;
end;
year_=Audit_Year;
do _n_=1 to _n_;
set have;
by firmid;
if year=year_ then Dummy=1;
else Dummy=0;
output;
end;
drop year_;
run;
Hi,
Thanks for the help!
I have noticed, there is another column of audit year which looks like this:
Firm ID Year Audit_year Dummy (what I want)
11 1990 . 0
11 1991 . 0
11 1992 1993 0
11 1993 . 1
11 1994 1995 0
11 1995 . 1
11 1996 . 0
12 2001 . 0
12 2002 2003 0
12 2003 . 1
12 2004 . 0
12 2005 . 0
12 2006 . 0
12 2007 . 0
12 2008 . 0
12 2009 . 0
In the latter table, audit years are one year ahead of YEAR. Please help with respect to the codes
Regards,
Amanjot
s
Hi @amanjot_42
I think the solutions posted by @mkeintz and @novinosrin will handle both the scenarios for you . Feel free to change the answer if you think it is appropriate.
Personally I like the below solution.
data want;
merge have
have (keep=firmid audit_year rename=(audit_year=year) where=(year^=.) in=inaudit);
by firmid year;
dummy=inaudit;
run;
Hi,
Thanks for getting back,
The latter procedures are not working as expected (@mkeintz and @novinosrin). They are increasing the number of observations (a lot of duplicated records are getting generated). The codes provided by you worked perfectly well with respect to the earlier scenario. Can you please help with the codes for the recent scenario.
Regards,
Amanjot
Let me know if this works :
data have;
input FirmID year Audit_Year;
cards;
11 1990 .
11 1991 .
11 1992 1993
11 1993 .
11 1994 1995
11 1995 .
11 1996 .
12 2001 .
12 2002 2003
12 2003 .
12 2004 .
12 2005 .
12 2006 .
12 2007 .
12 2008 .
12 2009 .
;
run;
data want;
do _n_=1 by 1 until(last.FirmID | brk );
set have;
by firmid notsorted;
if ^missing(Audit_Year) then year_=Audit_Year;
if year_=year then brk=1;
end;
do _n_=1 to _n_;
set have;
by firmid;
if year_=year then Dummy=1;
else Dummy=0;
output;
end;
drop year_ brk;
run;
Hi,
Thank you so much,
It worked really well!
I have one more question - for my another analysis (with a similar dataset - but having multiple observations for a particular year, say 3 rows for the year 2006. So, in this case, if I run the same codes for the audit year dummies, then it put 1 for only the first row of 2006. Rest of the two rows are depicting 0.
Please help in this regard!
Regards,
Cheers
Please Open a new thread and post some sample data.
Thanks,
I will do that!
Regards
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.