BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
elbarto
Obsidian | Level 7

I have a dataset which looks like the following:

 


DATA have;
input individual investment_year firm individual_exit_year;
DATALINES;
1	2006	1004	.
1	2006	1005	.
1	2006	1006	.
1	2007	1004	.
1	2007	1005	.
1	2007	1007	.
1	2010	1003	.
1	2010	1006	.
1	2010	1007	.
1	2015	1004	.
1	2015	1005	.
1	2015	1007	.
2	2005	1003	2021
2	2005	1006	2021
2	2005	1008	2021
2	2008	1004	2021
2	2008	1005	2021
2	2008	1006	2021
2	2013	1005	2021
2	2013	1006	2021
2	2013	1008	2021
2	2019	1003	2021
2	2019	1004	2021
2	2019	1005	2021
3	2010	1003	2010
;
RUN;

I want to create the following dataset based on the above:

 


DATA want;
input firm year individual treatment;
DATALINES;
1003	2005	1	0
1003	2006	1	0
1003	2007	1	0
1003	2008	1	0
1003	2009	1	0
1003	2010	1	1
1003	2011	1	1
1003	2012	1	1
1003	2013	1	1
1003	2014	1	1
1003	2015	1	0
1003	2016	1	0
1003	2017	1	0
1003	2018	1	0
1003	2019	1	0
1003	2020	1	0
1003	2021	1	0
1003	2022	1	0
1003	2023	1	0
1004	2005	1	0
1004	2006	1	1
1004	2007	1	1
1004	2008	1	1
1004	2009	1	1
1004	2010	1	0
1004	2011	1	0
1004	2012	1	0
1004	2013	1	0
1004	2014	1	0
1004	2015	1	1
1004	2016	1	1
1004	2017	1	1
1004	2018	1	1
1004	2019	1	1
1004	2020	1	1
1004	2021	1	1
1004	2022	1	1
1004	2023	1	1
1005	2005	1	0
1005	2006	1	1
1005	2007	1	1
1005	2008	1	1
1005	2009	1	1
1005	2010	1	0
1005	2011	1	0
1005	2012	1	0
1005	2013	1	0
1005	2014	1	0
1005	2015	1	1
1005	2016	1	1
1005	2017	1	1
1005	2018	1	1
1005	2019	1	1
1005	2020	1	1
1005	2021	1	1
1005	2022	1	1
1005	2023	1	1
1006	2005	1	0
1006	2006	1	1
1006	2007	1	0
1006	2008	1	0
1006	2009	1	0
1006	2010	1	1
1006	2011	1	1
1006	2012	1	1
1006	2013	1	1
1006	2014	1	1
1006	2015	1	0
1006	2016	1	0
1006	2017	1	0
1006	2018	1	0
1006	2019	1	0
1006	2020	1	0
1006	2021	1	0
1006	2022	1	0
1006	2023	1	0
1007	2005	1	0
1007	2006	1	0
1007	2007	1	1
1007	2008	1	1
1007	2009	1	1
1007	2010	1	1
1007	2011	1	1
1007	2012	1	1
1007	2013	1	1
1007	2014	1	1
1007	2015	1	1
1007	2016	1	1
1007	2017	1	1
1007	2018	1	1
1007	2019	1	1
1007	2020	1	1
1007	2021	1	1
1007	2022	1	1
1007	2023	1	1
1003	2005	2	1
1003	2006	2	1
1003	2007	2	1
1003	2008	2	0
1003	2009	2	0
1003	2010	2	0
1003	2011	2	0
1003	2012	2	0
1003	2013	2	0
1003	2014	2	0
1003	2015	2	0
1003	2016	2	0
1003	2017	2	0
1003	2018	2	0
1003	2019	2	1
1003	2020	2	1
1003	2021	2	0
1003	2022	2	0
1003	2023	2	0
1004	2005	2	0
1004	2006	2	0
1004	2007	2	0
1004	2008	2	1
1004	2009	2	1
1004	2010	2	1
1004	2011	2	1
1004	2012	2	1
1004	2013	2	0
1004	2014	2	0
1004	2015	2	0
1004	2016	2	0
1004	2017	2	0
1004	2018	2	0
1004	2019	2	1
1004	2020	2	1
1004	2021	2	0
1004	2022	2	0
1004	2023	2	0
1005	2005	2	0
1005	2006	2	0
1005	2007	2	0
1005	2008	2	1
1005	2009	2	1
1005	2010	2	1
1005	2011	2	1
1005	2012	2	1
1005	2013	2	1
1005	2014	2	1
1005	2015	2	1
1005	2016	2	1
1005	2017	2	1
1005	2018	2	1
1005	2019	2	1
1005	2020	2	1
1005	2021	2	0
1005	2022	2	0
1005	2023	2	0
1006	2005	2	1
1006	2006	2	1
1006	2007	2	1
1006	2008	2	1
1006	2009	2	1
1006	2010	2	1
1006	2011	2	1
1006	2012	2	1
1006	2013	2	1
1006	2014	2	1
1006	2015	2	1
1006	2016	2	1
1006	2017	2	1
1006	2018	2	1
1006	2019	2	0
1006	2020	2	0
1006	2021	2	0
1006	2022	2	0
1006	2023	2	0
1008	2005	2	1
1008	2006	2	1
1008	2007	2	1
1008	2008	2	0
1008	2009	2	0
1008	2010	2	0
1008	2011	2	0
1008	2012	2	0
1008	2013	2	1
1008	2014	2	1
1008	2015	2	1
1008	2016	2	1
1008	2017	2	1
1008	2018	2	1
1008	2019	2	0
1008	2020	2	0
1008	2021	2	0
1008	2022	2	0
1008	2023	2	0
1003	2005	3	0
1003	2006	3	0
1003	2007	3	0
1003	2008	3	0
1003	2009	3	0
1003	2010	3	0
1003	2011	3	0
1003	2012	3	0
1003	2013	3	0
1003	2014	3	0
1003	2015	3	0
1003	2016	3	0
1003	2017	3	0
1003	2018	3	0
1003	2019	3	0
1003	2020	3	0
1003	2021	3	0
1003	2022	3	0
1003	2023	3	0
;
RUN;

I want to create a timeseries of a new variable called "treatment" for each firm from 2005 to 2023 based on the investment_year for each individual in the "have" dataset.

 

The rules to create this "treatment" variable is quite simple, but best illustrated using an example. Consider firm=1004 and individual=1 in "have". This individual invests in this firm in 2006 (i.e., in investment_year=2006, we see firm=1004 appears for individual=1), the individual invests in this firm again in 2007 (i.e., firm=1004 appears again for individual 1 when investment_year=2007), does NOT invest in it in 2010 (i.e., firm=1004 does NOT appear for individual 1 in investment_year=2010), and then invests in it again in 2015 (i.e., firm=1004 appears again for individual 1 when investment_year=2015). 

 

The "treatment" variable in "want"is a binary variable that takes either the value of 0 or 1 based on when the individual invests in the given firm. So, for firm=1004 and individual=1, I want to create a timeseries from 2005 to 2023 where "treatment" starts with a value of 0 in 2005 but takes on a value of 1 from 2006 (because individual 1 invests in this firm), it stays on as 1 until the next round of investment, which is in 2007, it will stay as 1 from 2007 to 2010, but will switch back to 0 from 2010 since the individual does not invest in this firm from 2010, but it will switch back to 1 from 2015 onwards since the individual invests in this firm again. Note that if individual_exit_year is empty, then the value of treatment stays from the last investment_year until 2023. In this case, because individual 1 invests in firm=1004 in investment_year=2015 and this individual has an empty individual_exit_year, treatment will equal to 1 from 2015 all the way until 2023.

 

However, if individual_exit_year is NOT empty, then the value of treatment will equal to 0 from whenever the year in individual_exit_year is. As an example, consider firm=1003 and individual=2. This individual invests in this firm in 2005, so treatment will take a value of 1 from 2005 until 2008 when it will switch to 0 because the individual does NOT invest in this firm in investment_year=2008. Treatment will keep maintaining a value of 0 through to 2013 (since the individual still does NOT invest in this firm in investment_year=2013). But, treatment will take a value of 1 from 2019 (since the individual invests in this firm in 2019), but because individual_exit_year=2021 for individual 2, treatment will only take a value of 1 until 2021, where it will switch back to 0 until 2023. Note that if investment_year and individual_exit_year are the same (as is the case for individual=3), the individual_exit_year takes precedence, i.e., treatment will take a value of 0 until 2023.

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
elbarto
Obsidian | Level 7

Thank you! I managed to tweak your code to get it 🙂

 


data temp;
    set have(keep=individual firm individual_exit_year);
run;

proc sort data=temp out=base nodup;
    by individual firm individual_exit_year;
run;

proc sort data=have out=temp2(keep=individual investment_year) nodupkey;
by individual investment_year;
run;




data base2;
    set base;

    do year=2005 to 2023;
        output;
    end;
run;

proc sql;

    create table base3
        as select distinct a.*, b.investment_year, c.investment_year as inv_year
            from base2 a
                left join have b
                    on a.individual=b.individual
                    and a.firm=b.firm
                    and a.individual_exit_year=b.individual_exit_year
                    and a.year=b.investment_year
				left join temp2 c
					on a.individual=c.individual 
					and a.year=c.investment_year
                order by firm, individual, year
    ;
quit;


data base3;
set base3;

if investment_year^=. then do;
	if investment_year = inv_year then treat=1;
end;

if inv_year^=. and investment_year=. then treat=0;


run;


data base3;
  set base3;
  by firm individual;
  if first.individual then treat_fill=treat;
  else treat_fill=coalesce(treat,treat_fill); /** Note order matters treat, treat_fill **/
  retain treat_fill;
run;

data base3;
set base3;
if treat_fill=. then treat_fill=0;

if individual_exit_year^=. then do;
	if year>=individual_exit_year then treat_fill=0;
end;


run;


View solution in original post

2 REPLIES 2
rudfaden
Lapis Lazuli | Level 10

I can't get your data to match with your description of the rules. But this should get you started.

data temp;
    set have(keep=individual firm individual_exit_year);
run;

proc sort data=temp out=base nodup;
    by individual firm individual_exit_year;
run;

data base2;
    set base;

    do year=2005 to 2023;
        output;
    end;
run;

proc sql;
    create table temp2
        as select distinct investment_year as inv_year 
            from have
    ;
    create table base3
        as select distinct a.*, b.investment_year, c.inv_year
            from base2 a
                left join have b
                    on a.individual=b.individual
                    and a.firm=b.firm
                    and a.individual_exit_year=b.individual_exit_year
                    and a.year=b.investment_year
                left join temp2 c
                    on a.year=c.inv_year
                order by individual,year, firm 
    ;
quit;

data want;
    set base3;
    retain treatment;
    by individual year firm;

    if first.indiviudal then
        treatment=0;

    if investment_year=inv_year and investment_year ne . then
        treatment=1;
    else if investment_year=. and inv_year ne . then treatment=0;
run;
elbarto
Obsidian | Level 7

Thank you! I managed to tweak your code to get it 🙂

 


data temp;
    set have(keep=individual firm individual_exit_year);
run;

proc sort data=temp out=base nodup;
    by individual firm individual_exit_year;
run;

proc sort data=have out=temp2(keep=individual investment_year) nodupkey;
by individual investment_year;
run;




data base2;
    set base;

    do year=2005 to 2023;
        output;
    end;
run;

proc sql;

    create table base3
        as select distinct a.*, b.investment_year, c.investment_year as inv_year
            from base2 a
                left join have b
                    on a.individual=b.individual
                    and a.firm=b.firm
                    and a.individual_exit_year=b.individual_exit_year
                    and a.year=b.investment_year
				left join temp2 c
					on a.individual=c.individual 
					and a.year=c.investment_year
                order by firm, individual, year
    ;
quit;


data base3;
set base3;

if investment_year^=. then do;
	if investment_year = inv_year then treat=1;
end;

if inv_year^=. and investment_year=. then treat=0;


run;


data base3;
  set base3;
  by firm individual;
  if first.individual then treat_fill=treat;
  else treat_fill=coalesce(treat,treat_fill); /** Note order matters treat, treat_fill **/
  retain treat_fill;
run;

data base3;
set base3;
if treat_fill=. then treat_fill=0;

if individual_exit_year^=. then do;
	if year>=individual_exit_year then treat_fill=0;
end;


run;


SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 545 views
  • 0 likes
  • 2 in conversation