BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
elbarto
Obsidian | Level 7

I have a dataset which looks like the following:

 


DATA have;
input individual investment_year firm individual_exit_year;
DATALINES;
1	2006	1004	.
1	2006	1005	.
1	2006	1006	.
1	2007	1004	.
1	2007	1005	.
1	2007	1007	.
1	2010	1003	.
1	2010	1006	.
1	2010	1007	.
1	2015	1004	.
1	2015	1005	.
1	2015	1007	.
2	2005	1003	2021
2	2005	1006	2021
2	2005	1008	2021
2	2008	1004	2021
2	2008	1005	2021
2	2008	1006	2021
2	2013	1005	2021
2	2013	1006	2021
2	2013	1008	2021
2	2019	1003	2021
2	2019	1004	2021
2	2019	1005	2021
3	2010	1003	2010
;
RUN;

I want to create the following dataset based on the above:

 


DATA want;
input firm year individual treatment;
DATALINES;
1003	2005	1	0
1003	2006	1	0
1003	2007	1	0
1003	2008	1	0
1003	2009	1	0
1003	2010	1	1
1003	2011	1	1
1003	2012	1	1
1003	2013	1	1
1003	2014	1	1
1003	2015	1	0
1003	2016	1	0
1003	2017	1	0
1003	2018	1	0
1003	2019	1	0
1003	2020	1	0
1003	2021	1	0
1003	2022	1	0
1003	2023	1	0
1004	2005	1	0
1004	2006	1	1
1004	2007	1	1
1004	2008	1	1
1004	2009	1	1
1004	2010	1	0
1004	2011	1	0
1004	2012	1	0
1004	2013	1	0
1004	2014	1	0
1004	2015	1	1
1004	2016	1	1
1004	2017	1	1
1004	2018	1	1
1004	2019	1	1
1004	2020	1	1
1004	2021	1	1
1004	2022	1	1
1004	2023	1	1
1005	2005	1	0
1005	2006	1	1
1005	2007	1	1
1005	2008	1	1
1005	2009	1	1
1005	2010	1	0
1005	2011	1	0
1005	2012	1	0
1005	2013	1	0
1005	2014	1	0
1005	2015	1	1
1005	2016	1	1
1005	2017	1	1
1005	2018	1	1
1005	2019	1	1
1005	2020	1	1
1005	2021	1	1
1005	2022	1	1
1005	2023	1	1
1006	2005	1	0
1006	2006	1	1
1006	2007	1	0
1006	2008	1	0
1006	2009	1	0
1006	2010	1	1
1006	2011	1	1
1006	2012	1	1
1006	2013	1	1
1006	2014	1	1
1006	2015	1	0
1006	2016	1	0
1006	2017	1	0
1006	2018	1	0
1006	2019	1	0
1006	2020	1	0
1006	2021	1	0
1006	2022	1	0
1006	2023	1	0
1007	2005	1	0
1007	2006	1	0
1007	2007	1	1
1007	2008	1	1
1007	2009	1	1
1007	2010	1	1
1007	2011	1	1
1007	2012	1	1
1007	2013	1	1
1007	2014	1	1
1007	2015	1	1
1007	2016	1	1
1007	2017	1	1
1007	2018	1	1
1007	2019	1	1
1007	2020	1	1
1007	2021	1	1
1007	2022	1	1
1007	2023	1	1
1003	2005	2	1
1003	2006	2	1
1003	2007	2	1
1003	2008	2	0
1003	2009	2	0
1003	2010	2	0
1003	2011	2	0
1003	2012	2	0
1003	2013	2	0
1003	2014	2	0
1003	2015	2	0
1003	2016	2	0
1003	2017	2	0
1003	2018	2	0
1003	2019	2	1
1003	2020	2	1
1003	2021	2	0
1003	2022	2	0
1003	2023	2	0
1004	2005	2	0
1004	2006	2	0
1004	2007	2	0
1004	2008	2	1
1004	2009	2	1
1004	2010	2	1
1004	2011	2	1
1004	2012	2	1
1004	2013	2	0
1004	2014	2	0
1004	2015	2	0
1004	2016	2	0
1004	2017	2	0
1004	2018	2	0
1004	2019	2	1
1004	2020	2	1
1004	2021	2	0
1004	2022	2	0
1004	2023	2	0
1005	2005	2	0
1005	2006	2	0
1005	2007	2	0
1005	2008	2	1
1005	2009	2	1
1005	2010	2	1
1005	2011	2	1
1005	2012	2	1
1005	2013	2	1
1005	2014	2	1
1005	2015	2	1
1005	2016	2	1
1005	2017	2	1
1005	2018	2	1
1005	2019	2	1
1005	2020	2	1
1005	2021	2	0
1005	2022	2	0
1005	2023	2	0
1006	2005	2	1
1006	2006	2	1
1006	2007	2	1
1006	2008	2	1
1006	2009	2	1
1006	2010	2	1
1006	2011	2	1
1006	2012	2	1
1006	2013	2	1
1006	2014	2	1
1006	2015	2	1
1006	2016	2	1
1006	2017	2	1
1006	2018	2	1
1006	2019	2	0
1006	2020	2	0
1006	2021	2	0
1006	2022	2	0
1006	2023	2	0
1008	2005	2	1
1008	2006	2	1
1008	2007	2	1
1008	2008	2	0
1008	2009	2	0
1008	2010	2	0
1008	2011	2	0
1008	2012	2	0
1008	2013	2	1
1008	2014	2	1
1008	2015	2	1
1008	2016	2	1
1008	2017	2	1
1008	2018	2	1
1008	2019	2	0
1008	2020	2	0
1008	2021	2	0
1008	2022	2	0
1008	2023	2	0
1003	2005	3	0
1003	2006	3	0
1003	2007	3	0
1003	2008	3	0
1003	2009	3	0
1003	2010	3	0
1003	2011	3	0
1003	2012	3	0
1003	2013	3	0
1003	2014	3	0
1003	2015	3	0
1003	2016	3	0
1003	2017	3	0
1003	2018	3	0
1003	2019	3	0
1003	2020	3	0
1003	2021	3	0
1003	2022	3	0
1003	2023	3	0
;
RUN;

I want to create a timeseries of a new variable called "treatment" for each firm from 2005 to 2023 based on the investment_year for each individual in the "have" dataset.

 

The rules to create this "treatment" variable is quite simple, but best illustrated using an example. Consider firm=1004 and individual=1 in "have". This individual invests in this firm in 2006 (i.e., in investment_year=2006, we see firm=1004 appears for individual=1), the individual invests in this firm again in 2007 (i.e., firm=1004 appears again for individual 1 when investment_year=2007), does NOT invest in it in 2010 (i.e., firm=1004 does NOT appear for individual 1 in investment_year=2010), and then invests in it again in 2015 (i.e., firm=1004 appears again for individual 1 when investment_year=2015). 

 

The "treatment" variable in "want"is a binary variable that takes either the value of 0 or 1 based on when the individual invests in the given firm. So, for firm=1004 and individual=1, I want to create a timeseries from 2005 to 2023 where "treatment" starts with a value of 0 in 2005 but takes on a value of 1 from 2006 (because individual 1 invests in this firm), it stays on as 1 until the next round of investment, which is in 2007, it will stay as 1 from 2007 to 2010, but will switch back to 0 from 2010 since the individual does not invest in this firm from 2010, but it will switch back to 1 from 2015 onwards since the individual invests in this firm again. Note that if individual_exit_year is empty, then the value of treatment stays from the last investment_year until 2023. In this case, because individual 1 invests in firm=1004 in investment_year=2015 and this individual has an empty individual_exit_year, treatment will equal to 1 from 2015 all the way until 2023.

 

However, if individual_exit_year is NOT empty, then the value of treatment will equal to 0 from whenever the year in individual_exit_year is. As an example, consider firm=1003 and individual=2. This individual invests in this firm in 2005, so treatment will take a value of 1 from 2005 until 2008 when it will switch to 0 because the individual does NOT invest in this firm in investment_year=2008. Treatment will keep maintaining a value of 0 through to 2013 (since the individual still does NOT invest in this firm in investment_year=2013). But, treatment will take a value of 1 from 2019 (since the individual invests in this firm in 2019), but because individual_exit_year=2021 for individual 2, treatment will only take a value of 1 until 2021, where it will switch back to 0 until 2023. Note that if investment_year and individual_exit_year are the same (as is the case for individual=3), the individual_exit_year takes precedence, i.e., treatment will take a value of 0 until 2023.

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
elbarto
Obsidian | Level 7

Thank you! I managed to tweak your code to get it 🙂

 


data temp;
    set have(keep=individual firm individual_exit_year);
run;

proc sort data=temp out=base nodup;
    by individual firm individual_exit_year;
run;

proc sort data=have out=temp2(keep=individual investment_year) nodupkey;
by individual investment_year;
run;




data base2;
    set base;

    do year=2005 to 2023;
        output;
    end;
run;

proc sql;

    create table base3
        as select distinct a.*, b.investment_year, c.investment_year as inv_year
            from base2 a
                left join have b
                    on a.individual=b.individual
                    and a.firm=b.firm
                    and a.individual_exit_year=b.individual_exit_year
                    and a.year=b.investment_year
				left join temp2 c
					on a.individual=c.individual 
					and a.year=c.investment_year
                order by firm, individual, year
    ;
quit;


data base3;
set base3;

if investment_year^=. then do;
	if investment_year = inv_year then treat=1;
end;

if inv_year^=. and investment_year=. then treat=0;


run;


data base3;
  set base3;
  by firm individual;
  if first.individual then treat_fill=treat;
  else treat_fill=coalesce(treat,treat_fill); /** Note order matters treat, treat_fill **/
  retain treat_fill;
run;

data base3;
set base3;
if treat_fill=. then treat_fill=0;

if individual_exit_year^=. then do;
	if year>=individual_exit_year then treat_fill=0;
end;


run;


View solution in original post

2 REPLIES 2
rudfaden
Lapis Lazuli | Level 10

I can't get your data to match with your description of the rules. But this should get you started.

data temp;
    set have(keep=individual firm individual_exit_year);
run;

proc sort data=temp out=base nodup;
    by individual firm individual_exit_year;
run;

data base2;
    set base;

    do year=2005 to 2023;
        output;
    end;
run;

proc sql;
    create table temp2
        as select distinct investment_year as inv_year 
            from have
    ;
    create table base3
        as select distinct a.*, b.investment_year, c.inv_year
            from base2 a
                left join have b
                    on a.individual=b.individual
                    and a.firm=b.firm
                    and a.individual_exit_year=b.individual_exit_year
                    and a.year=b.investment_year
                left join temp2 c
                    on a.year=c.inv_year
                order by individual,year, firm 
    ;
quit;

data want;
    set base3;
    retain treatment;
    by individual year firm;

    if first.indiviudal then
        treatment=0;

    if investment_year=inv_year and investment_year ne . then
        treatment=1;
    else if investment_year=. and inv_year ne . then treatment=0;
run;
elbarto
Obsidian | Level 7

Thank you! I managed to tweak your code to get it 🙂

 


data temp;
    set have(keep=individual firm individual_exit_year);
run;

proc sort data=temp out=base nodup;
    by individual firm individual_exit_year;
run;

proc sort data=have out=temp2(keep=individual investment_year) nodupkey;
by individual investment_year;
run;




data base2;
    set base;

    do year=2005 to 2023;
        output;
    end;
run;

proc sql;

    create table base3
        as select distinct a.*, b.investment_year, c.investment_year as inv_year
            from base2 a
                left join have b
                    on a.individual=b.individual
                    and a.firm=b.firm
                    and a.individual_exit_year=b.individual_exit_year
                    and a.year=b.investment_year
				left join temp2 c
					on a.individual=c.individual 
					and a.year=c.investment_year
                order by firm, individual, year
    ;
quit;


data base3;
set base3;

if investment_year^=. then do;
	if investment_year = inv_year then treat=1;
end;

if inv_year^=. and investment_year=. then treat=0;


run;


data base3;
  set base3;
  by firm individual;
  if first.individual then treat_fill=treat;
  else treat_fill=coalesce(treat,treat_fill); /** Note order matters treat, treat_fill **/
  retain treat_fill;
run;

data base3;
set base3;
if treat_fill=. then treat_fill=0;

if individual_exit_year^=. then do;
	if year>=individual_exit_year then treat_fill=0;
end;


run;


sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 711 views
  • 0 likes
  • 2 in conversation