Solved: Creating a timeseries based on information in a dataset

elbarto · Posted 10-11-2023 01:34 PM

I have a dataset which looks like the following:


DATA have;
input individual investment_year firm individual_exit_year;
DATALINES;
1	2006	1004	.
1	2006	1005	.
1	2006	1006	.
1	2007	1004	.
1	2007	1005	.
1	2007	1007	.
1	2010	1003	.
1	2010	1006	.
1	2010	1007	.
1	2015	1004	.
1	2015	1005	.
1	2015	1007	.
2	2005	1003	2021
2	2005	1006	2021
2	2005	1008	2021
2	2008	1004	2021
2	2008	1005	2021
2	2008	1006	2021
2	2013	1005	2021
2	2013	1006	2021
2	2013	1008	2021
2	2019	1003	2021
2	2019	1004	2021
2	2019	1005	2021
3	2010	1003	2010
;
RUN;

I want to create the following dataset based on the above:


DATA want;
input firm year individual treatment;
DATALINES;
1003	2005	1	0
1003	2006	1	0
1003	2007	1	0
1003	2008	1	0
1003	2009	1	0
1003	2010	1	1
1003	2011	1	1
1003	2012	1	1
1003	2013	1	1
1003	2014	1	1
1003	2015	1	0
1003	2016	1	0
1003	2017	1	0
1003	2018	1	0
1003	2019	1	0
1003	2020	1	0
1003	2021	1	0
1003	2022	1	0
1003	2023	1	0
1004	2005	1	0
1004	2006	1	1
1004	2007	1	1
1004	2008	1	1
1004	2009	1	1
1004	2010	1	0
1004	2011	1	0
1004	2012	1	0
1004	2013	1	0
1004	2014	1	0
1004	2015	1	1
1004	2016	1	1
1004	2017	1	1
1004	2018	1	1
1004	2019	1	1
1004	2020	1	1
1004	2021	1	1
1004	2022	1	1
1004	2023	1	1
1005	2005	1	0
1005	2006	1	1
1005	2007	1	1
1005	2008	1	1
1005	2009	1	1
1005	2010	1	0
1005	2011	1	0
1005	2012	1	0
1005	2013	1	0
1005	2014	1	0
1005	2015	1	1
1005	2016	1	1
1005	2017	1	1
1005	2018	1	1
1005	2019	1	1
1005	2020	1	1
1005	2021	1	1
1005	2022	1	1
1005	2023	1	1
1006	2005	1	0
1006	2006	1	1
1006	2007	1	0
1006	2008	1	0
1006	2009	1	0
1006	2010	1	1
1006	2011	1	1
1006	2012	1	1
1006	2013	1	1
1006	2014	1	1
1006	2015	1	0
1006	2016	1	0
1006	2017	1	0
1006	2018	1	0
1006	2019	1	0
1006	2020	1	0
1006	2021	1	0
1006	2022	1	0
1006	2023	1	0
1007	2005	1	0
1007	2006	1	0
1007	2007	1	1
1007	2008	1	1
1007	2009	1	1
1007	2010	1	1
1007	2011	1	1
1007	2012	1	1
1007	2013	1	1
1007	2014	1	1
1007	2015	1	1
1007	2016	1	1
1007	2017	1	1
1007	2018	1	1
1007	2019	1	1
1007	2020	1	1
1007	2021	1	1
1007	2022	1	1
1007	2023	1	1
1003	2005	2	1
1003	2006	2	1
1003	2007	2	1
1003	2008	2	0
1003	2009	2	0
1003	2010	2	0
1003	2011	2	0
1003	2012	2	0
1003	2013	2	0
1003	2014	2	0
1003	2015	2	0
1003	2016	2	0
1003	2017	2	0
1003	2018	2	0
1003	2019	2	1
1003	2020	2	1
1003	2021	2	0
1003	2022	2	0
1003	2023	2	0
1004	2005	2	0
1004	2006	2	0
1004	2007	2	0
1004	2008	2	1
1004	2009	2	1
1004	2010	2	1
1004	2011	2	1
1004	2012	2	1
1004	2013	2	0
1004	2014	2	0
1004	2015	2	0
1004	2016	2	0
1004	2017	2	0
1004	2018	2	0
1004	2019	2	1
1004	2020	2	1
1004	2021	2	0
1004	2022	2	0
1004	2023	2	0
1005	2005	2	0
1005	2006	2	0
1005	2007	2	0
1005	2008	2	1
1005	2009	2	1
1005	2010	2	1
1005	2011	2	1
1005	2012	2	1
1005	2013	2	1
1005	2014	2	1
1005	2015	2	1
1005	2016	2	1
1005	2017	2	1
1005	2018	2	1
1005	2019	2	1
1005	2020	2	1
1005	2021	2	0
1005	2022	2	0
1005	2023	2	0
1006	2005	2	1
1006	2006	2	1
1006	2007	2	1
1006	2008	2	1
1006	2009	2	1
1006	2010	2	1
1006	2011	2	1
1006	2012	2	1
1006	2013	2	1
1006	2014	2	1
1006	2015	2	1
1006	2016	2	1
1006	2017	2	1
1006	2018	2	1
1006	2019	2	0
1006	2020	2	0
1006	2021	2	0
1006	2022	2	0
1006	2023	2	0
1008	2005	2	1
1008	2006	2	1
1008	2007	2	1
1008	2008	2	0
1008	2009	2	0
1008	2010	2	0
1008	2011	2	0
1008	2012	2	0
1008	2013	2	1
1008	2014	2	1
1008	2015	2	1
1008	2016	2	1
1008	2017	2	1
1008	2018	2	1
1008	2019	2	0
1008	2020	2	0
1008	2021	2	0
1008	2022	2	0
1008	2023	2	0
1003	2005	3	0
1003	2006	3	0
1003	2007	3	0
1003	2008	3	0
1003	2009	3	0
1003	2010	3	0
1003	2011	3	0
1003	2012	3	0
1003	2013	3	0
1003	2014	3	0
1003	2015	3	0
1003	2016	3	0
1003	2017	3	0
1003	2018	3	0
1003	2019	3	0
1003	2020	3	0
1003	2021	3	0
1003	2022	3	0
1003	2023	3	0
;
RUN;

I want to create a timeseries of a new variable called "treatment" for each firm from 2005 to 2023 based on the investment_year for each individual in the "have" dataset.

The rules to create this "treatment" variable is quite simple, but best illustrated using an example. Consider firm=1004 and individual=1 in "have". This individual invests in this firm in 2006 (i.e., in investment_year=2006, we see firm=1004 appears for individual=1), the individual invests in this firm again in 2007 (i.e., firm=1004 appears again for individual 1 when investment_year=2007), does NOT invest in it in 2010 (i.e., firm=1004 does NOT appear for individual 1 in investment_year=2010), and then invests in it again in 2015 (i.e., firm=1004 appears again for individual 1 when investment_year=2015).

The "treatment" variable in "want"is a binary variable that takes either the value of 0 or 1 based on when the individual invests in the given firm. So, for firm=1004 and individual=1, I want to create a timeseries from 2005 to 2023 where "treatment" starts with a value of 0 in 2005 but takes on a value of 1 from 2006 (because individual 1 invests in this firm), it stays on as 1 until the next round of investment, which is in 2007, it will stay as 1 from 2007 to 2010, but will switch back to 0 from 2010 since the individual does not invest in this firm from 2010, but it will switch back to 1 from 2015 onwards since the individual invests in this firm again. Note that if individual_exit_year is empty, then the value of treatment stays from the last investment_year until 2023. In this case, because individual 1 invests in firm=1004 in investment_year=2015 and this individual has an empty individual_exit_year, treatment will equal to 1 from 2015 all the way until 2023.

However, if individual_exit_year is NOT empty, then the value of treatment will equal to 0 from whenever the year in individual_exit_year is. As an example, consider firm=1003 and individual=2. This individual invests in this firm in 2005, so treatment will take a value of 1 from 2005 until 2008 when it will switch to 0 because the individual does NOT invest in this firm in investment_year=2008. Treatment will keep maintaining a value of 0 through to 2013 (since the individual still does NOT invest in this firm in investment_year=2013). But, treatment will take a value of 1 from 2019 (since the individual invests in this firm in 2019), but because individual_exit_year=2021 for individual 2, treatment will only take a value of 1 until 2021, where it will switch back to 0 until 2023. Note that if investment_year and individual_exit_year are the same (as is the case for individual=3), the individual_exit_year takes precedence, i.e., treatment will take a value of 0 until 2023.

elbarto · Posted 10-12-2023 12:24 PM

Thank you! I managed to tweak your code to get it 🙂


data temp;
    set have(keep=individual firm individual_exit_year);
run;

proc sort data=temp out=base nodup;
    by individual firm individual_exit_year;
run;

proc sort data=have out=temp2(keep=individual investment_year) nodupkey;
by individual investment_year;
run;




data base2;
    set base;

    do year=2005 to 2023;
        output;
    end;
run;

proc sql;

    create table base3
        as select distinct a.*, b.investment_year, c.investment_year as inv_year
            from base2 a
                left join have b
                    on a.individual=b.individual
                    and a.firm=b.firm
                    and a.individual_exit_year=b.individual_exit_year
                    and a.year=b.investment_year
				left join temp2 c
					on a.individual=c.individual 
					and a.year=c.investment_year
                order by firm, individual, year
    ;
quit;


data base3;
set base3;

if investment_year^=. then do;
	if investment_year = inv_year then treat=1;
end;

if inv_year^=. and investment_year=. then treat=0;


run;


data base3;
  set base3;
  by firm individual;
  if first.individual then treat_fill=treat;
  else treat_fill=coalesce(treat,treat_fill); /** Note order matters treat, treat_fill **/
  retain treat_fill;
run;

data base3;
set base3;
if treat_fill=. then treat_fill=0;

if individual_exit_year^=. then do;
	if year>=individual_exit_year then treat_fill=0;
end;


run;

View solution in original post

rudfaden · Posted 10-12-2023 10:38 AM

I can't get your data to match with your description of the rules. But this should get you started.

data temp;
    set have(keep=individual firm individual_exit_year);
run;

proc sort data=temp out=base nodup;
    by individual firm individual_exit_year;
run;

data base2;
    set base;

    do year=2005 to 2023;
        output;
    end;
run;

proc sql;
    create table temp2
        as select distinct investment_year as inv_year 
            from have
    ;
    create table base3
        as select distinct a.*, b.investment_year, c.inv_year
            from base2 a
                left join have b
                    on a.individual=b.individual
                    and a.firm=b.firm
                    and a.individual_exit_year=b.individual_exit_year
                    and a.year=b.investment_year
                left join temp2 c
                    on a.year=c.inv_year
                order by individual,year, firm 
    ;
quit;

data want;
    set base3;
    retain treatment;
    by individual year firm;

    if first.indiviudal then
        treatment=0;

    if investment_year=inv_year and investment_year ne . then
        treatment=1;
    else if investment_year=. and inv_year ne . then treatment=0;
run;

elbarto · Posted 10-12-2023 12:24 PM

Thank you! I managed to tweak your code to get it 🙂


data temp;
    set have(keep=individual firm individual_exit_year);
run;

proc sort data=temp out=base nodup;
    by individual firm individual_exit_year;
run;

proc sort data=have out=temp2(keep=individual investment_year) nodupkey;
by individual investment_year;
run;




data base2;
    set base;

    do year=2005 to 2023;
        output;
    end;
run;

proc sql;

    create table base3
        as select distinct a.*, b.investment_year, c.investment_year as inv_year
            from base2 a
                left join have b
                    on a.individual=b.individual
                    and a.firm=b.firm
                    and a.individual_exit_year=b.individual_exit_year
                    and a.year=b.investment_year
				left join temp2 c
					on a.individual=c.individual 
					and a.year=c.investment_year
                order by firm, individual, year
    ;
quit;


data base3;
set base3;

if investment_year^=. then do;
	if investment_year = inv_year then treat=1;
end;

if inv_year^=. and investment_year=. then treat=0;


run;


data base3;
  set base3;
  by firm individual;
  if first.individual then treat_fill=treat;
  else treat_fill=coalesce(treat,treat_fill); /** Note order matters treat, treat_fill **/
  retain treat_fill;
run;

data base3;
set base3;
if treat_fill=. then treat_fill=0;

if individual_exit_year^=. then do;
	if year>=individual_exit_year then treat_fill=0;
end;


run;

Creating a timeseries based on information in a dataset

Re: Creating a timeseries based on information in a dataset

Re: Creating a timeseries based on information in a dataset

Re: Creating a timeseries based on information in a dataset

Creating a timeseries based on information in a dataset

Re: Creating a timeseries based on information in a dataset

Re: Creating a timeseries based on information in a dataset

Re: Creating a timeseries based on information in a dataset

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away