Solved: Re: Inserting additional rows based on other rows

elbarto · Posted 01-07-2020 03:21 AM

I have the following dataset:

data have;
input year	firm_id	location_id	action	action_amount	operate	new_entry
;
cards;
2013	28013	6085	1	10000	0	0
2015	28013	6085	1	12000	0	0
2015	28013	29189	1	10000	0	0
2016	28013	34019	1	5000	1	1
2017	28013	34019	0	0	1	2
2011	120609	9003	1	7000	0	0
2012	120609	9003	0	0	1	1
2013	120609	9003	1	5000	1	2
2012	247908	23001	1	9000	0	0
2013	247908	23001	1	8000	0	0
2014	247908	23001	1	8500	1	1
2015	247908	23001	0	0	1	2
2003	356123	1001	0	0	0	0
2004	356123	1001	0	0	0	0
2009	356123	1001	1	9800	1	1



;
run;

I want to add additional rows and two new variables called "pre_action" and "pre_action_amount" to obtain the following dataset:

data want;
input year	firm_id	location_id	action	action_amount	operate	new_entry	pre_action	pre_action_amount
;
cards;


2013	28013	6085	1	10000	0	0	.	.
2014	28013	6085	0	0	0	0	1	10000
2015	28013	6085	1	12000	0	0	.	.
2016	28013	6085	0	0	0	0	1	12000
2015	28013	29189	1	6500	0	0	.	.
2016	28013	29189	0	0	0	0	1	6500
2016	28013	34019	1	5000	1	1	0	0
2017	28013	34019	0	0	1	2	.	.
2011	120609	9003	1	7000	0	0	.	.
2012	120609	9003	0	0	1	1	1	7000
2013	120609	9003	1	5000	1	2	.	.
2012	247908	23001	1	9000	0	0	.	.
2013	247908	23001	1	8000	0	0	1	9000
2014	247908	23001	1	8500	1	1	1	8000
2015	247908	23001	0	0	1	2	.	.
2003	356123	1001	0	0	0	0	.	.
2004	356123	1001	0	0	0	0	0	0
2005	356123	1001	0	0	0	0	0	0
2009	356123	1001	1	9800	1	1	0	0



;
run;

The rules are as follows:

1) First, consider only the rows with operate = 0.

For each firm_id and location_id pair, if in the following year there is no row with the same firm_id and location_id, then create a new row with the following year and same firm_id and location_id pair. The variables action, action_amount, operate, and new_entry are all set to 0, while pre_action and pre_action_amount is set to be the value of action and action_amount in the previous year. Example: In year 2013, for the firm_id/location_id pair 28013/6085, we have operate = 0. But in 2014, there are no observations for this firm_id/location_id pair. So we set action, action_amount, operate, and new_entry to be 0 and pre_action=1 and pre_action_amount=10000 which are the values for action and action_amount in 2013.
For each firm_id and location_id pair, if in the following year there is a row with the same firm_id and location_id, then simply set pre_action and pre_action_amount to be the value of action and action_amount in the previous year. Example: In year 2011 for firm_id/location_id 120609/9003, we have operate=0. But in the next year 2012, there is a row with this firm_id/location_id pair. So we set pre_action=1 and pre_action_amount=7000 which are the values for action and action_amount in 2011. Another example is in year 2003, for the firm_id/location_id 356123/1001.

2) Now consider the rows with new_entry=1 that do not yet have a value of pre_action and pre_action_amount. Set both pre_action and pre_action_amount to be 0.

3) All other values of pre_action and pre_action_amount are empty.

geoskiad · Posted 01-08-2020 08:04 AM

Hi elbarto,

Perhaps not the optimal solution, and I am also not able to follow 100% the steps that you want to perform. But would something like the below help you to get the "new" rows that you want to include?

If the below helps, then afterwards either with RETAIN statement or LAG function you should be able to derive the new variables that you want as needed.

Please check as an example.

/*Identify unique firm ID, Location ID and year values*/
proc sort data = have nodupkey out = x (keep = firm_id location_id year);
  by firm_id location_id year;
  where operate = 0;
run;

/*Per firm/location ID keep only first (start) and last (end) year*/
data x2;
set x;
  by firm_id location_id;
  if first.location_id or last.location_id;
run;

/*Transpose so that per firm/location ID start and end year are on same row*/
proc transpose data = x2 out = x3 (drop = _name_ rename = (col1 = start col2 = end));
  var year;
  by firm_id location_id;
run;

/*Create a "skeleton" with all years from start to end per firm/location ID*/
data x4 (drop = start end i);
set x3;
  by firm_id location_id;
  /*For the case of only single year (i.e. end = start)*/
  if end = . then end = start;
  do i = start to end by 1;
    year = i;
    output;
  end;
run;

/*Merge "skeleton" to original data by firm/location ID and year*/
data have2;
merge have (in = A) x4 (in = B);
  by firm_id location_id year;
run;

View solution in original post

geoskiad · Posted 01-08-2020 08:04 AM

Hi elbarto,

Perhaps not the optimal solution, and I am also not able to follow 100% the steps that you want to perform. But would something like the below help you to get the "new" rows that you want to include?

If the below helps, then afterwards either with RETAIN statement or LAG function you should be able to derive the new variables that you want as needed.

Please check as an example.

/*Identify unique firm ID, Location ID and year values*/
proc sort data = have nodupkey out = x (keep = firm_id location_id year);
  by firm_id location_id year;
  where operate = 0;
run;

/*Per firm/location ID keep only first (start) and last (end) year*/
data x2;
set x;
  by firm_id location_id;
  if first.location_id or last.location_id;
run;

/*Transpose so that per firm/location ID start and end year are on same row*/
proc transpose data = x2 out = x3 (drop = _name_ rename = (col1 = start col2 = end));
  var year;
  by firm_id location_id;
run;

/*Create a "skeleton" with all years from start to end per firm/location ID*/
data x4 (drop = start end i);
set x3;
  by firm_id location_id;
  /*For the case of only single year (i.e. end = start)*/
  if end = . then end = start;
  do i = start to end by 1;
    year = i;
    output;
  end;
run;

/*Merge "skeleton" to original data by firm/location ID and year*/
data have2;
merge have (in = A) x4 (in = B);
  by firm_id location_id year;
run;

Inserting additional rows based on other rows

Re: Inserting additional rows based on other rows

Re: Inserting additional rows based on other rows

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away