BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
elbarto
Obsidian | Level 7

I have the following dataset:

 

 

data have;
input year	firm_id	location_id	action	action_amount	operate	new_entry
;
cards;
2013	28013	6085	1	10000	0	0
2015	28013	6085	1	12000	0	0
2015	28013	29189	1	10000	0	0
2016	28013	34019	1	5000	1	1
2017	28013	34019	0	0	1	2
2011	120609	9003	1	7000	0	0
2012	120609	9003	0	0	1	1
2013	120609	9003	1	5000	1	2
2012	247908	23001	1	9000	0	0
2013	247908	23001	1	8000	0	0
2014	247908	23001	1	8500	1	1
2015	247908	23001	0	0	1	2
2003	356123	1001	0	0	0	0
2004	356123	1001	0	0	0	0
2009	356123	1001	1	9800	1	1
;
run;

I want to add additional rows and two new variables called "pre_action" and "pre_action_amount" to obtain the following dataset:

 

data want;
input year	firm_id	location_id	action	action_amount	operate	new_entry	pre_action	pre_action_amount
;
cards;
2013	28013	6085	1	10000	0	0	.	.
2014	28013	6085	0	0	0	0	1	10000
2015	28013	6085	1	12000	0	0	.	.
2016	28013	6085	0	0	0	0	1	12000
2015	28013	29189	1	6500	0	0	.	.
2016	28013	29189	0	0	0	0	1	6500
2016	28013	34019	1	5000	1	1	0	0
2017	28013	34019	0	0	1	2	.	.
2011	120609	9003	1	7000	0	0	.	.
2012	120609	9003	0	0	1	1	1	7000
2013	120609	9003	1	5000	1	2	.	.
2012	247908	23001	1	9000	0	0	.	.
2013	247908	23001	1	8000	0	0	1	9000
2014	247908	23001	1	8500	1	1	1	8000
2015	247908	23001	0	0	1	2	.	.
2003	356123	1001	0	0	0	0	.	.
2004	356123	1001	0	0	0	0	0	0
2005	356123	1001	0	0	0	0	0	0
2009	356123	1001	1	9800	1	1	0	0
;
run;

The rules are as follows:

 

1) First, consider only the rows with operate = 0.

 

  1. For each firm_id and location_id pair, if in the following year there is no row with the same firm_id and location_id, then create a new row with the following year and same firm_id and location_id pair. The variables action, action_amount, operate, and new_entry are all set to 0, while pre_action and pre_action_amount is set to be the value of action and action_amount in the previous year. Example: In year 2013, for the firm_id/location_id pair 28013/6085, we have operate = 0. But in 2014, there are no observations for this firm_id/location_id pair. So we set action, action_amount, operate, and new_entry to be 0 and pre_action=1 and pre_action_amount=10000 which are the values for action and action_amount in 2013.
  2. For each firm_id and location_id pair, if in the following year there is a row with the same firm_id and location_id, then simply set pre_action and pre_action_amount to be the value of action and action_amount in the previous year. Example: In year 2011 for firm_id/location_id 120609/9003, we have operate=0. But in the next year 2012, there is a row with this firm_id/location_id pair. So we set pre_action=1 and pre_action_amount=7000 which are the values for action and action_amount in 2011. Another example is in year 2003, for the firm_id/location_id 356123/1001.

2) Now consider the rows with new_entry=1 that do not yet have a value of pre_action and pre_action_amount. Set both pre_action and pre_action_amount to be 0.

 

3) All other values of pre_action and pre_action_amount are empty.

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @elbarto 

 

Here is a code that follows the specified rules. However, I am a bit concerned about the expected results if we strictly follow those rules:

It seems to be a mistake for the action_amount where location_id = 29189: it differs between the have table and the want table.

In addition, for year=2015, firm_id=28013 and location_id=6085, I don't understand why pre_action and pre_action_amount are not set to 0 as according to your rule 1.1

 

Hope this help a little.

 

Best,

 

/* Retrieve min and max year for each unique firm_id and location_id where operate = 0 */

proc sql;
	create table op0_year_bound1 as
	select distinct firm_id, location_id, operate,
					min(year) as start_year,
					max(year) as end_year,
					0 as action,
					0 as action_amount,
					0 as new_entry
	from have
	where operate = 0
	group by firm_id, location_id
	order by firm_id, location_id;
quit;

/* Create new rows for each year*/

data op0_year_bound2;
	set op0_year_bound1;
	do year = start_year to end_year + 1;
		output;
	end;
	drop start_year end_year;
run;

/* Insert these new rows in the input dataset where operate = 0 */

data have2;
	merge op0_year_bound2 have(where=(operate=0));
	by firm_id location_id year operate;
run;

/* Append this dataset to the input dataset where operate ne 0
and output only one row per unique firm_id / location_id / firm
 -> select operate = 1 in case of duplicate record */

data have3;
	set have (where=(operate ne 0)) have2;
run;

proc sort data=have3;
	by firm_id location_id year operate;
run;

data have4;
	set have3;
	by firm_id location_id year operate;
	if last.year then output;
run;

/* Compute pre_action and pre_action_amount flags */

data have5;
	set have4;
		pre_action = lag(action);
		pre_action_amount = lag(action_amount);
	
	if lag(firm_id) ne firm_id or lag(location_id) ne location_id or lag(operate) ne 0 then do;
		pre_action = .;
		pre_action_amount = .;
	end;
	
	if new_entry=1 and pre_action = . and pre_action_amount = . then do;
		pre_action = 0;
		pre_action_amount = 0;
	end;
run;

/* Sort the data */

proc sort data=have5 out=want;
	by firm_id location_id year;
run;

Capture d’écran 2020-01-11 à 13.43.46.png

View solution in original post

3 REPLIES 3
andreas_lds
Jade | Level 19

Have you tried anything to solve the problem?

elbarto
Obsidian | Level 7
I am quite a beginner at SAS. I am not sure how to add additional rows based on the complicated conditions as in my post.

Any help/hints would be much appreciated.
ed_sas_member
Meteorite | Level 14

Hi @elbarto 

 

Here is a code that follows the specified rules. However, I am a bit concerned about the expected results if we strictly follow those rules:

It seems to be a mistake for the action_amount where location_id = 29189: it differs between the have table and the want table.

In addition, for year=2015, firm_id=28013 and location_id=6085, I don't understand why pre_action and pre_action_amount are not set to 0 as according to your rule 1.1

 

Hope this help a little.

 

Best,

 

/* Retrieve min and max year for each unique firm_id and location_id where operate = 0 */

proc sql;
	create table op0_year_bound1 as
	select distinct firm_id, location_id, operate,
					min(year) as start_year,
					max(year) as end_year,
					0 as action,
					0 as action_amount,
					0 as new_entry
	from have
	where operate = 0
	group by firm_id, location_id
	order by firm_id, location_id;
quit;

/* Create new rows for each year*/

data op0_year_bound2;
	set op0_year_bound1;
	do year = start_year to end_year + 1;
		output;
	end;
	drop start_year end_year;
run;

/* Insert these new rows in the input dataset where operate = 0 */

data have2;
	merge op0_year_bound2 have(where=(operate=0));
	by firm_id location_id year operate;
run;

/* Append this dataset to the input dataset where operate ne 0
and output only one row per unique firm_id / location_id / firm
 -> select operate = 1 in case of duplicate record */

data have3;
	set have (where=(operate ne 0)) have2;
run;

proc sort data=have3;
	by firm_id location_id year operate;
run;

data have4;
	set have3;
	by firm_id location_id year operate;
	if last.year then output;
run;

/* Compute pre_action and pre_action_amount flags */

data have5;
	set have4;
		pre_action = lag(action);
		pre_action_amount = lag(action_amount);
	
	if lag(firm_id) ne firm_id or lag(location_id) ne location_id or lag(operate) ne 0 then do;
		pre_action = .;
		pre_action_amount = .;
	end;
	
	if new_entry=1 and pre_action = . and pre_action_amount = . then do;
		pre_action = 0;
		pre_action_amount = 0;
	end;
run;

/* Sort the data */

proc sort data=have5 out=want;
	by firm_id location_id year;
run;

Capture d’écran 2020-01-11 à 13.43.46.png

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 853 views
  • 1 like
  • 3 in conversation