BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
elbarto
Obsidian | Level 7

I have the following dataset:

 

 

data have;
input year	firm_id	location_id	action	action_amount	operate	new_entry
;
cards;
2013	28013	6085	1	10000	0	0
2015	28013	6085	1	12000	0	0
2015	28013	29189	1	10000	0	0
2016	28013	34019	1	5000	1	1
2017	28013	34019	0	0	1	2
2011	120609	9003	1	7000	0	0
2012	120609	9003	0	0	1	1
2013	120609	9003	1	5000	1	2
2012	247908	23001	1	9000	0	0
2013	247908	23001	1	8000	0	0
2014	247908	23001	1	8500	1	1
2015	247908	23001	0	0	1	2
2003	356123	1001	0	0	0	0
2004	356123	1001	0	0	0	0
2009	356123	1001	1	9800	1	1
;
run;

I want to add additional rows and two new variables called "pre_action" and "pre_action_amount" to obtain the following dataset:

 

data want;
input year	firm_id	location_id	action	action_amount	operate	new_entry	pre_action	pre_action_amount
;
cards;
2013	28013	6085	1	10000	0	0	.	.
2014	28013	6085	0	0	0	0	1	10000
2015	28013	6085	1	12000	0	0	.	.
2016	28013	6085	0	0	0	0	1	12000
2015	28013	29189	1	6500	0	0	.	.
2016	28013	29189	0	0	0	0	1	6500
2016	28013	34019	1	5000	1	1	0	0
2017	28013	34019	0	0	1	2	.	.
2011	120609	9003	1	7000	0	0	.	.
2012	120609	9003	0	0	1	1	1	7000
2013	120609	9003	1	5000	1	2	.	.
2012	247908	23001	1	9000	0	0	.	.
2013	247908	23001	1	8000	0	0	1	9000
2014	247908	23001	1	8500	1	1	1	8000
2015	247908	23001	0	0	1	2	.	.
2003	356123	1001	0	0	0	0	.	.
2004	356123	1001	0	0	0	0	0	0
2005	356123	1001	0	0	0	0	0	0
2009	356123	1001	1	9800	1	1	0	0
;
run;

The rules are as follows:

 

1) First, consider only the rows with operate = 0.

 

  1. For each firm_id and location_id pair, if in the following year there is no row with the same firm_id and location_id, then create a new row with the following year and same firm_id and location_id pair. The variables action, action_amount, operate, and new_entry are all set to 0, while pre_action and pre_action_amount is set to be the value of action and action_amount in the previous year. Example: In year 2013, for the firm_id/location_id pair 28013/6085, we have operate = 0. But in 2014, there are no observations for this firm_id/location_id pair. So we set action, action_amount, operate, and new_entry to be 0 and pre_action=1 and pre_action_amount=10000 which are the values for action and action_amount in 2013.
  2. For each firm_id and location_id pair, if in the following year there is a row with the same firm_id and location_id, then simply set pre_action and pre_action_amount to be the value of action and action_amount in the previous year. Example: In year 2011 for firm_id/location_id 120609/9003, we have operate=0. But in the next year 2012, there is a row with this firm_id/location_id pair. So we set pre_action=1 and pre_action_amount=7000 which are the values for action and action_amount in 2011. Another example is in year 2003, for the firm_id/location_id 356123/1001.

2) Now consider the rows with new_entry=1 that do not yet have a value of pre_action and pre_action_amount. Set both pre_action and pre_action_amount to be 0.

 

3) All other values of pre_action and pre_action_amount are empty.

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @elbarto 

 

Here is a code that follows the specified rules. However, I am a bit concerned about the expected results if we strictly follow those rules:

It seems to be a mistake for the action_amount where location_id = 29189: it differs between the have table and the want table.

In addition, for year=2015, firm_id=28013 and location_id=6085, I don't understand why pre_action and pre_action_amount are not set to 0 as according to your rule 1.1

 

Hope this help a little.

 

Best,

 

/* Retrieve min and max year for each unique firm_id and location_id where operate = 0 */

proc sql;
	create table op0_year_bound1 as
	select distinct firm_id, location_id, operate,
					min(year) as start_year,
					max(year) as end_year,
					0 as action,
					0 as action_amount,
					0 as new_entry
	from have
	where operate = 0
	group by firm_id, location_id
	order by firm_id, location_id;
quit;

/* Create new rows for each year*/

data op0_year_bound2;
	set op0_year_bound1;
	do year = start_year to end_year + 1;
		output;
	end;
	drop start_year end_year;
run;

/* Insert these new rows in the input dataset where operate = 0 */

data have2;
	merge op0_year_bound2 have(where=(operate=0));
	by firm_id location_id year operate;
run;

/* Append this dataset to the input dataset where operate ne 0
and output only one row per unique firm_id / location_id / firm
 -> select operate = 1 in case of duplicate record */

data have3;
	set have (where=(operate ne 0)) have2;
run;

proc sort data=have3;
	by firm_id location_id year operate;
run;

data have4;
	set have3;
	by firm_id location_id year operate;
	if last.year then output;
run;

/* Compute pre_action and pre_action_amount flags */

data have5;
	set have4;
		pre_action = lag(action);
		pre_action_amount = lag(action_amount);
	
	if lag(firm_id) ne firm_id or lag(location_id) ne location_id or lag(operate) ne 0 then do;
		pre_action = .;
		pre_action_amount = .;
	end;
	
	if new_entry=1 and pre_action = . and pre_action_amount = . then do;
		pre_action = 0;
		pre_action_amount = 0;
	end;
run;

/* Sort the data */

proc sort data=have5 out=want;
	by firm_id location_id year;
run;

Capture d’écran 2020-01-11 à 13.43.46.png

View solution in original post

3 REPLIES 3
andreas_lds
Jade | Level 19

Have you tried anything to solve the problem?

elbarto
Obsidian | Level 7
I am quite a beginner at SAS. I am not sure how to add additional rows based on the complicated conditions as in my post.

Any help/hints would be much appreciated.
ed_sas_member
Meteorite | Level 14

Hi @elbarto 

 

Here is a code that follows the specified rules. However, I am a bit concerned about the expected results if we strictly follow those rules:

It seems to be a mistake for the action_amount where location_id = 29189: it differs between the have table and the want table.

In addition, for year=2015, firm_id=28013 and location_id=6085, I don't understand why pre_action and pre_action_amount are not set to 0 as according to your rule 1.1

 

Hope this help a little.

 

Best,

 

/* Retrieve min and max year for each unique firm_id and location_id where operate = 0 */

proc sql;
	create table op0_year_bound1 as
	select distinct firm_id, location_id, operate,
					min(year) as start_year,
					max(year) as end_year,
					0 as action,
					0 as action_amount,
					0 as new_entry
	from have
	where operate = 0
	group by firm_id, location_id
	order by firm_id, location_id;
quit;

/* Create new rows for each year*/

data op0_year_bound2;
	set op0_year_bound1;
	do year = start_year to end_year + 1;
		output;
	end;
	drop start_year end_year;
run;

/* Insert these new rows in the input dataset where operate = 0 */

data have2;
	merge op0_year_bound2 have(where=(operate=0));
	by firm_id location_id year operate;
run;

/* Append this dataset to the input dataset where operate ne 0
and output only one row per unique firm_id / location_id / firm
 -> select operate = 1 in case of duplicate record */

data have3;
	set have (where=(operate ne 0)) have2;
run;

proc sort data=have3;
	by firm_id location_id year operate;
run;

data have4;
	set have3;
	by firm_id location_id year operate;
	if last.year then output;
run;

/* Compute pre_action and pre_action_amount flags */

data have5;
	set have4;
		pre_action = lag(action);
		pre_action_amount = lag(action_amount);
	
	if lag(firm_id) ne firm_id or lag(location_id) ne location_id or lag(operate) ne 0 then do;
		pre_action = .;
		pre_action_amount = .;
	end;
	
	if new_entry=1 and pre_action = . and pre_action_amount = . then do;
		pre_action = 0;
		pre_action_amount = 0;
	end;
run;

/* Sort the data */

proc sort data=have5 out=want;
	by firm_id location_id year;
run;

Capture d’écran 2020-01-11 à 13.43.46.png

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 563 views
  • 1 like
  • 3 in conversation