I have the following dataset:
data have; input year firm_id location_id action action_amount operate new_entry ; cards; 2013 28013 6085 1 10000 0 0 2015 28013 6085 1 12000 0 0 2015 28013 29189 1 10000 0 0 2016 28013 34019 1 5000 1 1 2017 28013 34019 0 0 1 2 2011 120609 9003 1 7000 0 0 2012 120609 9003 0 0 1 1 2013 120609 9003 1 5000 1 2 2012 247908 23001 1 9000 0 0 2013 247908 23001 1 8000 0 0 2014 247908 23001 1 8500 1 1 2015 247908 23001 0 0 1 2 2003 356123 1001 0 0 0 0 2004 356123 1001 0 0 0 0 2009 356123 1001 1 9800 1 1 ; run;
I want to add additional rows and two new variables called "pre_action" and "pre_action_amount" to obtain the following dataset:
data want; input year firm_id location_id action action_amount operate new_entry pre_action pre_action_amount ; cards; 2013 28013 6085 1 10000 0 0 . . 2014 28013 6085 0 0 0 0 1 10000 2015 28013 6085 1 12000 0 0 . . 2016 28013 6085 0 0 0 0 1 12000 2015 28013 29189 1 6500 0 0 . . 2016 28013 29189 0 0 0 0 1 6500 2016 28013 34019 1 5000 1 1 0 0 2017 28013 34019 0 0 1 2 . . 2011 120609 9003 1 7000 0 0 . . 2012 120609 9003 0 0 1 1 1 7000 2013 120609 9003 1 5000 1 2 . . 2012 247908 23001 1 9000 0 0 . . 2013 247908 23001 1 8000 0 0 1 9000 2014 247908 23001 1 8500 1 1 1 8000 2015 247908 23001 0 0 1 2 . . 2003 356123 1001 0 0 0 0 . . 2004 356123 1001 0 0 0 0 0 0 2005 356123 1001 0 0 0 0 0 0 2009 356123 1001 1 9800 1 1 0 0 ; run;
The rules are as follows:
1) First, consider only the rows with operate = 0.
2) Now consider the rows with new_entry=1 that do not yet have a value of pre_action and pre_action_amount. Set both pre_action and pre_action_amount to be 0.
3) All other values of pre_action and pre_action_amount are empty.
Hi @elbarto
Here is a code that follows the specified rules. However, I am a bit concerned about the expected results if we strictly follow those rules:
It seems to be a mistake for the action_amount where location_id = 29189: it differs between the have table and the want table.
In addition, for year=2015, firm_id=28013 and location_id=6085, I don't understand why pre_action and pre_action_amount are not set to 0 as according to your rule 1.1
Hope this help a little.
Best,
/* Retrieve min and max year for each unique firm_id and location_id where operate = 0 */
proc sql;
create table op0_year_bound1 as
select distinct firm_id, location_id, operate,
min(year) as start_year,
max(year) as end_year,
0 as action,
0 as action_amount,
0 as new_entry
from have
where operate = 0
group by firm_id, location_id
order by firm_id, location_id;
quit;
/* Create new rows for each year*/
data op0_year_bound2;
set op0_year_bound1;
do year = start_year to end_year + 1;
output;
end;
drop start_year end_year;
run;
/* Insert these new rows in the input dataset where operate = 0 */
data have2;
merge op0_year_bound2 have(where=(operate=0));
by firm_id location_id year operate;
run;
/* Append this dataset to the input dataset where operate ne 0
and output only one row per unique firm_id / location_id / firm
-> select operate = 1 in case of duplicate record */
data have3;
set have (where=(operate ne 0)) have2;
run;
proc sort data=have3;
by firm_id location_id year operate;
run;
data have4;
set have3;
by firm_id location_id year operate;
if last.year then output;
run;
/* Compute pre_action and pre_action_amount flags */
data have5;
set have4;
pre_action = lag(action);
pre_action_amount = lag(action_amount);
if lag(firm_id) ne firm_id or lag(location_id) ne location_id or lag(operate) ne 0 then do;
pre_action = .;
pre_action_amount = .;
end;
if new_entry=1 and pre_action = . and pre_action_amount = . then do;
pre_action = 0;
pre_action_amount = 0;
end;
run;
/* Sort the data */
proc sort data=have5 out=want;
by firm_id location_id year;
run;
Have you tried anything to solve the problem?
Hi @elbarto
Here is a code that follows the specified rules. However, I am a bit concerned about the expected results if we strictly follow those rules:
It seems to be a mistake for the action_amount where location_id = 29189: it differs between the have table and the want table.
In addition, for year=2015, firm_id=28013 and location_id=6085, I don't understand why pre_action and pre_action_amount are not set to 0 as according to your rule 1.1
Hope this help a little.
Best,
/* Retrieve min and max year for each unique firm_id and location_id where operate = 0 */
proc sql;
create table op0_year_bound1 as
select distinct firm_id, location_id, operate,
min(year) as start_year,
max(year) as end_year,
0 as action,
0 as action_amount,
0 as new_entry
from have
where operate = 0
group by firm_id, location_id
order by firm_id, location_id;
quit;
/* Create new rows for each year*/
data op0_year_bound2;
set op0_year_bound1;
do year = start_year to end_year + 1;
output;
end;
drop start_year end_year;
run;
/* Insert these new rows in the input dataset where operate = 0 */
data have2;
merge op0_year_bound2 have(where=(operate=0));
by firm_id location_id year operate;
run;
/* Append this dataset to the input dataset where operate ne 0
and output only one row per unique firm_id / location_id / firm
-> select operate = 1 in case of duplicate record */
data have3;
set have (where=(operate ne 0)) have2;
run;
proc sort data=have3;
by firm_id location_id year operate;
run;
data have4;
set have3;
by firm_id location_id year operate;
if last.year then output;
run;
/* Compute pre_action and pre_action_amount flags */
data have5;
set have4;
pre_action = lag(action);
pre_action_amount = lag(action_amount);
if lag(firm_id) ne firm_id or lag(location_id) ne location_id or lag(operate) ne 0 then do;
pre_action = .;
pre_action_amount = .;
end;
if new_entry=1 and pre_action = . and pre_action_amount = . then do;
pre_action = 0;
pre_action_amount = 0;
end;
run;
/* Sort the data */
proc sort data=have5 out=want;
by firm_id location_id year;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.