I have the following dataset:
data have; input year firm_id location_id action action_amount operate new_entry ; cards; 2013 28013 6085 1 10000 0 0 2015 28013 6085 1 12000 0 0 2015 28013 29189 1 10000 0 0 2016 28013 34019 1 5000 1 1 2017 28013 34019 0 0 1 2 2011 120609 9003 1 7000 0 0 2012 120609 9003 0 0 1 1 2013 120609 9003 1 5000 1 2 2012 247908 23001 1 9000 0 0 2013 247908 23001 1 8000 0 0 2014 247908 23001 1 8500 1 1 2015 247908 23001 0 0 1 2 2003 356123 1001 0 0 0 0 2004 356123 1001 0 0 0 0 2009 356123 1001 1 9800 1 1 ; run;
I want to add additional rows and two new variables called "pre_action" and "pre_action_amount" to obtain the following dataset:
data want; input year firm_id location_id action action_amount operate new_entry pre_action pre_action_amount ; cards; 2013 28013 6085 1 10000 0 0 . . 2014 28013 6085 0 0 0 0 1 10000 2015 28013 6085 1 12000 0 0 . . 2016 28013 6085 0 0 0 0 1 12000 2015 28013 29189 1 6500 0 0 . . 2016 28013 29189 0 0 0 0 1 6500 2016 28013 34019 1 5000 1 1 0 0 2017 28013 34019 0 0 1 2 . . 2011 120609 9003 1 7000 0 0 . . 2012 120609 9003 0 0 1 1 1 7000 2013 120609 9003 1 5000 1 2 . . 2012 247908 23001 1 9000 0 0 . . 2013 247908 23001 1 8000 0 0 1 9000 2014 247908 23001 1 8500 1 1 1 8000 2015 247908 23001 0 0 1 2 . . 2003 356123 1001 0 0 0 0 . . 2004 356123 1001 0 0 0 0 0 0 2005 356123 1001 0 0 0 0 0 0 2009 356123 1001 1 9800 1 1 0 0 ; run;
The rules are as follows:
1) First, consider only the rows with operate = 0.
2) Now consider the rows with new_entry=1 that do not yet have a value of pre_action and pre_action_amount. Set both pre_action and pre_action_amount to be 0.
3) All other values of pre_action and pre_action_amount are empty.
Hi @elbarto
Here is a code that follows the specified rules. However, I am a bit concerned about the expected results if we strictly follow those rules:
It seems to be a mistake for the action_amount where location_id = 29189: it differs between the have table and the want table.
In addition, for year=2015, firm_id=28013 and location_id=6085, I don't understand why pre_action and pre_action_amount are not set to 0 as according to your rule 1.1
Hope this help a little.
Best,
/* Retrieve min and max year for each unique firm_id and location_id where operate = 0 */
proc sql;
create table op0_year_bound1 as
select distinct firm_id, location_id, operate,
min(year) as start_year,
max(year) as end_year,
0 as action,
0 as action_amount,
0 as new_entry
from have
where operate = 0
group by firm_id, location_id
order by firm_id, location_id;
quit;
/* Create new rows for each year*/
data op0_year_bound2;
set op0_year_bound1;
do year = start_year to end_year + 1;
output;
end;
drop start_year end_year;
run;
/* Insert these new rows in the input dataset where operate = 0 */
data have2;
merge op0_year_bound2 have(where=(operate=0));
by firm_id location_id year operate;
run;
/* Append this dataset to the input dataset where operate ne 0
and output only one row per unique firm_id / location_id / firm
-> select operate = 1 in case of duplicate record */
data have3;
set have (where=(operate ne 0)) have2;
run;
proc sort data=have3;
by firm_id location_id year operate;
run;
data have4;
set have3;
by firm_id location_id year operate;
if last.year then output;
run;
/* Compute pre_action and pre_action_amount flags */
data have5;
set have4;
pre_action = lag(action);
pre_action_amount = lag(action_amount);
if lag(firm_id) ne firm_id or lag(location_id) ne location_id or lag(operate) ne 0 then do;
pre_action = .;
pre_action_amount = .;
end;
if new_entry=1 and pre_action = . and pre_action_amount = . then do;
pre_action = 0;
pre_action_amount = 0;
end;
run;
/* Sort the data */
proc sort data=have5 out=want;
by firm_id location_id year;
run;
Have you tried anything to solve the problem?
Hi @elbarto
Here is a code that follows the specified rules. However, I am a bit concerned about the expected results if we strictly follow those rules:
It seems to be a mistake for the action_amount where location_id = 29189: it differs between the have table and the want table.
In addition, for year=2015, firm_id=28013 and location_id=6085, I don't understand why pre_action and pre_action_amount are not set to 0 as according to your rule 1.1
Hope this help a little.
Best,
/* Retrieve min and max year for each unique firm_id and location_id where operate = 0 */
proc sql;
create table op0_year_bound1 as
select distinct firm_id, location_id, operate,
min(year) as start_year,
max(year) as end_year,
0 as action,
0 as action_amount,
0 as new_entry
from have
where operate = 0
group by firm_id, location_id
order by firm_id, location_id;
quit;
/* Create new rows for each year*/
data op0_year_bound2;
set op0_year_bound1;
do year = start_year to end_year + 1;
output;
end;
drop start_year end_year;
run;
/* Insert these new rows in the input dataset where operate = 0 */
data have2;
merge op0_year_bound2 have(where=(operate=0));
by firm_id location_id year operate;
run;
/* Append this dataset to the input dataset where operate ne 0
and output only one row per unique firm_id / location_id / firm
-> select operate = 1 in case of duplicate record */
data have3;
set have (where=(operate ne 0)) have2;
run;
proc sort data=have3;
by firm_id location_id year operate;
run;
data have4;
set have3;
by firm_id location_id year operate;
if last.year then output;
run;
/* Compute pre_action and pre_action_amount flags */
data have5;
set have4;
pre_action = lag(action);
pre_action_amount = lag(action_amount);
if lag(firm_id) ne firm_id or lag(location_id) ne location_id or lag(operate) ne 0 then do;
pre_action = .;
pre_action_amount = .;
end;
if new_entry=1 and pre_action = . and pre_action_amount = . then do;
pre_action = 0;
pre_action_amount = 0;
end;
run;
/* Sort the data */
proc sort data=have5 out=want;
by firm_id location_id year;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.