I have the following dataset:
data have; input year firm_id location_id action action_amount operate new_entry ; cards; 2013 28013 6085 1 10000 0 0 2015 28013 6085 1 12000 0 0 2015 28013 29189 1 10000 0 0 2016 28013 34019 1 5000 1 1 2017 28013 34019 0 0 1 2 2011 120609 9003 1 7000 0 0 2012 120609 9003 0 0 1 1 2013 120609 9003 1 5000 1 2 2012 247908 23001 1 9000 0 0 2013 247908 23001 1 8000 0 0 2014 247908 23001 1 8500 1 1 2015 247908 23001 0 0 1 2 2003 356123 1001 0 0 0 0 2004 356123 1001 0 0 0 0 2009 356123 1001 1 9800 1 1 ; run;
I want to add additional rows and two new variables called "pre_action" and "pre_action_amount" to obtain the following dataset:
data want; input year firm_id location_id action action_amount operate new_entry pre_action pre_action_amount ; cards; 2013 28013 6085 1 10000 0 0 . . 2014 28013 6085 0 0 0 0 1 10000 2015 28013 6085 1 12000 0 0 . . 2016 28013 6085 0 0 0 0 1 12000 2015 28013 29189 1 6500 0 0 . . 2016 28013 29189 0 0 0 0 1 6500 2016 28013 34019 1 5000 1 1 0 0 2017 28013 34019 0 0 1 2 . . 2011 120609 9003 1 7000 0 0 . . 2012 120609 9003 0 0 1 1 1 7000 2013 120609 9003 1 5000 1 2 . . 2012 247908 23001 1 9000 0 0 . . 2013 247908 23001 1 8000 0 0 1 9000 2014 247908 23001 1 8500 1 1 1 8000 2015 247908 23001 0 0 1 2 . . 2003 356123 1001 0 0 0 0 . . 2004 356123 1001 0 0 0 0 0 0 2005 356123 1001 0 0 0 0 0 0 2009 356123 1001 1 9800 1 1 0 0 ; run;
The rules are as follows:
1) First, consider only the rows with operate = 0.
2) Now consider the rows with new_entry=1 that do not yet have a value of pre_action and pre_action_amount. Set both pre_action and pre_action_amount to be 0.
3) All other values of pre_action and pre_action_amount are empty.
Hi elbarto,
Perhaps not the optimal solution, and I am also not able to follow 100% the steps that you want to perform. But would something like the below help you to get the "new" rows that you want to include?
If the below helps, then afterwards either with RETAIN statement or LAG function you should be able to derive the new variables that you want as needed.
Please check as an example.
/*Identify unique firm ID, Location ID and year values*/
proc sort data = have nodupkey out = x (keep = firm_id location_id year);
by firm_id location_id year;
where operate = 0;
run;
/*Per firm/location ID keep only first (start) and last (end) year*/
data x2;
set x;
by firm_id location_id;
if first.location_id or last.location_id;
run;
/*Transpose so that per firm/location ID start and end year are on same row*/
proc transpose data = x2 out = x3 (drop = _name_ rename = (col1 = start col2 = end));
var year;
by firm_id location_id;
run;
/*Create a "skeleton" with all years from start to end per firm/location ID*/
data x4 (drop = start end i);
set x3;
by firm_id location_id;
/*For the case of only single year (i.e. end = start)*/
if end = . then end = start;
do i = start to end by 1;
year = i;
output;
end;
run;
/*Merge "skeleton" to original data by firm/location ID and year*/
data have2;
merge have (in = A) x4 (in = B);
by firm_id location_id year;
run;
Hi elbarto,
Perhaps not the optimal solution, and I am also not able to follow 100% the steps that you want to perform. But would something like the below help you to get the "new" rows that you want to include?
If the below helps, then afterwards either with RETAIN statement or LAG function you should be able to derive the new variables that you want as needed.
Please check as an example.
/*Identify unique firm ID, Location ID and year values*/
proc sort data = have nodupkey out = x (keep = firm_id location_id year);
by firm_id location_id year;
where operate = 0;
run;
/*Per firm/location ID keep only first (start) and last (end) year*/
data x2;
set x;
by firm_id location_id;
if first.location_id or last.location_id;
run;
/*Transpose so that per firm/location ID start and end year are on same row*/
proc transpose data = x2 out = x3 (drop = _name_ rename = (col1 = start col2 = end));
var year;
by firm_id location_id;
run;
/*Create a "skeleton" with all years from start to end per firm/location ID*/
data x4 (drop = start end i);
set x3;
by firm_id location_id;
/*For the case of only single year (i.e. end = start)*/
if end = . then end = start;
do i = start to end by 1;
year = i;
output;
end;
run;
/*Merge "skeleton" to original data by firm/location ID and year*/
data have2;
merge have (in = A) x4 (in = B);
by firm_id location_id year;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.