Hi
I have the following dataset:
Data have; input ID_W
Date_F: mmddyy10. ID_Fund;
105246 06/30/2016 10
datalines;
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;
I want the below:
Data want; input ID_W
Date_F: mmddyy10. ID_Fund;
105246 06/30/2016 10
datalines;
105246 09/30/2016 10
105246 12/31/2016 10
105246 03/31/2017 10
105246 06/30/2017 10
105246 09/30/2017 10
105246 12/31/2017 10
105250 12/31/2012 12
. 06/30/2015 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
105256 06/30/2017 25
105256 09/30/2017 25
run;
This means for each ID_Fund, if ID_W exists for the dates of the year 2016, then it should replace the missing ID_Ws for all the dates of the year 2017. However, if ID_W exists for any year before 2016, then it should not replace the missing values.
Thanks.
Use a retained variable to keep values across observations:
Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;
data want;
set have;
by id_fund;
retain _w;
if first.id_fund then _w = .;
if year(date_f) = 2016 and id_w ne . then _w = id_w;
if year(date_f) = 2017 and id_w = . and _w ne . then id_w = _w;
drop _w;
run;
proc print data=want noobs;
run;
Result:
ID_W Date_F ID_Fund 105246 2016-06-30 10 105246 2016-09-30 10 105246 2016-12-31 10 105246 2017-03-31 10 105246 2017-06-30 10 105246 2017-09-30 10 105246 2017-12-31 10 105250 2012-12-31 12 . 2017-06-30 12 . 2017-09-30 12 . 2017-12-31 12 105256 2016-03-31 25 105256 2016-09-30 25 105256 2017-06-30 25 105256 2017-09-30 25
Hi
I have the following dataset:
Data have; input ID_W
Date_F: mmddyy10. ID_Fund;
105246 06/30/2016 10
datalines;
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;
I want the below:
Data want; input ID_W
Date_F: mmddyy10. ID_Fund;
105246 06/30/2016 10
datalines;
105246 09/30/2016 10
105246 12/31/2016 10
105246 03/31/2017 10
105246 06/30/2017 10
105246 09/30/2017 10
105246 12/31/2017 10
105250 12/31/2012 12
. 06/30/2015 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
105256 06/30/2017 25
105256 09/30/2017 25
run;
This means for each ID_Fund, if ID_W exists for the dates of the year 2016, then it should replace the missing ID_Ws for all the dates of the year 2017. However, if ID_W exists for any year before 2016, then it should not replace the missing values.
Thanks.
Use a retained variable to keep values across observations:
Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;
data want;
set have;
by id_fund;
retain _w;
if first.id_fund then _w = .;
if year(date_f) = 2016 and id_w ne . then _w = id_w;
if year(date_f) = 2017 and id_w = . and _w ne . then id_w = _w;
drop _w;
run;
proc print data=want noobs;
run;
Result:
ID_W Date_F ID_Fund 105246 2016-06-30 10 105246 2016-09-30 10 105246 2016-12-31 10 105246 2017-03-31 10 105246 2017-06-30 10 105246 2017-09-30 10 105246 2017-12-31 10 105250 2012-12-31 12 . 2017-06-30 12 . 2017-09-30 12 . 2017-12-31 12 105256 2016-03-31 25 105256 2016-09-30 25 105256 2017-06-30 25 105256 2017-09-30 25
Something like:
data have; input id_w date_f: mmddyy10. id_fund; format date_f date9.; datalines; 105246 06/30/2016 10 105246 09/30/2016 10 105246 12/31/2016 10 . 03/31/2017 10 . 06/30/2017 10 . 09/30/2017 10 . 12/31/2017 10 105250 12/31/2012 12 . 06/30/2017 12 . 09/30/2017 12 . 12/31/2017 12 105256 03/31/2016 25 105256 09/30/2016 25 . 06/30/2017 25 . 09/30/2017 25 ; run; data want; set have; retain lstid; if id_w ne . and year(date_f)=2016 then lstid=id_w; if id_w ne . and id_w ne lstid then lstid=.; if id_w=. and year(date_f) ge 2016 then id_w=lstid; run;
Here is a solution that use the SET ... POINT= instead of retained variables.
Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;
data want;
set have ;
by id_fund;;
if first.id_fund then ptr=.;
if year(date_f)=2016 then ptr=_n_;
if id_w=. and ptr^=. then set have (keep=id_w) point=ptr;
run;
The ptr variable keeps track of the position of most recent 2016 record within a BY ID_FUND group. When you are concerned with updating only the ID_W variable, it has no particular advantage over the retain approach. But if you wanted to recall (say) 20 variables from the 2016 record when ID_W is missing, then all you would need to do is list those 20 variables in the "keep=" parameter in the SET .... POINT= statement.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.