BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Saba1
Quartz | Level 8

Hi

I have the following dataset:

 

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;

 

I want the below:

 

Data want;
input ID_W Date_F: mmddyy10. ID_Fund;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
105246 03/31/2017 10
105246 06/30/2017 10
105246 09/30/2017 10
105246 12/31/2017 10
105250 12/31/2012 12
. 06/30/2015 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
105256 06/30/2017 25
105256 09/30/2017 25
run;

 

This means for each ID_Fund, if ID_W exists for the dates of the year 2016, then it should replace the missing ID_Ws for all the dates of the year 2017. However, if ID_W exists for any year before 2016, then it should not replace the missing values.

 

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Use a retained variable to keep values across observations:

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;

data want;
set have;
by id_fund;
retain _w;
if first.id_fund then _w = .;
if year(date_f) = 2016 and id_w ne . then _w = id_w;
if year(date_f) = 2017 and id_w = . and _w ne . then id_w = _w;
drop _w;
run;

proc print data=want noobs;
run;

Result:

 ID_W         Date_F    ID_Fund

105246    2016-06-30       10  
105246    2016-09-30       10  
105246    2016-12-31       10  
105246    2017-03-31       10  
105246    2017-06-30       10  
105246    2017-09-30       10  
105246    2017-12-31       10  
105250    2012-12-31       12  
     .    2017-06-30       12  
     .    2017-09-30       12  
     .    2017-12-31       12  
105256    2016-03-31       25  
105256    2016-09-30       25  
105256    2017-06-30       25  
105256    2017-09-30       25  

View solution in original post

7 REPLIES 7
Saba1
Quartz | Level 8

Hi

I have the following dataset:

 

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;

 

I want the below:

 

Data want;
input ID_W Date_F: mmddyy10. ID_Fund;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
105246 03/31/2017 10
105246 06/30/2017 10
105246 09/30/2017 10
105246 12/31/2017 10
105250 12/31/2012 12
. 06/30/2015 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
105256 06/30/2017 25
105256 09/30/2017 25
run;

 

This means for each ID_Fund, if ID_W exists for the dates of the year 2016, then it should replace the missing ID_Ws for all the dates of the year 2017. However, if ID_W exists for any year before 2016, then it should not replace the missing values.

 

Thanks.

Kurt_Bremser
Super User

Use a retained variable to keep values across observations:

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;

data want;
set have;
by id_fund;
retain _w;
if first.id_fund then _w = .;
if year(date_f) = 2016 and id_w ne . then _w = id_w;
if year(date_f) = 2017 and id_w = . and _w ne . then id_w = _w;
drop _w;
run;

proc print data=want noobs;
run;

Result:

 ID_W         Date_F    ID_Fund

105246    2016-06-30       10  
105246    2016-09-30       10  
105246    2016-12-31       10  
105246    2017-03-31       10  
105246    2017-06-30       10  
105246    2017-09-30       10  
105246    2017-12-31       10  
105250    2012-12-31       12  
     .    2017-06-30       12  
     .    2017-09-30       12  
     .    2017-12-31       12  
105256    2016-03-31       25  
105256    2016-09-30       25  
105256    2017-06-30       25  
105256    2017-09-30       25  
Saba1
Quartz | Level 8
thanks a lot. it works really well.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Something like:

data have;
  input id_w date_f: mmddyy10. id_fund;
  format date_f date9.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
;
run;

data want;
  set have;
  retain lstid;
  if id_w ne . and year(date_f)=2016 then lstid=id_w;
  if id_w ne . and id_w ne lstid then lstid=.;
  if id_w=. and year(date_f) ge 2016 then id_w=lstid;
run;
  
  
  
Saba1
Quartz | Level 8
thank you for the guidance.
mkeintz
PROC Star

Here is a solution that use the SET ... POINT=  instead of retained variables.

 

Data have;
input ID_W Date_F: mmddyy10. ID_Fund;
format Date_F yymmddd10.;
datalines;
105246 06/30/2016 10
105246 09/30/2016 10
105246 12/31/2016 10
. 03/31/2017 10
. 06/30/2017 10
. 09/30/2017 10
. 12/31/2017 10
105250 12/31/2012 12
. 06/30/2017 12
. 09/30/2017 12
. 12/31/2017 12
105256 03/31/2016 25
105256 09/30/2016 25
. 06/30/2017 25
. 09/30/2017 25
run;

data want;
  set have ;
  by id_fund;;
  if first.id_fund then ptr=.;
  if year(date_f)=2016 then ptr=_n_;
  if id_w=. and ptr^=. then set have (keep=id_w) point=ptr;
run;

The ptr variable keeps track of the position of most recent 2016 record within a BY ID_FUND group. When you are concerned with updating only the ID_W variable, it has no particular advantage over the retain approach.  But if you wanted to recall (say) 20 variables from the 2016 record when ID_W is missing, then all you would need to do is list those 20 variables in the "keep=" parameter in the SET .... POINT= statement.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 960 views
  • 0 likes
  • 4 in conversation