I have a large data (sorted by recent date) set like this :
ID NAME BEGIN END TOP
191080 John 25/04/2016 01/05/2016 NO
191080 John 08/06/2016 10/06/2016 NO
191080 John 21/06/2016 24/06/2016 Yes
191080 John 21/07/2016 24/09/2016 Yes
191141 Alex 16/01/2017 17/01/2017 NO
191141 Alex 22/02/2017 29/04/2017 NO
If "Yes" and "NO" occurs in TOP for an observation, then the most recent dates is choosen
But if only "NO" occurs in TOP for an observation, then the minimum of BEGIN and maximum of END are choosen
I mean how get this :
ID NAME BEGIN END
191080 John 21/07/2016 24/09/2016
191141 Alex 16/01/2017 29/04/2017
You asked for: "If "Yes" and "NO" occurs in TOP for an observation, then the most recent dates is choosen".
I assume if an ID has all and only "YES" you still want the recent dates.
If max(END) may be less then recent END date than retain max_end too, and care to update it.
data want;
set have;
by ID; /* assuming data sorted by ID BEGIN */
retain yes_flag min_begin;
if first.ID then do; yes_flag=0; min_begin=begin; end;
if upcase(top) = 'YES' then yes_flag=1;
if last.ID then do;
if yes_flag=0 then begin=min_begin;
output;
end;
run;
You asked for: "If "Yes" and "NO" occurs in TOP for an observation, then the most recent dates is choosen".
I assume if an ID has all and only "YES" you still want the recent dates.
If max(END) may be less then recent END date than retain max_end too, and care to update it.
data want;
set have;
by ID; /* assuming data sorted by ID BEGIN */
retain yes_flag min_begin;
if first.ID then do; yes_flag=0; min_begin=begin; end;
if upcase(top) = 'YES' then yes_flag=1;
if last.ID then do;
if yes_flag=0 then begin=min_begin;
output;
end;
run;
date value are numeric, how put them as SAS DATE format ?
@John4 wrote:
date value are numeric, how put them as SAS DATE format ?
What do you mean by "numeric" ? When you browse your data what do see in the date columns?
If date is July 1st, 2016 do you see 01072016 or other number?
if positive then you can convert it to date formet by:
date = input(put(date,z8.),ddmmyy8.);
Beyond, consider having TOP values per ID, whose dates to output:
1) NO, YES (mixed ended by YES)
2) NO, YES, NO (mixed ended by NO)
3) YES, NO (mixed ended by NO)
4) NO, NO (only NO)
5) YES, YES (only YES)
A variant on @Shmuel's suggestion:
data want;
retain minbeg;
merge have (keep=id top where=(top="YES"))
have;
by ID;
if first.id then minbeg=(ifn(top="YES",begin,.);
if last.id;
begin=coalesce(minbeg,begin);
run;
If there are zero YES values, then it takes the most recent date range (i.e. last in a BY group), otherwise begin is set to its first value, but END is unmodified.
The MERGE statement allows you to determine, at the beginning of each BY group, whether any YESes will be present, but it does not change the sequence of BEGIN/END. If no YESes are present then minbeg is set to missing.
The subsetting if ("if last.id;") can be your friend, as in this case it avoids the need for an explicit output statement - just a syle preference really.
It appears that you need to capture the first (least) BEGIN and note whether TOP = "Yes" occurs. I assume that if TOP = "Yes" occurs, but is followed by an observation with TOP = "NO", then we are interested in the observation with TOP = "Yes". Here first., last., and retain are useful:
data top
( drop = __:
top
)
;
set top ;
by id ;
retain __begin
__end .
__top "0"
;
if first.id
then
do ;
__begin = begin ;
__top = "0" ;
end ;
if top = "Yes"
then
do ;
__top = "1" ;
__begin = begin ;
__end = end ;
end ;
if last.id
then
do ;
if __top = "1"
then
do ;
begin = __begin ;
end = __end ;
output ;
end ;
else
do ;
begin = __begin ;
output ;
end ;
end ;
run ;
HTH,
Kevin
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.