BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
iced_tea
Obsidian | Level 7

Hello Community,

I am wondering about how to process those records where start dates are different but end dates are the same, such that the code take the record with the longest days in the date range. In the example below, the problematic employee_id is 97531.

/*Example of initial dataset.*/
data example;
   input employee_id $ startdate :DATE8. enddate :DATE8.;
   format startdate enddate DATE8.;
   datalines;
12345	 21JUN20	 24JUN20
12345	 28JUN20	 18JUL20
12345	 29JUL20	 14AUG20
12345	 09NOV20	 20NOV20
67890	 24APR20	 07JUN20
67890	 24JUN20	 08JUL20
67890	 21JUL20	 26AUG20
97531	 30APR17	 05MAY17
97531	 01MAY17	 05MAY17
;
run;
proc print data=example;
run;

This person is contributing two observations to the dataset: Obs 8 and Obs 9. Since enddate is 05May17 for both, I want to keep Obs 8 and drop Obs 9. This is because I want to keep the observation that has the longest days, in this case it is Obs 8, in order words the one where startdate is 30Apr17.

 

Below is the code for how I envision the ending dataset to look like:

/*Example of eventual dataset.*/
data want;
   input employee_id $ startdate :DATE8. enddate :DATE8. final_startdate :DATE8. final_enddate :DATE8. drop $;
   format startdate enddate final_startdate final_enddate DATE8.;
   datalines;
12345	 21JUN20	 24JUN20	 21JUN20	 24JUN20	no
12345	 28JUN20	 18JUL20	 28JUN20	 18JUL20	no
12345	 29JUL20	 14AUG20	 29JUL20	 14AUG20	no
12345	 09NOV20	 20NOV20	 09NOV20	 20NOV20	no
67890	 24APR20	 07JUN20	 24APR20	 07JUN20	no
67890	 24JUN20	 08JUL20	 24JUN20	 08JUL20	no
67890	 21JUL20	 26AUG20	 21JUL20	 26AUG20	no
97531	 30APR17	 05MAY17	 .	         .	        yes
97531	 01MAY17	 05MAY17	 01MAY17	 05MAY17	no
;
run;
proc print data=want;
run;

 I would appreciate any suggestions in how to approach this.

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
mklangley
Lapis Lazuli | Level 10

Is this what you're looking for?

data example;
   input employee_id $ startdate :DATE8. enddate :DATE8.;
   format startdate enddate DATE8.;
   datalines;
12345	 21JUN20	 24JUN20
12345	 28JUN20	 18JUL20
12345	 29JUL20	 14AUG20
12345	 09NOV20	 20NOV20
67890	 24APR20	 07JUN20
67890	 24JUN20	 08JUL20
67890	 21JUL20	 26AUG20
97531	 30APR17	 05MAY17
97531	 01MAY17	 05MAY17
;
run;

proc sort data=example
    out=example_sorted;
    by employee_id enddate descending startdate;
run;

data want;
    set example_sorted;
    by employee_id enddate;
    format final_startdate final_enddate date8. drop $8.;
    if first.enddate
        then do;
            final_startdate = startdate;
            final_enddate = enddate;
            drop = "no";
        end;
    else
        do;
            final_startdate = .;
            final_enddate = .;
            drop = "yes";
        end;
run;

proc print data=want;
run;

View solution in original post

3 REPLIES 3
mklangley
Lapis Lazuli | Level 10

Is this what you're looking for?

data example;
   input employee_id $ startdate :DATE8. enddate :DATE8.;
   format startdate enddate DATE8.;
   datalines;
12345	 21JUN20	 24JUN20
12345	 28JUN20	 18JUL20
12345	 29JUL20	 14AUG20
12345	 09NOV20	 20NOV20
67890	 24APR20	 07JUN20
67890	 24JUN20	 08JUL20
67890	 21JUL20	 26AUG20
97531	 30APR17	 05MAY17
97531	 01MAY17	 05MAY17
;
run;

proc sort data=example
    out=example_sorted;
    by employee_id enddate descending startdate;
run;

data want;
    set example_sorted;
    by employee_id enddate;
    format final_startdate final_enddate date8. drop $8.;
    if first.enddate
        then do;
            final_startdate = startdate;
            final_enddate = enddate;
            drop = "no";
        end;
    else
        do;
            final_startdate = .;
            final_enddate = .;
            drop = "yes";
        end;
run;

proc print data=want;
run;
r_behata
Barite | Level 11
data example;
input employee_id $ startdate :DATE8. enddate :DATE8.;
format startdate enddate DATE8.;
datalines;
12345 21JUN20 24JUN20
12345 28JUN20 18JUL20
12345 29JUL20 14AUG20
12345 09NOV20 20NOV20
67890 24APR20 07JUN20
67890 24JUN20 08JUL20
67890 21JUL20 26AUG20
97531 30APR17 05MAY17
97531 01MAY17 05MAY17
;
run;

data want;
	array enddt[999] _temporary_ ;
	array stdt[999] 	_temporary_ ;

	do i=1 by 1 until(last.employee_id);
		set example;
		by employee_id;
		enddt[i]=enddate;
		stdt[i]=startdate;
	end;

	do j=1 by 1 until(last.employee_id);
		set example;
		by employee_id;

		if enddt[j+1]=enddate then do;	
			if (startdate - enddate) < (stdt[j+1] -enddate) then drop='yes';
		end;
		else drop='no';
		output;

		call missing(drop);
	end;

	drop i j;
run;
                                                employee_
                                         Obs       id        startdate     enddate    drop

                                          1       12345       21JUN20      24JUN20    no  
                                          2       12345       28JUN20      18JUL20    no  
                                          3       12345       29JUL20      14AUG20    no  
                                          4       12345       09NOV20      20NOV20    no  
                                          5       67890       24APR20      07JUN20    no  
                                          6       67890       24JUN20      08JUL20    no  
                                          7       67890       21JUL20      26AUG20    no  
                                          8       97531       30APR17      05MAY17    yes 
                                          9       97531       01MAY17      05MAY17    no  
iced_tea
Obsidian | Level 7

Thank you Lapis.

Actually, I made a mistake in my initial question where I should have said that I want to drop the observation for employee_id 97531 where startdate=01MAY17 as opposed to when it =30APR17. I want to keep the longest observation. So, really the desired output looks like the following:

iced_tea_0-1614276088186.png

The code you suggested works with a minor modification to do this:

proc sort data=example
    out=example_sorted;
    by employee_id enddate descending startdate;
run;

data want;
    set example_sorted;
    by employee_id enddate;
    format final_startdate final_enddate date8. drop $8.;
    if last.enddate
        then do;
            final_startdate = startdate;
            final_enddate = enddate;
            drop = "no";
        end;
    else
        do;
            final_startdate = .;
            final_enddate = .;
            drop = "yes";
        end;
run;

I simply replaced "first" with "last". 

Thank you!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 676 views
  • 1 like
  • 3 in conversation