Pyrite | Level 9

## Determine Date Difference Between Dates Over Groups

Good evening. I hope there's someone who could give me a suggestion on how to solve this problem.

I have a dataset with many observations over separate groups. What I'm looking for is the difference between the first start date of the group and the ending date by the group. The twist is that the group is not limited to one grouping by variable. The groups are defined by five different variables. I only need the number of days on the final observation of each group but by all five by variables. What I need at the end of the day is one instance of each of all five variables with the number of days from the first date to the last date.

I know it sounds strange. Any alternative suggestions would be greatly appreciated.

Please see the attached Excel sheet for my (probably poor) attempt to explain this and below for my latest attempt to use  LAG. I've also tried RETAIN but can't seem to get either correct.

Thank you.

data attemp01;
set XX;
format lag_housing_start_date lag_housing_stop_date mmddyy10.;

by var1 var2 var3 var4 var5 housing_start_date housing_stop_date;

lag_var4 = lag(var4);
lag_var5 = lag(var5);
lag_housing_start_date = lag(housing_start_date);
lag_housing_stop_date = lag(housing_stop_date);

if (first.var1
and first.var2
and first.var3
and first.var4
and first.var5) then do;

lag_var4 = var4;
lag_var5 = var5;
lag_housing_start_date = .;
lag_housing_stop_date = .;
end;
else do;
lag_var4 = var4;
lag_var5 = var5;
days_from_previous = housing_start_date - lag_housing_stop_date;
end;

run;

1 ACCEPTED SOLUTION

Accepted Solutions

## Re: Determine Date Difference Between Dates Over Groups

I won't open any office files, but fortunately @r_behata posted the data in usable form. Another problem are the names of the variables: var1-var5 are hardly understandable and make code less readable.

You don't have to check all key-variables, just the last one:

``````/* to avoid error messages if the data is not sorted */
proc sort data=have out=sorted presorted;
by var1-var5 housing_start_date;
run;

data want;
set sorted;
by var1-var5;

length first_start_date 8 Difference 8;
format first_start_date date9.;
retain first_start_date;

if first.var5 then do;
first_start_date = housing_start_date;
end;

if last.var5 then do;
Difference = intck('DAY', first_start_date, housing_stop_date);
output;
end;
run;``````

The result differs from the solution posted by @r_behata, i changed the format used to read the dates from anydtdte to mmddyy10.

2 REPLIES 2
Barite | Level 11

## Re: Determine Date Difference Between Dates Over Groups

Here is the Proc SQL Solution for it . Hope this helps :

``````
data have;

input VAR1 \$	VAR2 \$	VAR3 \$	VAR4	\$ VAR5 \$	DATE1:anydtdte.	DATE2:anydtdte.;
format date1 date2 mmddyys10.;
infile cards truncover;
cards;
PERSON1 1 3 TYPE1 TYPE11 8/28/2012 2/23/2013
PERSON1 1 3 TYPE1 TYPE11 3/13/2013 3/16/2013
PERSON1 1 3 TYPE1 TYPE11 3/21/2013 4/12/2013
PERSON1 1 3 TYPE1 TYPE11 4/17/2013 10/22/2013
PERSON1 1 3 TYPE1 TYPE11 11/10/2013 11/26/2013
PERSON1 2 1 TYPE1 TYPE11 11/2/2019 11/18/2019
PERSON2 1 1 TYPE1 TYPE11 1/23/2010 7/21/2010
PERSON2 1 1 TYPE1 TYPE11 7/28/2010 8/12/2010
PERSON2 1 1 TYPE1 TYPE11 8/30/2010 9/28/2010
PERSON2 1 1 TYPE1 TYPE11 5/31/2014 11/14/2014
PERSON2 1 1 TYPE2 TYPE21 7/21/2010 10/8/2010
PERSON3 1 1 TYPE1 TYPE11 10/7/2020 10/15/2020
PERSON3 2 1 TYPE1 TYPE11 5/12/2015 5/21/2015
PERSON3 2 1 TYPE1 TYPE11 10/5/2016 11/14/2016
PERSON3 2 1 TYPE1 TYPE11 2/15/2017 3/7/2017
PERSON3 2 1 TYPE1 TYPE11 5/3/2017 10/30/2017
PERSON3 2 1 TYPE1 TYPE11 11/11/2017 12/1/2017
;
run;

Proc SQl;
Create table want as
select VAR1,	VAR2,	VAR3,	VAR4,	VAR5,
min(date1) as FIRST_DATE format=mmddyys10.,
max(date2) as LAST_DATE format=mmddyys10.,
max(date2)-min(date1) as DIFFERENCE
from have
group by 1,2,3,4,5;

Quit;``````

## Re: Determine Date Difference Between Dates Over Groups

I won't open any office files, but fortunately @r_behata posted the data in usable form. Another problem are the names of the variables: var1-var5 are hardly understandable and make code less readable.

You don't have to check all key-variables, just the last one:

``````/* to avoid error messages if the data is not sorted */
proc sort data=have out=sorted presorted;
by var1-var5 housing_start_date;
run;

data want;
set sorted;
by var1-var5;

length first_start_date 8 Difference 8;
format first_start_date date9.;
retain first_start_date;

if first.var5 then do;
first_start_date = housing_start_date;
end;

if last.var5 then do;
Difference = intck('DAY', first_start_date, housing_stop_date);
output;
end;
run;``````

The result differs from the solution posted by @r_behata, i changed the format used to read the dates from anydtdte to mmddyy10.

Discussion stats
• 2 replies
• 1081 views
• 0 likes
• 3 in conversation