BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jeff_DOC
Pyrite | Level 9

Good evening. I hope there's someone who could give me a suggestion on how to solve this problem.

 

I have a dataset with many observations over separate groups. What I'm looking for is the difference between the first start date of the group and the ending date by the group. The twist is that the group is not limited to one grouping by variable. The groups are defined by five different variables. I only need the number of days on the final observation of each group but by all five by variables. What I need at the end of the day is one instance of each of all five variables with the number of days from the first date to the last date.

 

I know it sounds strange. Any alternative suggestions would be greatly appreciated.

 

Please see the attached Excel sheet for my (probably poor) attempt to explain this and below for my latest attempt to use  LAG. I've also tried RETAIN but can't seem to get either correct.

 

Thank you.

 

data attemp01;
set XX;
format lag_housing_start_date lag_housing_stop_date mmddyy10.;

by var1 var2 var3 var4 var5 housing_start_date housing_stop_date;

lag_var4 = lag(var4);
lag_var5 = lag(var5);
lag_housing_start_date = lag(housing_start_date);
lag_housing_stop_date = lag(housing_stop_date);

if (first.var1
and first.var2
and first.var3
and first.var4
and first.var5) then do;

lag_var4 = var4;
lag_var5 = var5;
lag_housing_start_date = .;
lag_housing_stop_date = .;
end;
else do;
lag_var4 = var4;
lag_var5 = var5;
days_from_previous = housing_start_date - lag_housing_stop_date;
end;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

I won't open any office files, but fortunately @r_behata posted the data in usable form. Another problem are the names of the variables: var1-var5 are hardly understandable and make code less readable.

You don't have to check all key-variables, just the last one:

/* to avoid error messages if the data is not sorted */
proc sort data=have out=sorted presorted;
	by var1-var5 housing_start_date;
run;

data want;
	set sorted;
	by var1-var5;
	
	length first_start_date 8 Difference 8;
	format first_start_date date9.;
	retain first_start_date;
	
	if first.var5 then do;
		first_start_date = housing_start_date;
	end;
	
	if last.var5 then do;
		Difference = intck('DAY', first_start_date, housing_stop_date);
		output;
	end;
run;

The result differs from the solution posted by @r_behata, i changed the format used to read the dates from anydtdte to mmddyy10.

View solution in original post

2 REPLIES 2
r_behata
Barite | Level 11

Here is the Proc SQL Solution for it . Hope this helps :

 


data have;

input VAR1 $	VAR2 $	VAR3 $	VAR4	$ VAR5 $	DATE1:anydtdte.	DATE2:anydtdte.;
format date1 date2 mmddyys10.;
infile cards truncover;
cards;
PERSON1 1 3 TYPE1 TYPE11 8/28/2012 2/23/2013
PERSON1 1 3 TYPE1 TYPE11 3/13/2013 3/16/2013
PERSON1 1 3 TYPE1 TYPE11 3/21/2013 4/12/2013
PERSON1 1 3 TYPE1 TYPE11 4/17/2013 10/22/2013
PERSON1 1 3 TYPE1 TYPE11 11/10/2013 11/26/2013
PERSON1 2 1 TYPE1 TYPE11 11/2/2019 11/18/2019
PERSON2 1 1 TYPE1 TYPE11 1/23/2010 7/21/2010
PERSON2 1 1 TYPE1 TYPE11 7/28/2010 8/12/2010
PERSON2 1 1 TYPE1 TYPE11 8/30/2010 9/28/2010
PERSON2 1 1 TYPE1 TYPE11 5/31/2014 11/14/2014
PERSON2 1 1 TYPE2 TYPE21 7/21/2010 10/8/2010
PERSON3 1 1 TYPE1 TYPE11 10/7/2020 10/15/2020
PERSON3 2 1 TYPE1 TYPE11 5/12/2015 5/21/2015
PERSON3 2 1 TYPE1 TYPE11 10/5/2016 11/14/2016
PERSON3 2 1 TYPE1 TYPE11 2/15/2017 3/7/2017
PERSON3 2 1 TYPE1 TYPE11 5/3/2017 10/30/2017
PERSON3 2 1 TYPE1 TYPE11 11/11/2017 12/1/2017
;
run;


Proc SQl;
Create table want as
select VAR1,	VAR2,	VAR3,	VAR4,	VAR5,
min(date1) as FIRST_DATE format=mmddyys10.,
max(date2) as LAST_DATE format=mmddyys10.,
max(date2)-min(date1) as DIFFERENCE
from have
group by 1,2,3,4,5;

Quit;
andreas_lds
Jade | Level 19

I won't open any office files, but fortunately @r_behata posted the data in usable form. Another problem are the names of the variables: var1-var5 are hardly understandable and make code less readable.

You don't have to check all key-variables, just the last one:

/* to avoid error messages if the data is not sorted */
proc sort data=have out=sorted presorted;
	by var1-var5 housing_start_date;
run;

data want;
	set sorted;
	by var1-var5;
	
	length first_start_date 8 Difference 8;
	format first_start_date date9.;
	retain first_start_date;
	
	if first.var5 then do;
		first_start_date = housing_start_date;
	end;
	
	if last.var5 then do;
		Difference = intck('DAY', first_start_date, housing_stop_date);
		output;
	end;
run;

The result differs from the solution posted by @r_behata, i changed the format used to read the dates from anydtdte to mmddyy10.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1006 views
  • 0 likes
  • 3 in conversation