DATA Step, Macro, Functions and more

Difference between Two dates for Single Variable

Reply
Contributor
Posts: 43

Difference between Two dates for Single Variable

Hi all,

 

 

Screen Shot 2016-11-07 at 11.20.12 AM.png

 

Hi all, I have a longitudinal data set where the above dates are nested by household and person. The above are dates for one person and one household. I would like to compute the difference between two subsequent dates without creating a lagged variable. Is this possible? Thanks and all help is appreciated.

Super User
Posts: 5,082

Re: Difference between Two dates for Single Variable

It's easy, but you are missing some of the description.  When should the variable not be calculated?  When beginning a new household?  When beginning a new member of the household?

 

Here's an approach that assumes you should get no differences whenever the member of the household changes:

 

proc sort data=have;

by household member date;

run;

 

data want;

set have;

by household member;

days_difference = dif(date);

if first.member then days_difference=.;

run;

 

Similar to the LAG function, the DIF function must execute on every observation in order to get the right answer. 

Contributor
Posts: 43

Re: Difference between Two dates for Single Variable

Screen Shot 2016-11-07 at 2.10.26 PM.png

 

Here's more of the data set. I would like to compute the difference for each unique basebrgy (household) and basewman (person) combination. And if there is no date, I would like it to not compute anything.

Super User
Posts: 5,082

Re: Difference between Two dates for Single Variable

Then the code I gave you is the right approach.  You just need to change the variable names.

 

Also, DATE cannot be a character string.  DATE has to be a numeric variable stored on SAS's usual scale for storing dates.

Contributor
Posts: 43

Re: Difference between Two dates for Single Variable

I have a different question but related to the same dataset. I am trying to create a singular ID variable for each household and person combination. This is the code I have currently that sort of gets at what I am trying to do:

 

data subset2;
	set subset2;
	by basebrgy basewman;
	ID + 1;

	if first.basebrgy OR first.basewman then ID = 1;


run;

Here is what the output looks like:

 

Screen Shot 2016-11-09 at 11.39.42 PM.png

 

For the ID variable, I'd like it to be a column of 1's for basebrgy = 1/basewman = 12 and a column of 2's for basebrgy = 1/basewman = 14. Hope I was clear enough. Thanks for any help.

PROC Star
Posts: 551

Re: Difference between Two dates for Single Variable

Not sure I understant but wouldn't this simply be:

 

data subset2;
	set subset2;

   if basebrgy = 1 and basewman = 12 then ID=1;
   else if basebrgy = 1 and basewman = 14 then ID=2;
   else;

run;
Contributor
Posts: 43

Re: Difference between Two dates for Single Variable

This works but subset2 is a subset of the much larger dataset. I would like to automate the process instead of doing it manually.

Super User
Posts: 5,082

Re: Difference between Two dates for Single Variable

You're playing with the right tools, but you need a slightly different combination:

 

data want;

set have;

by basebrgy basewman;

if first.basewman then ID + 1;

run;

 

This is definitely a topic worth spending time on to understand.  Many DATA steps create BY variables.

Contributor
Posts: 43

Re: Difference between Two dates for Single Variable

I tried your code but it only produced 1 for the first observations of each combination. I would like 2 for the second combination. Any ideas?

 

Super User
Posts: 5,082

Re: Difference between Two dates for Single Variable

The code worked just fine for me.  Can you post the log from your attempt?

Contributor
Posts: 43

Re: Difference between Two dates for Single Variable

1867  data subset2;
1868      set subset2;
1869      by basebrgy basewman;
1870
1871      if first.basewman then ID = 1;
1872  run;

NOTE: There were 18 observations read from the data set WORK.SUBSET2.
NOTE: The data set WORK.SUBSET2 has 18 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

 

Here's the output:

 

Screen Shot 2016-11-10 at 1.11.49 PM.png

Contributor
Posts: 43

Re: Difference between Two dates for Single Variable

Never mind, I figured it out. I set ID = 1 instead of ID + 1

Ask a Question
Discussion stats
  • 11 replies
  • 266 views
  • 0 likes
  • 3 in conversation