11-07-2016 04:53 PM
Hi all, I have a longitudinal data set where the above dates are nested by household and person. The above are dates for one person and one household. I would like to compute the difference between two subsequent dates without creating a lagged variable. Is this possible? Thanks and all help is appreciated.
11-07-2016 05:08 PM
It's easy, but you are missing some of the description. When should the variable not be calculated? When beginning a new household? When beginning a new member of the household?
Here's an approach that assumes you should get no differences whenever the member of the household changes:
proc sort data=have;
by household member date;
by household member;
days_difference = dif(date);
if first.member then days_difference=.;
Similar to the LAG function, the DIF function must execute on every observation in order to get the right answer.
11-07-2016 05:13 PM
Here's more of the data set. I would like to compute the difference for each unique basebrgy (household) and basewman (person) combination. And if there is no date, I would like it to not compute anything.
11-07-2016 05:18 PM
Then the code I gave you is the right approach. You just need to change the variable names.
Also, DATE cannot be a character string. DATE has to be a numeric variable stored on SAS's usual scale for storing dates.
11-10-2016 02:42 AM
I have a different question but related to the same dataset. I am trying to create a singular ID variable for each household and person combination. This is the code I have currently that sort of gets at what I am trying to do:
data subset2; set subset2; by basebrgy basewman; ID + 1; if first.basebrgy OR first.basewman then ID = 1; run;
Here is what the output looks like:
For the ID variable, I'd like it to be a column of 1's for basebrgy = 1/basewman = 12 and a column of 2's for basebrgy = 1/basewman = 14. Hope I was clear enough. Thanks for any help.
11-10-2016 02:49 AM
Not sure I understant but wouldn't this simply be:
data subset2; set subset2; if basebrgy = 1 and basewman = 12 then ID=1; else if basebrgy = 1 and basewman = 14 then ID=2; else; run;
11-10-2016 04:01 PM
This works but subset2 is a subset of the much larger dataset. I would like to automate the process instead of doing it manually.
11-10-2016 06:55 AM
You're playing with the right tools, but you need a slightly different combination:
by basebrgy basewman;
if first.basewman then ID + 1;
This is definitely a topic worth spending time on to understand. Many DATA steps create BY variables.
11-10-2016 04:00 PM
I tried your code but it only produced 1 for the first observations of each combination. I would like 2 for the second combination. Any ideas?
11-10-2016 04:15 PM
1867 data subset2;
1868 set subset2;
1869 by basebrgy basewman;
1871 if first.basewman then ID = 1;
NOTE: There were 18 observations read from the data set WORK.SUBSET2.
NOTE: The data set WORK.SUBSET2 has 18 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Here's the output: