Hi all,
Hi all, I have a longitudinal data set where the above dates are nested by household and person. The above are dates for one person and one household. I would like to compute the difference between two subsequent dates without creating a lagged variable. Is this possible? Thanks and all help is appreciated.
It's easy, but you are missing some of the description. When should the variable not be calculated? When beginning a new household? When beginning a new member of the household?
Here's an approach that assumes you should get no differences whenever the member of the household changes:
proc sort data=have;
by household member date;
run;
data want;
set have;
by household member;
days_difference = dif(date);
if first.member then days_difference=.;
run;
Similar to the LAG function, the DIF function must execute on every observation in order to get the right answer.
Here's more of the data set. I would like to compute the difference for each unique basebrgy (household) and basewman (person) combination. And if there is no date, I would like it to not compute anything.
Then the code I gave you is the right approach. You just need to change the variable names.
Also, DATE cannot be a character string. DATE has to be a numeric variable stored on SAS's usual scale for storing dates.
I have a different question but related to the same dataset. I am trying to create a singular ID variable for each household and person combination. This is the code I have currently that sort of gets at what I am trying to do:
data subset2;
set subset2;
by basebrgy basewman;
ID + 1;
if first.basebrgy OR first.basewman then ID = 1;
run;
Here is what the output looks like:
For the ID variable, I'd like it to be a column of 1's for basebrgy = 1/basewman = 12 and a column of 2's for basebrgy = 1/basewman = 14. Hope I was clear enough. Thanks for any help.
Not sure I understant but wouldn't this simply be:
data subset2;
set subset2;
if basebrgy = 1 and basewman = 12 then ID=1;
else if basebrgy = 1 and basewman = 14 then ID=2;
else;
run;
This works but subset2 is a subset of the much larger dataset. I would like to automate the process instead of doing it manually.
You're playing with the right tools, but you need a slightly different combination:
data want;
set have;
by basebrgy basewman;
if first.basewman then ID + 1;
run;
This is definitely a topic worth spending time on to understand. Many DATA steps create BY variables.
I tried your code but it only produced 1 for the first observations of each combination. I would like 2 for the second combination. Any ideas?
The code worked just fine for me. Can you post the log from your attempt?
1867 data subset2;
1868 set subset2;
1869 by basebrgy basewman;
1870
1871 if first.basewman then ID = 1;
1872 run;
NOTE: There were 18 observations read from the data set WORK.SUBSET2.
NOTE: The data set WORK.SUBSET2 has 18 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Here's the output:
Never mind, I figured it out. I set ID = 1 instead of ID + 1
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.