lalaktgrau
Fluorite | Level 6

Hi all,

 

I have longitudinal data (one obs per person per minute) with the variables: ID, DateTime, and Disease. Disease is binary.

How can I identify the first occurrence of disease in this longitudinal data?

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star
data WORK.CLASS;
infile datalines truncover;
input Name:$8. disease datetime;
informat datetime datetime20.;
format datetime datetime20.;
datalines;
Alfred 0 04MAR14:23:55:00
Alfred 0 04MAR14:23:56:00
Alfred 1 04MAR14:23:57:00
Alfred 1 04MAR14:23:58:00
Alfred 0 04MAR14:23:59:00
Alfred 0 04MAR14:00:00:00
Alfred 1 04MAR14:01:01:00
Alfred 1 04MAR14:01:02:00
Alfred 1 04MAR14:01:03:00
Alfred 1 04MAR14:01:04:00
mary 0 04MAR14:23:55:00
mary 0 04MAR14:23:56:00
mary 0 04MAR14:23:57:00
mary 0 04MAR14:23:58:00
mary 0 04MAR14:23:59:00
mary 1 04MAR14:00:00:00
mary 0 04MAR14:01:01:00
mary 0 04MAR14:01:02:00
mary 1 04MAR14:01:03:00
mary 1 04MAR14:01:04:00
;

data want;
  set class;
  by name;
  flag= (first.name=1 and disease=1) or (dif(disease)=1);
run;

 

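The DIF function is defined as dif(x)=x-lag(x), so the condition "dif(disease)=1" picks up every disease=1 that immediately follows a disease=0. The "first.name=1 and disease=1" clause covers a person whose very first record already has disease=1, where there is no earlier record of that person to compare against.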
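
If it helps to see the look-back spelled out, here is a minimal sketch of the same idea written with LAG instead of DIF. It assumes the CLASS data set above and that disease is always 0 or 1 (never missing). The names want_explicit and prev_disease are made up for illustration.

```sas
data want_explicit;
  set class;
  by name;
  /* Call LAG on every row, outside any IF, so its queue stays in step with the data */
  prev_disease = lag(disease);
  /* Do not look back into the previous person's records */
  if first.name then prev_disease = .;
  /* Onset: diseased now, and not diseased on the previous row of the same person */
  flag = (disease = 1 and prev_disease ne 1);
  drop prev_disease;
run;
```

For 0/1 data this should give the same flag as the one-line DIF version. It only differs if disease can be missing: a 1 that follows a missing value is flagged here but not by dif(disease)=1.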


10 REPLIES
novinosrin
Tourmaline | Level 20

Please post samples of your input and your required output.

Sure, some geniuses may be able to gauge it from a sentence-based description, but samples help responders test their solutions.

lalaktgrau
Fluorite | Level 6

data WORK.CLASS;
infile datalines truncover;
input Name:$8. disease datetime;
informat datetime datetime20.;
format datetime datetime20.;
datalines;
Alfred 0 04MAR14:23:55:00
Alfred 0 04MAR14:23:56:00
Alfred 1 04MAR14:23:57:00
Alfred 1 04MAR14:23:58:00
Alfred 1 04MAR14:23:59:00
Alfred 1 04MAR14:00:00:00
Alfred 1 04MAR14:01:01:00
Alfred 1 04MAR14:01:02:00
Alfred 1 04MAR14:01:03:00
Alfred 1 04MAR14:01:04:00

mary 0 04MAR14:23:55:00
mary 0 04MAR14:23:56:00
mary 0 04MAR14:23:57:00
mary 0 04MAR14:23:58:00
mary 0 04MAR14:23:59:00
mary 1 04MAR14:00:00:00
mary 1 04MAR14:01:01:00
mary 1 04MAR14:01:02:00
mary 1 04MAR14:01:03:00
mary 1 04MAR14:01:04:00
;

 

I'd like a variable that equals 1 only the FIRST time disease=1. For example, the variable would equal 1 for Alfred at '04MAR14:23:57:00' and for Mary at '04MAR14:00:00:00'.

novinosrin
Tourmaline | Level 20
 
data WORK.CLASS;
infile datalines truncover;
input Name:$8. disease datetime;
informat datetime datetime20.;
format datetime datetime20.;
datalines;
Alfred 0 04MAR14:23:55:00
Alfred 0 04MAR14:23:56:00
Alfred 1 04MAR14:23:57:00
Alfred 1 04MAR14:23:58:00
Alfred 1 04MAR14:23:59:00
Alfred 1 04MAR14:00:00:00
Alfred 1 04MAR14:01:01:00
Alfred 1 04MAR14:01:02:00
Alfred 1 04MAR14:01:03:00
Alfred 1 04MAR14:01:04:00
mary 0 04MAR14:23:55:00
mary 0 04MAR14:23:56:00
mary 0 04MAR14:23:57:00
mary 0 04MAR14:23:58:00
mary 0 04MAR14:23:59:00
mary 1 04MAR14:00:00:00
mary 1 04MAR14:01:01:00
mary 1 04MAR14:01:02:00
mary 1 04MAR14:01:03:00
mary 1 04MAR14:01:04:00
;

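data want;
set class;
by name;
/* restart the counter at 1 for each person */
if first.name then temp=1;
/* sum statement: temp is retained and grows by 1 on every disease=1 row */
temp+disease;
/* keep only the first disease=1 row; testing disease too skips later disease=0 rows where temp is still 2 */
if temp=2 and disease=1;
drop temp;
run;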
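
Note that the step above keeps only the onset rows. If you would rather keep every row and add a 0/1 variable, as the question asks, something along these lines might work. It is only a sketch and assumes disease is 0 or 1. The names want_flag, seen and first_onset are made up for illustration.

```sas
data want_flag;
  set class;
  by name;
  /* seen remembers whether this person has already had a disease=1 row */
  retain seen;
  if first.name then seen = 0;
  /* 1 only on the first disease=1 row of each person, 0 everywhere else */
  first_onset = (disease = 1 and seen = 0);
  if disease = 1 then seen = 1;
  drop seen;
run;
```

With the sample data this should set first_onset=1 for Alfred at 04MAR14:23:57:00 and for mary at 04MAR14:00:00:00, and 0 on all other rows.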
lalaktgrau
Fluorite | Level 6

This works... but what if a person can fall in and out of disease? How can I identify each individual onset?

novinosrin
Tourmaline | Level 20

Can you clarify that point and post your required output, please?

lalaktgrau
Fluorite | Level 6

data WORK.CLASS;
infile datalines truncover;
input Name:$8. disease datetime;
informat datetime datetime20.;
format datetime datetime20.;
datalines;
Alfred 0 04MAR14:23:55:00
Alfred 0 04MAR14:23:56:00
Alfred 1 04MAR14:23:57:00
Alfred 1 04MAR14:23:58:00
Alfred 0 04MAR14:23:59:00
Alfred 0 04MAR14:00:00:00
Alfred 1 04MAR14:01:01:00
Alfred 1 04MAR14:01:02:00
Alfred 1 04MAR14:01:03:00
Alfred 1 04MAR14:01:04:00

mary 0 04MAR14:23:55:00
mary 0 04MAR14:23:56:00
mary 0 04MAR14:23:57:00
mary 0 04MAR14:23:58:00
mary 0 04MAR14:23:59:00
mary 1 04MAR14:00:00:00
mary 0 04MAR14:01:01:00
mary 0 04MAR14:01:02:00
mary 1 04MAR14:01:03:00
mary 1 04MAR14:01:04:00
;

 

 

What if we wanted the variable to equal 1 each time there is a new onset of disease, that is, each time disease changes from 0 to 1?

novinosrin
Tourmaline | Level 20

Do you mean you want to pick all the 1's? 

 

An output sample for the input would help

lalaktgrau
Fluorite | Level 6

I would want a variable to equal 1 whenever there is a change from 0 to 1.

novinosrin
Tourmaline | Level 20

Maybe this?

data WORK.CLASS;
infile datalines truncover;
input Name:$8. disease datetime;
informat datetime datetime20.;
format datetime datetime20.;
datalines;
Alfred 0 04MAR14:23:55:00
Alfred 0 04MAR14:23:56:00
Alfred 1 04MAR14:23:57:00
Alfred 1 04MAR14:23:58:00
Alfred 0 04MAR14:23:59:00
Alfred 0 04MAR14:00:00:00
Alfred 1 04MAR14:01:01:00
Alfred 1 04MAR14:01:02:00
Alfred 1 04MAR14:01:03:00
Alfred 1 04MAR14:01:04:00
mary 0 04MAR14:23:55:00
mary 0 04MAR14:23:56:00
mary 0 04MAR14:23:57:00
mary 0 04MAR14:23:58:00
mary 0 04MAR14:23:59:00
mary 1 04MAR14:00:00:00
mary 0 04MAR14:01:01:00
mary 0 04MAR14:01:02:00
mary 1 04MAR14:01:03:00
mary 1 04MAR14:01:04:00
;

data want;
set class;
by name disease notsorted;
if first.disease and disease then flag=1;
run;
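
A note on how this works: with NOTSORTED, first.disease is 1 on the first row of each run of equal disease values within a name, so "first.disease and disease" is true exactly when a run of 1s starts. In the step above flag is missing on the other rows. If a strict 0/1 variable is preferred, a small variation (just a sketch; want_onsets is a made-up name) could be:

```sas
data want_onsets;
  set class;
  /* NOTSORTED lets BY track runs of equal values without requiring sorted data */
  by name disease notsorted;
  /* 1 at the start of each run of disease=1 within a person, 0 otherwise */
  flag = (first.disease and disease = 1);
run;

/* list only the onset rows as a quick check */
proc print data=want_onsets;
  where flag = 1;
run;
```

With the sample data posted above, the PROC PRINT should list four rows: Alfred at 23:57:00 and 01:01:00, and mary at 00:00:00 and 01:03:00.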
