lalaktgrau
Fluorite | Level 6

Hi all,

 

I have longitudinal data (one obs per person per minute) with the variables: ID, DateTime, and Disease. Disease is binary.

How can I identify the first occurrence of disease in this longitudinal data?

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star
data WORK.CLASS;
infile datalines truncover;
input Name:$8. disease datetime;
informat datetime datetime20.;
format datetime datetime20.;
datalines;
Alfred 0 04MAR14:23:55:00
Alfred 0 04MAR14:23:56:00
Alfred 1 04MAR14:23:57:00
Alfred 1 04MAR14:23:58:00
Alfred 0 04MAR14:23:59:00
Alfred 0 04MAR14:00:00:00
Alfred 1 04MAR14:01:01:00
Alfred 1 04MAR14:01:02:00
Alfred 1 04MAR14:01:03:00
Alfred 1 04MAR14:01:04:00
mary 0 04MAR14:23:55:00
mary 0 04MAR14:23:56:00
mary 0 04MAR14:23:57:00
mary 0 04MAR14:23:58:00
mary 0 04MAR14:23:59:00
mary 1 04MAR14:00:00:00
mary 0 04MAR14:01:01:00
mary 0 04MAR14:01:02:00
mary 1 04MAR14:01:03:00
mary 1 04MAR14:01:04:00
;

data want;
  set class;
  by name;
  /* flag=1 if a person's first row already shows disease, or if disease rises from 0 to 1 */
  flag= (first.name=1 and disease=1) or (dif(disease)=1);
run;

 

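The DIF function is defined as dif(x) = x - lag(x), so the condition "dif(disease)=1" picks up every instance of disease=1 that immediately follows a disease=0. The "first.name=1 and disease=1" term covers a person whose very first record already has disease=1, where the lagged value would otherwise come from the previous person's last record.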
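Because the LAG/DIF queue only advances when the function actually executes, a common defensive habit is to call DIF in its own unconditional assignment instead of inside a compound condition. The following is a minimal sketch of that variant, assuming the same CLASS data set sorted by name; `change` is a hypothetical helper variable, not part of the original solution.

```sas
data want;
  set class;
  by name;
  /* Call DIF on every row so its queue always advances */
  change = dif(disease);
  /* A person's first row has no predecessor of its own */
  if first.name then flag = (disease = 1);
  else flag = (change = 1);
  drop change;
run;
```

On the sample data this should flag the same rows as the one-line version; it simply makes the queue behavior explicit.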


10 REPLIES
novinosrin
Tourmaline | Level 20

Please post samples of your input and your required output.

Sure, some geniuses may be able to gauge it from a sentence-based description, but samples help responders test their solutions.

lalaktgrau
Fluorite | Level 6

data WORK.CLASS;
infile datalines truncover;
input Name:$8. disease datetime;
informat datetime datetime20.;
format datetime datetime20.;
datalines;
Alfred 0 04MAR14:23:55:00
Alfred 0 04MAR14:23:56:00
Alfred 1 04MAR14:23:57:00
Alfred 1 04MAR14:23:58:00
Alfred 1 04MAR14:23:59:00
Alfred 1 04MAR14:00:00:00
Alfred 1 04MAR14:01:01:00
Alfred 1 04MAR14:01:02:00
Alfred 1 04MAR14:01:03:00
Alfred 1 04MAR14:01:04:00
mary 0 04MAR14:23:55:00
mary 0 04MAR14:23:56:00
mary 0 04MAR14:23:57:00
mary 0 04MAR14:23:58:00
mary 0 04MAR14:23:59:00
mary 1 04MAR14:00:00:00
mary 1 04MAR14:01:01:00
mary 1 04MAR14:01:02:00
mary 1 04MAR14:01:03:00
mary 1 04MAR14:01:04:00
;

 

I'd like a variable that equals 1 only on the FIRST time disease=1. For example, the variable would equal 1 for Alfred at '04MAR14:23:57:00' and for Mary at '04MAR14:00:00:00'.

novinosrin
Tourmaline | Level 20
 
data WORK.CLASS;
infile datalines truncover;
input Name:$8. disease datetime;
informat datetime datetime20.;
format datetime datetime20.;
datalines;
Alfred 0 04MAR14:23:55:00
Alfred 0 04MAR14:23:56:00
Alfred 1 04MAR14:23:57:00
Alfred 1 04MAR14:23:58:00
Alfred 1 04MAR14:23:59:00
Alfred 1 04MAR14:00:00:00
Alfred 1 04MAR14:01:01:00
Alfred 1 04MAR14:01:02:00
Alfred 1 04MAR14:01:03:00
Alfred 1 04MAR14:01:04:00
mary 0 04MAR14:23:55:00
mary 0 04MAR14:23:56:00
mary 0 04MAR14:23:57:00
mary 0 04MAR14:23:58:00
mary 0 04MAR14:23:59:00
mary 1 04MAR14:00:00:00
mary 1 04MAR14:01:01:00
mary 1 04MAR14:01:02:00
mary 1 04MAR14:01:03:00
mary 1 04MAR14:01:04:00
;

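data want;
  set class;
  by name;
  if first.name then temp=1;   /* reset the counter for each person */
  temp+disease;                /* sum statement: temp is retained across rows */
  if temp=2;                   /* subsetting IF: keep rows where the counter equals 2 */
  drop temp;
run;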
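
If the goal is a 0/1 variable on every row rather than a subset of rows, one possible sketch uses a retained indicator. It assumes CLASS is sorted by name and in time order within each name; `seen` is a hypothetical helper variable introduced only for illustration.

```sas
data want;
  set class;
  by name;
  retain seen;
  /* Start each person with no disease observed yet */
  if first.name then seen = 0;
  /* flag=1 only on the first disease=1 row for this person */
  flag = (disease = 1 and seen = 0);
  /* Remember that disease has now been observed */
  if disease = 1 then seen = 1;
  drop seen;
run;
```

This keeps all rows, so the flag can be inspected alongside the original records.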
lalaktgrau
Fluorite | Level 6

This works... but what if a person can fall in and out of disease? How can I identify each individual onset?

novinosrin
Tourmaline | Level 20

Can you clarify that point and post your required output, please?

lalaktgrau
Fluorite | Level 6

data WORK.CLASS;
infile datalines truncover;
input Name:$8. disease datetime;
informat datetime datetime20.;
format datetime datetime20.;
datalines;
Alfred 0 04MAR14:23:55:00
Alfred 0 04MAR14:23:56:00
Alfred 1 04MAR14:23:57:00
Alfred 1 04MAR14:23:58:00
Alfred 0 04MAR14:23:59:00
Alfred 0 04MAR14:00:00:00
Alfred 1 04MAR14:01:01:00
Alfred 1 04MAR14:01:02:00
Alfred 1 04MAR14:01:03:00
Alfred 1 04MAR14:01:04:00
mary 0 04MAR14:23:55:00
mary 0 04MAR14:23:56:00
mary 0 04MAR14:23:57:00
mary 0 04MAR14:23:58:00
mary 0 04MAR14:23:59:00
mary 1 04MAR14:00:00:00
mary 0 04MAR14:01:01:00
mary 0 04MAR14:01:02:00
mary 1 04MAR14:01:03:00
mary 1 04MAR14:01:04:00
;

 

 

What if we wanted the variable to equal 1 each time there is a new onset of disease, that is, each change from 0 to 1?

novinosrin
Tourmaline | Level 20

Do you mean you want to pick all the 1's? 

 

An output sample for the input would help

lalaktgrau
Fluorite | Level 6

I would want a variable to equal 1 whenever there is a change from 0 to 1.

novinosrin
Tourmaline | Level 20

Maybe this?

data WORK.CLASS;
infile datalines truncover;
input Name:$8. disease datetime;
informat datetime datetime20.;
format datetime datetime20.;
datalines;
Alfred 0 04MAR14:23:55:00
Alfred 0 04MAR14:23:56:00
Alfred 1 04MAR14:23:57:00
Alfred 1 04MAR14:23:58:00
Alfred 0 04MAR14:23:59:00
Alfred 0 04MAR14:00:00:00
Alfred 1 04MAR14:01:01:00
Alfred 1 04MAR14:01:02:00
Alfred 1 04MAR14:01:03:00
Alfred 1 04MAR14:01:04:00
mary 0 04MAR14:23:55:00
mary 0 04MAR14:23:56:00
mary 0 04MAR14:23:57:00
mary 0 04MAR14:23:58:00
mary 0 04MAR14:23:59:00
mary 1 04MAR14:00:00:00
mary 0 04MAR14:01:01:00
mary 0 04MAR14:01:02:00
mary 1 04MAR14:01:03:00
mary 1 04MAR14:01:04:00
;

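data want;
  set class;
  /* NOTSORTED: each consecutive run of equal disease values forms its own BY group */
  by name disease notsorted;
  /* the first row of a disease=1 run is an onset; flag stays missing elsewhere */
  if first.disease and disease then flag=1;
run;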
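
A quick way to check the result against the sample data is to list only the flagged rows. This is a small sketch that assumes the WANT data set produced by the step above.

```sas
/* List only the rows marked as onsets */
proc print data=want noobs;
  where flag = 1;
  var name datetime disease flag;
run;
```

For the sample input, the flagged rows should be the ones read from the 23:57 and 01:01 records for Alfred and the 00:00 and 01:03 records for mary, since those are the rows where disease changes from 0 to 1.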