BookmarkSubscribeRSS Feed
MFraga
Quartz | Level 8

Hi,

I am working with a longitudinal database in wide format and I am looking for a way to code a "time since the first time an individual had a confirmed observation (tso)" variable instead of "time since arrival year (tsa)". The observation period in my dataset ranges from 1990 to 1999, and the individuals come into observation at different times.

 

After verification, I noticed that some individuals in my dataset are active in years prior to the official registered year of arrival (that is, they do not have an arrival year well reported). For those individuals who have a year of arrival recorded as prior to 1990 (the year of the start of the data collection), I need each year before 1990 to be counted as a year with presence of the individual. Thus, I need to correct the time variable chronologically for the years I have a confirmed observation for each individual where:

 

"id" is the individual identification number variable;

 

"arrival" variable represents the officially registered year of arrival;

 

"presence1990" to "presence1999" that have the value of "0" for not present and "1" for a confirmed presence for each year;

 

 

In addition, I need to create this variable:

"type" that represents the type of behaviour of the individuals and is coded as: “0” for those individuals that have a confirmed observation from the year of arrival until the right censorship; “1” for those individuals who left for 4 or more consecutive years since the first presence was observed and ARE NOT present in any of the subsequent years; “2” for those individuals who left for 4 or more consecutive years since the first presence was observed and ARE present in any of the subsequent years until right censorship; “3” for those individuals that left for 3 years or less and DO NOT have a confirmed observation in the last year under analysis (1999, the right censure year); “4” for those individuals that left for 3 years or less and DO have a confirmed observation in 1999;

 

The table that I “have” so far is like this:

 

data have;

input

id arrival presence1990 presence1991 presence1992 presence1993 presence1994 presence1995 presence1996 presence1997 presence1998 presence1999 tsa1990 tsa1991 tsa1992 tsa1993 tsa1994 tsa1995 tsa1996 tsa1997 tsa1998 tsa1999;

datalines;

1 1990 1 1 1 1 1 1 1 1 1 1 0 1 2 3 4 5 6 7 8 9

2 1997 0 0 0 0 0 0 0 1 1 1 . . . . . . . 0 1 2

3 1993 0 0 0 1 1 1 1 0 0 0 . . . 0 1 2 3 . . .

4 1999 0 0 0 0 0 0 0 0 0 1 . . . . . . . . . 0

5 1998 0 0 0 0 0 0 0 0 1 0 . . . . . . . . 0 .

6 1990 1 1 1 0 1 1 1 1 1 0 0 1 2 . 4 5 6 7 8 .

7 1994 0 0 0 0 1 1 0 0 0 1 . . . . 0 1 . . . 5

8 1992 0 0 1 1 0 1 0 1 0 0 . . 0 1 . 3 . 5 . .

9 1991 0 1 0 0 0 0 1 1 0 0 . 0 . . . . 5 6 . .

10 1983 1 1 1 1 1 1 1 1 1 1 7 8 9 10 11 12 13 14 15 16

11 1986 1 1 0 0 0 0 0 0 0 0 4 5 . . . . . . . .

12 1979 1 0 0 0 0 1 1 1 1 1 11 . . . . 16 17 18 19 20

13 1981 0 0 1 1 1 1 1 1 1 1 . . 12 13 14 15 16 17 18 19

14 1988 0 1 1 0 1 0 1 1 1 0 . 3 4 . 6 . 8 9 10 .

15 1993 1 1 0 1 1 1 1 1 1 1 . . . 0 1 2 3 4 5 6

16 1997 0 0 1 1 1 0 0 1 1 0 . . . . . . . 0 1 .

17 1998 1 1 0 0 0 0 0 0 1 1 . . . . . . . . 0 1

18 1998 0 0 0 0 0 1 1 1 1 1 . . . . . 0 1 2 3 4

;

run;

 

 

The table that I “want” is like this:

 

data want;

input

id

arrival

presence1990 presence1991 presence1992 presence1993 presence1994 presence1995 presence1996 presence1997 presence1998 presence1999

tso1990 tso1991 tso1992 tso1993 tso1994 tso1995 tso1996 tso1997 tso1998 tso1999

type

;

 

datalines;

1 1990 1 1 1 1 1 1 1 1 1 1 0 1 2 3 4 5 6 7 8 9 0

2 1997 0 0 0 0 0 0 0 1 1 1 . . . . . . . 0 1 2 0

3 1993 0 0 0 1 1 1 1 0 0 0 . . . 0 1 2 3 . . . 3

4 1999 0 0 0 0 0 0 0 0 0 1 . . . . . . . . . 0 0

5 1998 0 0 0 0 0 0 0 0 1 0 . . . . . . . . 0 . 3

6 1990 1 1 1 0 1 1 1 1 1 0 0 1 2 . 3 4 5 6 7 . 3

7 1994 0 0 0 0 1 1 0 0 0 1 . . . . 0 1 . . . 2 4

8 1992 0 0 1 1 0 1 0 1 0 0 . . 0 1 . 2 . 3 . . 3

9 1991 0 1 0 0 0 0 1 1 0 0 . 0 . . . . 1 2 . . 2

10 1983 1 1 1 1 1 1 1 1 1 1 7 8 9 10 11 12 13 14 15 16 0

11 1986 1 1 0 0 0 0 0 0 0 0 4 5 . . . . . . . . 1

12 1979 1 0 0 0 0 1 1 1 1 1 11 . . . . 12 13 14 15 16 2

13 1981 0 0 1 1 1 1 1 1 1 1 . . 9 10 11 12 13 14 15 16 4

14 1988 0 1 1 0 1 0 1 1 1 0 . 2 3 . 4 . 5 6 7 . 3

15 1993 1 1 0 1 1 1 1 1 1 1 0 1 . 2 3 4 5 6 7 8 4

16 1997 0 0 1 1 1 0 0 1 1 0 . . 0 1 2 . . 3 4 . 3

17 1998 1 1 0 0 0 0 0 0 1 1 0 1 . . . . . . 2 3 2

18 1998 0 0 0 0 0 1 1 1 1 1 . . . . . 0 1 2 3 4 0

 

;

run;

 

 

Does anyone have any idea how I should proceed? Is my question clear enough?

 

Best regards.

1 REPLY 1
MFraga
Quartz | Level 8

So far, I have tried to deal with this like that:


data tried; set have;

 

if arrival > 1990 and presence1990 = 1 then tfo1990 = 0 and tfo1991 = 1 and tfo1992 = 2 and tfo1993 = 3 and tf01994 = 4 and tfo1995 = 5 and tfo1996 = 6 and tfo1997 = 7 and tfo1998 = 8 and tfo1999 = 9;
if arrival > 1991 and presence1991 = 1 then tfo1991 = 0 and tfo1992 = 1 and tfo1993 = 2 and tf01994 = 3 and tfo1995 = 4 and tfo1996 = 5 and tfo1997 = 6 and tfo1998 = 7 and tfo1999 = 8;
if arrival > 1992 and presence1992 = 1 then tfo1992 = 0 and tfo1993 = 1 and tf01994 = 2 and tfo1995 = 3 and tfo1996 = 4 and tfo1997 = 5 and tfo1998 = 6 and tfo1999 = 7;
if arrival > 1993 and presence1993 = 1 then tfo1993 = 0 and tf01994 = 1 and tfo1995 = 2 and tfo1996 = 3 and tfo1997 = 4 and tfo1998 = 5 and tfo1999 = 6;
if arrival > 1994 and presence1994 = 1 then tfo1994 = 0 and tfo1995 = 1 and tfo1996 = 2 and tfo1997 = 3 and tfo1998 = 4 and tfo1999 = 5;
if arrival > 1995 and presence1995 = 1 then tfo1995 = 0 and tfo1996 = 1 and tfo1997 = 2 and tfo1998 = 3 and tfo1999 = 4;
if arrival > 1996 and presence1996 = 1 then tfo1996 = 0 and tfo1997 = 1 and tfo1998 = 2 and tfo1999 = 3;
if arrival > 1997 and presence1997 = 1 then tfo1997 = 0 and tfo1998 = 1 and tfo1999 = 2;
if arrival > 1998 and presence1998 = 1 then tfo1998 = 0 and tfo1999 = 1;
if arrival > 1999 and presence1999 = 1 then tfo1999 = 0 ;

if arrival < 1990 and presence1990 = 0 then tfo1990 = 1989 - arrival + 0;
if arrival < 1990 and presence1990 = 1 then tfo1990 = 1989 - arrival + 1;
if arrival < 1990 then tfo1991 = tfo1990 +1;
if arrival < 1990 then tfo1992 = tfo1990 +2;
if arrival < 1990 then tfo1993 = tfo1990 +3;
if arrival < 1990 then tfo1994 = tfo1990 +4;
if arrival < 1990 then tfo1995 = tfo1990 +5;
if arrival < 1990 then tfo1996 = tfo1990 +6;
if arrival < 1990 then tfo1997 = tfo1990 +7;
if arrival < 1990 then tfo1998 = tfo1990 +8;
if arrival < 1990 then tfo1999 = tfo1990 +9;

run;

 

Unfortunately, I don't know how to skip the years with missing observation and I also don't know how to correct the number of the observation following a missing observation year.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 315 views
  • 0 likes
  • 1 in conversation