Chudamani
Obsidian | Level 7

I am using the Health and Retirement Study (HRS) to conduct a longitudinal analysis of the cognitive trajectories of people who had dementia at my baseline. HRS is a biennial survey. My dataset is in wide form and looks like the example below. The variable "DGXX" represents the dementia diagnosis (1 = dementia, 0 = no dementia) for each survey year, and "INXX" represents the year when the survey was completed. For example, person 706 already had dementia in 1992, while person 707 was first diagnosed in 1998. In my study, I want to control for the number of years the person had already lived with dementia before baseline (2004). To do that, I want to create a variable that tells me when the person was first diagnosed with dementia. Any help would be much appreciated. Thanks.

 

ID   DG92  DG94  DG96  DG98  DG00  DG02  DG04  IN92  IN94  IN96  IN98  IN00  IN02  IN04
706  1     1     1     1     1     1     1     92    94    97    98    00    03    04
707  0     0     0     1     1     1     1     93    94    97    98    00    02    04
708  0     0     0     0     0     0     1     93    95    97    98    00    02    05
709  1     1     1     1     .     .     1     92    94    96    98    00    02    04
710  0     0     0     0     1     1     1     92    94    96    98    00    02    04
711  0     1     0     1     1     1     1     92    94    96    98    00    02    04
712  0     0     0     0     0     1     1     92    94    96    98    00    02    04

Accepted Solution
MarkusWeick
Barite | Level 11

Hi @Chudamani,

I guess the simplest approach would be a chain of IF/ELSE statements in a data step that fills the new variable, like:

 

/* Take the year of the first wave in which the diagnosis flag is 1 */
if DG92 = 1 then new_variable = 1992;
else if DG94 = 1 then new_variable = 1994;
/* ... */
else if DG04 = 1 then new_variable = 2004;
else new_variable = 9999;   /* no wave has a flag of 1 */

 

Best

Markus

ballardw
Super User

If you made that wide data set then you may have complicated the process.

You have definitely complicated the process by using 2-digit years. Hint: if you must place information like a year into a variable name, then at least use a full 4-digit year. Any variable that supposedly holds a "year" value really should have 4 digits.

Reasons include things like sort order: 00 comes way before 92 for example.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

If you can't do that then provide data in the form of simple text by pasting into a text window opened on the forum with </>.

That form of data "table" is obnoxious when copied and pasted into an editor to try to code.
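
As a sketch of what such a post might look like, the table from the question could be supplied as a data step along these lines. The values are transcribed from the posted table, and the data set name have is simply the name used by the code further down; list input reads the "." entries as missing and "00" as 0.

```sas
/* Sample data transcribed from the table in the question */
data have;
   input ID DG92 DG94 DG96 DG98 DG00 DG02 DG04
            IN92 IN94 IN96 IN98 IN00 IN02 IN04;
   datalines;
706 1 1 1 1 1 1 1 92 94 97 98 00 03 04
707 0 0 0 1 1 1 1 93 94 97 98 00 02 04
708 0 0 0 0 0 0 1 93 95 97 98 00 02 05
709 1 1 1 1 . . 1 92 94 96 98 00 02 04
710 0 0 0 0 1 1 1 92 94 96 98 00 02 04
711 0 1 0 1 1 1 1 92 94 96 98 00 02 04
712 0 0 0 0 0 1 1 92 94 96 98 00 02 04
;
run;
```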

 

The WHICHN function is your answer. WHICHN (and the corresponding WHICHC for character values) searches the list of values that follows its first argument for that first argument. It returns the position of the first match in the list, or zero if there is no match.
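
A quick way to see that behavior is a small step with literal values. This is only an illustration; the data set name whichn_demo and the variable names pos_found and pos_none are made up for the example.

```sas
data whichn_demo;
   /* Search for 1 in the list 0, 0, 1, 1: the first match is the 3rd value */
   pos_found = whichn(1, 0, 0, 1, 1);
   /* Search for 1 in the list 0, 0, 0: no match, so the result is 0 */
   pos_none  = whichn(1, 0, 0, 0);
   put pos_found= pos_none=;
run;
```

The log should show pos_found=3 pos_none=0. Note that the position counts only the values being searched, not the search value itself.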

 

data want;
   set have;
   if whichn(1,dg92,dg94,dg96,dg98,dg00,dg02,dg04)>0
      then firstyear = 1990 + (2* whichn(1,dg92,dg94,dg96,dg98,dg00,dg02,dg04));
run;

If your IN variables have an irregular increment and their values are actually needed, you could use an array to pull the value:

data want;
   set have;
   array y (*)  in92 in94 in96 in98 in00 in02 in04;
   if whichn(1,dg92,dg94,dg96,dg98,dg00,dg02,dg04)>0
      then firstyear = y[ whichn(1,dg92,dg94,dg96,dg98,dg00,dg02,dg04)];
run;

You did not provide any example of what you expect for the year when there is no 1 in the DG variables.

Chudamani
Obsidian | Level 7
Thanks for your response, @ballardw. I am sorry about not posting the data correctly; I will be more careful next time.
mkeintz
PROC Star

First, why do you have the IN variables, which seem to be columns of constant values?  They provide no information to discriminate between one observation and another.

 

Along the lines of what @ballardw suggested, you can create a temporary array of those constant values, and use that array in tandem with the whichn function, as below (untested in the absence of sample data in the form of a working data step):

 

data want;
   set have;
   array dg     {*}                   dg92 dg94 dg96 dg98 dg00 dg02 dg04  ;
   array yrvals {0:7} _temporary_ (., 1992,1994,1996,1998,2000,2002,2004) ;
   firstyear=yrvals{whichn(1,of dg{*})};
run;

Note that if there is no value of 1 in any of the DG variables, the whichn function returns zero. The zeroth element of the yrvals array is a missing value, because the array was defined with a lower bound of zero instead of the default lower bound of one. In that case firstyear is set to missing.
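
Tying this back to the original question, the same lookup could feed a count of years lived with dementia before baseline. The following is a minimal sketch under a few assumptions: the baseline year 2004 is taken from the question, the year used is the nominal wave year rather than the IN interview year, and the names have_demo, want_demo, and years_before_baseline (as well as ID 999, a made-up row with no diagnosis) are hypothetical.

```sas
/* Two illustrative rows: ID 707 from the question, and a made-up ID 999 with no diagnosis */
data have_demo;
   input ID DG92 DG94 DG96 DG98 DG00 DG02 DG04;
   datalines;
707 0 0 0 1 1 1 1
999 0 0 0 0 0 0 0
;
run;

data want_demo;
   set have_demo;
   array dg     {*}                 dg92 dg94 dg96 dg98 dg00 dg02 dg04;
   array yrvals {0:7} _temporary_ (., 1992, 1994, 1996, 1998, 2000, 2002, 2004);
   /* Position 0 (no 1 found) maps to the missing value in element 0 */
   firstyear = yrvals{whichn(1, of dg{*})};
   /* Left missing when no diagnosis year was found */
   if not missing(firstyear) then years_before_baseline = 2004 - firstyear;
run;
```

For ID 707 this gives firstyear = 1998 and years_before_baseline = 6; for the row with no diagnosis both variables stay missing. Whether a missing value or some other code is appropriate for people never diagnosed depends on the analysis.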

Chudamani
Obsidian | Level 7
Thank you, @mkeintz. IN variables are the dates when the interviews were conducted for the person in the survey. I need this info to create other variables.
