Help using Base SAS procedures

How to indicate duplicates in long format data set

Accepted Solution Solved
Reply
Contributor
Posts: 68
Accepted Solution

How to indicate duplicates in long format data set

Hi! I have a question regarding how to indicate a value is repeated in a data that's long format.

Example:

ID name_of_former_variable ageofonset numberoftraumas

1   ageonsettrauma14               12             1

1   ageonsettrauma7                 13             2

2   ageonsettrauma6                 8               1

2   ageonsettrauma7                 8               2

2   ageonsettrauma16               8               3

2   ageonsettrauma18               8               4

2   ageonsettrauma20               9               5

3   ageonsettrauma4                 5               1

3   ageonsettrauma18               5                2

So basically, numberoftraumas is a counter variable to see the number of traumas each ID endorsed.

I am interested in earliest age of onset (so for ID # 1, it is 12, for ID #2 it is 8). I also want to know

the number of traumas endorsed (So for ID #1 it is 2, for ID #2 it is 5). I can get these by splitting these

into two data sets and merging the largest "numberoftraumas" variable with the smallest "ageofonset"

THen I was able to use a substring function to indicate which trauma # is associated with the earliest

age of onset. But I'd like to indicate if there is a tie. So for example, for ID #2, the earliest age of

onset is 8. However, there are multiple traumas endorsed at age 8. I want to create a variable which

indicates if there are ties, specifically, the first two traumas (because I want to indicate that they had

multiple traumas at that earliest age of onset, if that makes sense).

I can't just use the largest "numberoftraumas' to indicate this because, as you can see with ID #1,

even though they had two traumas, they occurred at two different ages.

So if subject has multiple traumas occurring at the earliest age of trauma, then their 'firsttrauma'=99

Otherwise, it is the number of the trauma.

So for ID1, their earliest age of trauma was trauma#14 and occured at age 12.

No other trauma was endorsed at age 12, so their firsttrauma variable is 14.

But for ID2,  they had 4 traumas at age 8, and one trauma at age 9, so

I don't care which traumas they experienced at age 8, I just want to indicate

that there was more than one experienced at age 8, so their firsttrauma=99.

ID3 had 2 traumas at age 5, so their firsttrauma=99.

So this is what I will see:

ID name_of_former_variable ageofonset numberoftraumas  firsttrauma

1   ageonsettrauma14               12             1                          14

2   ageonsettrauma20               8               5                          99

3   ageonsettrauma18                 5               2                          99

But ultimately, I don't care about the name_of_former_variable so I'd delete that and want to see this:

ID ageofonset numberoftraumas  firsttrauma

      12             1                          14

2          8               5                         99

3          5               2                         99

Thank you so much!! Sorry for the long explanation. I wasn't sure how to find this online , I couldn't find anything in my searches Smiley Sad

Last time I asked a question someone requested for more details so I thought I'd provide more. Smiley Happy


Accepted Solutions
Solution
‎03-11-2014 08:59 PM
Super User
Posts: 17,868

Re: How to indicate duplicates in long format data set

I'd guess there was an easier way to do this with the original data or by merging it in, but I'll provide a data step solution based on your data.

My conversion to number may be off, I always get put/input wrong.

Untested.

data want;

set have;

by id;

retain first_age firsttrauma;

if first.id then do;

      firsttrauma=input(compress(name, 'kd'), 2.);

     first_age=ageofonset;

end;

if not first.id and ageofonset=first_age then firsttrauma=99;

if last.id then output;

end;

View solution in original post


All Replies
Solution
‎03-11-2014 08:59 PM
Super User
Posts: 17,868

Re: How to indicate duplicates in long format data set

I'd guess there was an easier way to do this with the original data or by merging it in, but I'll provide a data step solution based on your data.

My conversion to number may be off, I always get put/input wrong.

Untested.

data want;

set have;

by id;

retain first_age firsttrauma;

if first.id then do;

      firsttrauma=input(compress(name, 'kd'), 2.);

     first_age=ageofonset;

end;

if not first.id and ageofonset=first_age then firsttrauma=99;

if last.id then output;

end;

Contributor
Posts: 68

Re: How to indicate duplicates in long format data set

Hi Reeza!

Thank you so much. The code worked Smiley Happy I just had to change input to put. You always are so fast to respond and help me, which I really appreciate. I am, unfortunately, not the best programmer, but try to google things before I come here and ask you guys. If you have any advice on how I can learn little tricks like the one you showed me, I'd really appreciate it! Just simple things like using the retain, first.variable types of statements.

Thanks again!!!!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 216 views
  • 1 like
  • 2 in conversation