BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ginak
Quartz | Level 8

Hi! I have a question regarding how to indicate a value is repeated in a data that's long format.

Example:

ID name_of_former_variable ageofonset numberoftraumas

1   ageonsettrauma14               12             1

1   ageonsettrauma7                 13             2

2   ageonsettrauma6                 8               1

2   ageonsettrauma7                 8               2

2   ageonsettrauma16               8               3

2   ageonsettrauma18               8               4

2   ageonsettrauma20               9               5

3   ageonsettrauma4                 5               1

3   ageonsettrauma18               5                2

So basically, numberoftraumas is a counter variable to see the number of traumas each ID endorsed.

I am interested in earliest age of onset (so for ID # 1, it is 12, for ID #2 it is 8). I also want to know

the number of traumas endorsed (So for ID #1 it is 2, for ID #2 it is 5). I can get these by splitting these

into two data sets and merging the largest "numberoftraumas" variable with the smallest "ageofonset"

THen I was able to use a substring function to indicate which trauma # is associated with the earliest

age of onset. But I'd like to indicate if there is a tie. So for example, for ID #2, the earliest age of

onset is 8. However, there are multiple traumas endorsed at age 8. I want to create a variable which

indicates if there are ties, specifically, the first two traumas (because I want to indicate that they had

multiple traumas at that earliest age of onset, if that makes sense).

I can't just use the largest "numberoftraumas' to indicate this because, as you can see with ID #1,

even though they had two traumas, they occurred at two different ages.

So if subject has multiple traumas occurring at the earliest age of trauma, then their 'firsttrauma'=99

Otherwise, it is the number of the trauma.

So for ID1, their earliest age of trauma was trauma#14 and occured at age 12.

No other trauma was endorsed at age 12, so their firsttrauma variable is 14.

But for ID2,  they had 4 traumas at age 8, and one trauma at age 9, so

I don't care which traumas they experienced at age 8, I just want to indicate

that there was more than one experienced at age 8, so their firsttrauma=99.

ID3 had 2 traumas at age 5, so their firsttrauma=99.

So this is what I will see:

ID name_of_former_variable ageofonset numberoftraumas  firsttrauma

1   ageonsettrauma14               12             1                          14

2   ageonsettrauma20               8               5                          99

3   ageonsettrauma18                 5               2                          99

But ultimately, I don't care about the name_of_former_variable so I'd delete that and want to see this:

ID ageofonset numberoftraumas  firsttrauma

      12             1                          14

2          8               5                         99

3          5               2                         99

Thank you so much!! Sorry for the long explanation. I wasn't sure how to find this online , I couldn't find anything in my searches Smiley Sad

Last time I asked a question someone requested for more details so I thought I'd provide more. Smiley Happy

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

I'd guess there was an easier way to do this with the original data or by merging it in, but I'll provide a data step solution based on your data.

My conversion to number may be off, I always get put/input wrong.

Untested.

data want;

set have;

by id;

retain first_age firsttrauma;

if first.id then do;

      firsttrauma=input(compress(name, 'kd'), 2.);

     first_age=ageofonset;

end;

if not first.id and ageofonset=first_age then firsttrauma=99;

if last.id then output;

end;

View solution in original post

2 REPLIES 2
Reeza
Super User

I'd guess there was an easier way to do this with the original data or by merging it in, but I'll provide a data step solution based on your data.

My conversion to number may be off, I always get put/input wrong.

Untested.

data want;

set have;

by id;

retain first_age firsttrauma;

if first.id then do;

      firsttrauma=input(compress(name, 'kd'), 2.);

     first_age=ageofonset;

end;

if not first.id and ageofonset=first_age then firsttrauma=99;

if last.id then output;

end;

ginak
Quartz | Level 8

Hi Reeza!

Thank you so much. The code worked Smiley Happy I just had to change input to put. You always are so fast to respond and help me, which I really appreciate. I am, unfortunately, not the best programmer, but try to google things before I come here and ask you guys. If you have any advice on how I can learn little tricks like the one you showed me, I'd really appreciate it! Just simple things like using the retain, first.variable types of statements.

Thanks again!!!!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 747 views
  • 1 like
  • 2 in conversation