BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ginak
Quartz | Level 8

Hello!

So my data set is in long format (I can't transform it to wide format.. too many variables). I have 3 traumas, trauma #1, #3 and #5. People either have one or two of the traumas. Each record/line indicates a trauma. So the data set might look like this:

ID group traumatype tr1age0 tr1age1 tr1age2 [...] tr1age18 tr3age0 tr3age1 tr3age2 [.. until] tr3age18 tr5age1 tr5age2 [...] trage0 trage1 trage2 trage3 [...] trage18 duration

1   6        trauma1         1        1            .                 .          .            .            .                      .            .         .                1          1       .         .               .            2

1   6        trauma3          .         .           .                   .        .            1           1                      .            .         .                 .          1       1        .                            2

2    1       trauma1                    1                                                                                                                                                                                            1

So   here ID #1 is in group 6, which means they have endorsed trauma1 and trauma3. ID # 2 is in group 1, so they've only endorsed trauma1 (so depending on whether they've endorsed only one of the three traumas or two, that will determine if they're in group 1,2,3,4,5 or 6.).

So suppose now we have these two people. I want to calculate the # years endorsed trauma. So for ID#1, they have endorsed a trauma at ages 0,1 and 2. Notice that they endorsed both traumas at age 1. That's ok, it still counts as 1 year. So their years endorsed = 3.

Now, How can I get SAS to do this? I made the new variable, trage0 through trage18 to indicate whether they endorsed their trauma at that age (because I thought it might be easier to have the same variable name).

I was thinking of taking the maximum at each age. so for ID#6, their max at age 0=1, max at age1=1, max at age 2=1, max at age 3=., ..., max at age 18=. (missing). So then I'd sum the max's and get 3 as the number of years having endorsed a trauma.

Any help as to how I can do this? I feel like I could use an array but I'm not sure. Also the fact that the index starts at 0 and not 1 is trouble some.. I think i need to go from -1 to 18 so sas doesn't freak out.

Thanks!!!

Gina

PS- in general, where can one find information on using long format data to calculate things like this on the web? I'm not sure Smiley Sad

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

This is a different problem...Really why you should provide sample data and output.

I'd probably recommend collapsing to the last row and then summing.

data want;

set have;

by ID;

array tr_age(19) gta0-gta18;

array tr_keep(19) temp1-temp19;

retain temp1-temp19;

if first.id then do;

     call missing (of tr_keep(*));

end;

do i=1 to 19;

     if tr_age(i)=1 then tr_keep(i)=1;

end;

if last.id then do;

count_trauma=sum(of tr_keep(*));

output;

end;

keep ID count_trauma;

run;

View solution in original post

7 REPLIES 7
Reeza
Super User

IMO Your data is a combination of Wide/Long, because you have multiple records per ID, but also many variables.

What do you want the output to look like?

ginak
Quartz | Level 8

Yes, you're right. It's definitely a combo because of that reason! I guess what I'd like to have is a variable called numyearstrauma or something, where it gives the same answer for both lines, if that makes sense? The other variables repeat per row (I guess it's kind of dumb but oh well.. my boss wants me to keep the data set in this format). Basically I'm trying to calculate this score per person, because eventually I want to calculate the mean numyearstrauma/age score per group (There are 6 groups).

Reeza
Super User

OK...I'm still a bit vague on the requirements here, but a quick shot, that calculates the statistic. You'll need to merge the results in.

I believe the group is irrelevant at this point to your question.

data want;

set have;

by ID;

array tr_age(*) tr1age: tr2age: tr3age:;

retain count_trauma;

if first.id then count_trauma=sum(of tr_age(*));

else count_trauma=count_trauma+sum(of tr_age(*));

if last.id then output;

keep ID count_trauma;

run;

ginak
Quartz | Level 8

Hi Reeza,

Thank you so much! This is soooo close to what I'm looking for. How could you modify it such that the ties aren't double counted? FOr example,

snippet.JPG

So here, for display purposes, you see that ID #000511 has had trauma at 4 years of their life (year 3,4,5, and 8). But the code you gave me will give me 6 instead of 4.. i.e., it double counts the year. How could we modify it such that it doesn't double count years 5 and 8 (in this case)? The years of trauma endorsed for each are separated in the way that I told you above (tr1age, tr2age, tr3age), but I also created a variable that puts them all into one varaible by id number (it ended up not being necessary for your code)

Thanks again Smiley Happy

Reeza
Super User

This is a different problem...Really why you should provide sample data and output.

I'd probably recommend collapsing to the last row and then summing.

data want;

set have;

by ID;

array tr_age(19) gta0-gta18;

array tr_keep(19) temp1-temp19;

retain temp1-temp19;

if first.id then do;

     call missing (of tr_keep(*));

end;

do i=1 to 19;

     if tr_age(i)=1 then tr_keep(i)=1;

end;

if last.id then do;

count_trauma=sum(of tr_keep(*));

output;

end;

keep ID count_trauma;

run;

ballardw
Super User

Since it looks to me that you are basically just counting the number of variables with a value then the N statistic instead of Sum per Reeza's response.

Tom
Super User Tom
Super User

Since you variables seem to be coded as either 1 or missing you can take advantage of the UPDATE statement to collapse the rows.

data want ;

   update have (obs=0) have ;

   by id;

   if last.id then count=sum(of tr1age0 tr1age1 tr1age2 [...] tr1age18);

   keep id count ;

run;

If you need to have this new COUNT variable replicated onto all of the records for that ID then merge it back by ID.

data want ;

  merge want have ;

  by id;

run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1935 views
  • 6 likes
  • 4 in conversation