DATA Step, Macro, Functions and more

Question about summing something in long format

Accepted Solution Solved
Reply
Contributor
Posts: 67
Accepted Solution

Question about summing something in long format

Hello!

So my data set is in long format (I can't transform it to wide format.. too many variables). I have 3 traumas, trauma #1, #3 and #5. People either have one or two of the traumas. Each record/line indicates a trauma. So the data set might look like this:

ID group traumatype tr1age0 tr1age1 tr1age2 [...] tr1age18 tr3age0 tr3age1 tr3age2 [.. until] tr3age18 tr5age1 tr5age2 [...] trage0 trage1 trage2 trage3 [...] trage18 duration

1   6        trauma1         1        1            .                 .          .            .            .                      .            .         .                1          1       .         .               .            2

1   6        trauma3          .         .           .                   .        .            1           1                      .            .         .                 .          1       1        .                            2

2    1       trauma1                    1                                                                                                                                                                                            1

So   here ID #1 is in group 6, which means they have endorsed trauma1 and trauma3. ID # 2 is in group 1, so they've only endorsed trauma1 (so depending on whether they've endorsed only one of the three traumas or two, that will determine if they're in group 1,2,3,4,5 or 6.).

So suppose now we have these two people. I want to calculate the # years endorsed trauma. So for ID#1, they have endorsed a trauma at ages 0,1 and 2. Notice that they endorsed both traumas at age 1. That's ok, it still counts as 1 year. So their years endorsed = 3.

Now, How can I get SAS to do this? I made the new variable, trage0 through trage18 to indicate whether they endorsed their trauma at that age (because I thought it might be easier to have the same variable name).

I was thinking of taking the maximum at each age. so for ID#6, their max at age 0=1, max at age1=1, max at age 2=1, max at age 3=., ..., max at age 18=. (missing). So then I'd sum the max's and get 3 as the number of years having endorsed a trauma.

Any help as to how I can do this? I feel like I could use an array but I'm not sure. Also the fact that the index starts at 0 and not 1 is trouble some.. I think i need to go from -1 to 18 so sas doesn't freak out.

Thanks!!!

Gina

PS- in general, where can one find information on using long format data to calculate things like this on the web? I'm not sure Smiley Sad


Accepted Solutions
Solution
‎02-19-2014 06:36 PM
Super User
Posts: 17,963

Re: Question about summing something in long format

This is a different problem...Really why you should provide sample data and output.

I'd probably recommend collapsing to the last row and then summing.

data want;

set have;

by ID;

array tr_age(19) gta0-gta18;

array tr_keep(19) temp1-temp19;

retain temp1-temp19;

if first.id then do;

     call missing (of tr_keep(*));

end;

do i=1 to 19;

     if tr_age(i)=1 then tr_keep(i)=1;

end;

if last.id then do;

count_trauma=sum(of tr_keep(*));

output;

end;

keep ID count_trauma;

run;

View solution in original post


All Replies
Super User
Posts: 17,963

Re: Question about summing something in long format

IMO Your data is a combination of Wide/Long, because you have multiple records per ID, but also many variables.

What do you want the output to look like?

Contributor
Posts: 67

Re: Question about summing something in long format

Yes, you're right. It's definitely a combo because of that reason! I guess what I'd like to have is a variable called numyearstrauma or something, where it gives the same answer for both lines, if that makes sense? The other variables repeat per row (I guess it's kind of dumb but oh well.. my boss wants me to keep the data set in this format). Basically I'm trying to calculate this score per person, because eventually I want to calculate the mean numyearstrauma/age score per group (There are 6 groups).

Super User
Posts: 17,963

Re: Question about summing something in long format

OK...I'm still a bit vague on the requirements here, but a quick shot, that calculates the statistic. You'll need to merge the results in.

I believe the group is irrelevant at this point to your question.

data want;

set have;

by ID;

array tr_age(*) tr1age: tr2age: tr3age:;

retain count_trauma;

if first.id then count_trauma=sum(of tr_age(*));

else count_trauma=count_trauma+sum(of tr_age(*));

if last.id then output;

keep ID count_trauma;

run;

Contributor
Posts: 67

Re: Question about summing something in long format

Hi Reeza,

Thank you so much! This is soooo close to what I'm looking for. How could you modify it such that the ties aren't double counted? FOr example,

snippet.JPG

So here, for display purposes, you see that ID #000511 has had trauma at 4 years of their life (year 3,4,5, and 8). But the code you gave me will give me 6 instead of 4.. i.e., it double counts the year. How could we modify it such that it doesn't double count years 5 and 8 (in this case)? The years of trauma endorsed for each are separated in the way that I told you above (tr1age, tr2age, tr3age), but I also created a variable that puts them all into one varaible by id number (it ended up not being necessary for your code)

Thanks again Smiley Happy

Solution
‎02-19-2014 06:36 PM
Super User
Posts: 17,963

Re: Question about summing something in long format

This is a different problem...Really why you should provide sample data and output.

I'd probably recommend collapsing to the last row and then summing.

data want;

set have;

by ID;

array tr_age(19) gta0-gta18;

array tr_keep(19) temp1-temp19;

retain temp1-temp19;

if first.id then do;

     call missing (of tr_keep(*));

end;

do i=1 to 19;

     if tr_age(i)=1 then tr_keep(i)=1;

end;

if last.id then do;

count_trauma=sum(of tr_keep(*));

output;

end;

keep ID count_trauma;

run;

Super User
Posts: 10,550

Re: Question about summing something in long format

Since it looks to me that you are basically just counting the number of variables with a value then the N statistic instead of Sum per Reeza's response.

Super User
Super User
Posts: 6,502

Re: Question about summing something in long format

Since you variables seem to be coded as either 1 or missing you can take advantage of the UPDATE statement to collapse the rows.

data want ;

   update have (obs=0) have ;

   by id;

   if last.id then count=sum(of tr1age0 tr1age1 tr1age2 [...] tr1age18);

   keep id count ;

run;

If you need to have this new COUNT variable replicated onto all of the records for that ID then merge it back by ID.

data want ;

  merge want have ;

  by id;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 406 views
  • 6 likes
  • 4 in conversation