DATA Step, Macro, Functions and more

summing within columns

Reply
Contributor K_S
Contributor
Posts: 25

summing within columns

Hi all, 

I am hoping to get your help with the following:

I have a database that looks like this 

year sexare astatus incidence prevalence 
2000FON05200
2001FON06300
2002FON05350
2003FON05201
2004FON04222
2005FON02333
      


and I want to 'mush' together rows to get this :

 

year sexare astatus incidence prevalence 
2000/2002FON016850
2003/2005FON011756

 

 

Note that I am simply adding the years. I have several thousands of these so doing it by hand in excel is not very practical. 

Any idea of how to do this in SAS? 

Thank you in advance!

Trusted Advisor
Posts: 1,789

Re: summing within columns

[ Edited ]

K_S wrote:

Hi all, 

I am hoping to get your help with the following:

I have a database that looks like this 

year  sex are a status  incidence  prevalence 
2000 F ON 0 5 200
2001 F ON 0 6 300
2002 F ON 0 5 350
2003 F ON 0 5 201
2004 F ON 0 4 222
2005 F ON 0 2 333
           


and I want to 'mush' together rows to get this :

 

year  sex are a status  incidence  prevalence 
2000/2002 F ON 0 16 850
2003/2005 F ON 0 11 756

 

 

Note that I am simply adding the years. I have several thousands of these so doing it by hand in excel is not very practical. 

Any idea of how to do this in SAS? 

Thank you in advance!


"Adding the years" ... do you mean adding the incidence and prevalence within years?

 

You can create a format that groups together the different years, and then run PROC SUMMARY on the formatted years to get the sums.

 

For example:

 

proc format;
    value yearf 2000-2002='2000/2002' 2003-2005='2003/2005';
run;
proc summary nway data=have;
    class year;
    format year yearf.;
    var incidence prevalence;
    output out=want sum=;
run;
PROC Star
Posts: 276

Re: summing within columns

data have;

input year      sex $ are_a $    status $   incidence prevalence;

datalines;

2000 F    ON   0    5    200

2001 F    ON   0    6    300

2002 F    ON   0    5    350

2003 F    ON   0    5    201

2004 F    ON   0    4    222

2005 F    ON   0    2    333

;

 

proc sql;

create table want(drop=k) as

select distinct catx('/',min(year),max(year)) as year, sex ,     are_a ,status ,round(year/3) as k, sum(incidence) as sum_incidence, sum(prevalence) as sum_prevalence

from  have

group by k;

quit;

Super Contributor
Posts: 266

Re: summing within columns

 

data have;
	input	year 	sex $	area $	status 	incidence 	prevalence;
	cards;
2000	F	ON	0	5	200
2001	F	ON	0	6	300
2002	F	ON	0	5	350
2003	F	ON	0	5	201
2004	F	ON	0	4	222
2005	F	ON	0	2	333
;

proc sql noprint;
	create table want1 as
		select catx('/',min(year),max(year)) as year_,
			sex,
			area,
			status,
			sum(incidence) as sum_incidence1,
			sum(prevalence) as sum_prevalence1

		from have 
			where year in (2000:2002);

	create table want2 as
		select catx('/',min(year),max(year)) as year_,
			sex,
			area,
			status,
			sum(incidence) as sum_incidence2,
			sum(prevalence) as sum_prevalence2

		from have 
			where year in (2003:2005);
quit;

data tmp;
	set want1 (rename = (sum_incidence1 = incidence sum_prevalence1 = prevalence year_ = year)) 
		want2 (rename = (sum_incidence2 = incidence sum_prevalence2 = prevalence year_ = year));
run;

data want;
	set tmp;
	by year notsorted;

	if last.year;
run;
PROC Star
Posts: 276

Re: summing within columns

[ Edited ]

data have;

input year      sex $ are_a $    status $   incidence prevalence;

datalines;

2000 F    ON   0    5    200

2001 F    ON   0    6    300

2002 F    ON   0    5    350

2003 F    ON   0    5    201

2004 F    ON   0    4    222

2005 F    ON   0    2    333

;

 

 

 

data want;

retain _year;

if 0 then set have;

sum_incidence=0;

sum_prevalence=0;

do _n_=1 by 1 until(_n_=3);

set have;

by year;

sum_incidence+incidence;

sum_prevalence+prevalence;

if _n_=1 then temp=year;

else if _n_=3 then do;

_year=catx('/',temp,year);

output;

end;

end;

drop year temp incidence prevalence;

run;

 

Trusted Advisor
Posts: 1,789

Re: summing within columns

[ Edited ]

novinosrin wrote:

data have;

input year      sex $ are_a $    status $   incidence prevalence;

datalines;

2000 F    ON   0    5    200

2001 F    ON   0    6    300

2002 F    ON   0    5    350

2003 F    ON   0    5    201

2004 F    ON   0    4    222

2005 F    ON   0    2    333

;

 

 

 

data want;

retain _year;

if 0 then set have;

sum_incidence=0;

sum_prevalence=0;

do _n_=1 by 1 until(_n_=3);

set have;

by year;

sum_incidence+incidence;

sum_prevalence+prevalence;

if _n_=1 then temp=year;

else if _n_=3 then do;

_year=catx('/',temp,year);

output;

end;

end;

drop year temp incidence prevalence;

run;

 


Assuming the year combinations wanted are always three-year combinations, this works fine. It also assumes that the final year is some multiple of 3 years after the initial year.

 

I find it opposite to my way of thinking to perform these types of summations in a data step, you have to create your own looping and end of loop conditions; when PROC MEANS and PROC SUMMARY were built to do this.

 

I don't know what the speed implications are for large data sets (which of course this example is not), however I would imagine that using a PROC would be faster for large data sets, but I am not aware of any study that shows this.

Highlighted
PROC Star
Posts: 276

Re: summing within columns

@PaigeMiller  You are absolutely right. I would like the OP to clarify the grouping sets of years that he/she wants in the required output. I was merely having fun with Sql and datastep approaches as it didn't take more than a couple of mins. lol 

Contributor K_S
Contributor
Posts: 25

Re: summing within columns

I am not sure I understand what I have to clarify, so please forgive me if I am off topic.

I have about 10 years worth of data for 10 different geographic areas. I am trying to get disease prevalence estimates but inctea of yearly estimates, i want to get estimates based on 3 year avearegs.  

I hope this clarifies. Also I need to keep the in-between columns, so they cannot be eliminated.

Thank you!

Ask a Question
Discussion stats
  • 7 replies
  • 129 views
  • 3 likes
  • 4 in conversation