summing within columns

K_S · Posted 10-06-2017 02:47 PM

Hi all,

I am hoping to get your help with the following:

I have a database that looks like this:

year	sex	area	status	incidence	prevalence
2000	F	ON	0	5	200
2001	F	ON	0	6	300
2002	F	ON	0	5	350
2003	F	ON	0	5	201
2004	F	ON	0	4	222
2005	F	ON	0	2	333

and I want to 'mush' together rows to get this:

year	sex	area	status	incidence	prevalence
2000/2002	F	ON	0	16	850
2003/2005	F	ON	0	11	756

Note that I am simply adding the years. I have several thousand of these, so doing it by hand in Excel is not very practical.

Any idea of how to do this in SAS?

Thank you in advance!

PaigeMiller · Posted 10-06-2017 02:51 PM

"Adding the years" ... do you mean adding the incidence and prevalence within years?

You can create a format that groups together the different years, and then run PROC SUMMARY on the formatted years to get the sums.

For example:

proc format;
    value yearf 2000-2002='2000/2002' 2003-2005='2003/2005';
run;
proc summary nway data=have;
    class year;
    format year yearf.;
    var incidence prevalence;
    output out=want sum=;
run;

--
Paige Miller

novinosrin · Posted 10-06-2017 03:24 PM

data have;
    input year sex $ are_a $ status $ incidence prevalence;
    datalines;
2000 F ON 0 5 200
2001 F ON 0 6 300
2002 F ON 0 5 350
2003 F ON 0 5 201
2004 F ON 0 4 222
2005 F ON 0 2 333
;

proc sql;
    create table want(drop=k) as
    select distinct catx('/',min(year),max(year)) as year, sex, are_a, status,
           round(year/3) as k,
           sum(incidence) as sum_incidence, sum(prevalence) as sum_prevalence
    from have
    group by k;
quit;

SAS_inquisitive · Posted 10-06-2017 03:47 PM

data have;
	input	year 	sex $	area $	status 	incidence 	prevalence;
	cards;
2000	F	ON	0	5	200
2001	F	ON	0	6	300
2002	F	ON	0	5	350
2003	F	ON	0	5	201
2004	F	ON	0	4	222
2005	F	ON	0	2	333
;

proc sql noprint;
	create table want1 as
		select catx('/',min(year),max(year)) as year_,
			sex,
			area,
			status,
			sum(incidence) as sum_incidence1,
			sum(prevalence) as sum_prevalence1

		from have 
			where year in (2000:2002);

	create table want2 as
		select catx('/',min(year),max(year)) as year_,
			sex,
			area,
			status,
			sum(incidence) as sum_incidence2,
			sum(prevalence) as sum_prevalence2

		from have 
			where year in (2003:2005);
quit;

data tmp;
	set want1 (rename = (sum_incidence1 = incidence sum_prevalence1 = prevalence year_ = year)) 
		want2 (rename = (sum_incidence2 = incidence sum_prevalence2 = prevalence year_ = year));
run;

data want;
	set tmp;
	by year notsorted;

	if last.year;
run;

novinosrin · Posted 10-06-2017 04:02 PM

data have;
    input year sex $ are_a $ status $ incidence prevalence;
    datalines;
2000 F ON 0 5 200
2001 F ON 0 6 300
2002 F ON 0 5 350
2003 F ON 0 5 201
2004 F ON 0 4 222
2005 F ON 0 2 333
;

data want;
    retain _year;
    if 0 then set have;
    /* reset the running totals at the start of each three-row group */
    sum_incidence=0;
    sum_prevalence=0;
    /* read three rows per pass through the DATA step */
    do _n_=1 by 1 until(_n_=3);
        set have;
        by year;
        sum_incidence+incidence;
        sum_prevalence+prevalence;
        if _n_=1 then temp=year;
        else if _n_=3 then do;
            _year=catx('/',temp,year);
            output;
        end;
    end;
    drop year temp incidence prevalence;
run;

PaigeMiller · Posted 10-07-2017 08:07 AM

Assuming the year combinations wanted are always three-year combinations, this DATA step works fine. It also assumes that the total number of years is a multiple of 3, so that the last group is complete.

Performing these types of summations in a DATA step runs opposite to my way of thinking: you have to create your own looping and end-of-loop conditions, when PROC MEANS and PROC SUMMARY were built to do exactly this.

I don't know what the speed implications are for large data sets (which of course this example is not). I would imagine that using a PROC would be faster for large data sets, but I am not aware of any study that shows this.
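
To make the three-year assumption explicit, the grouping can be derived from the year value itself instead of from counting rows, so missing years or unsorted data do not shift the groups. This is only a sketch: the macro variable base_year, the variable period, and the data set names have_grouped and want_by_period are made up for illustration, and the first block is assumed to start in 2000.

```sas
/* Sketch only: base_year, period, have_grouped and want_by_period
   are illustrative names, not part of the original thread. */
%let base_year = 2000;   /* assumed first year of the first 3-year block */

data have_grouped;
    set have;
    length period $9;
    /* first year of the 3-year block that contains this row's year */
    block_start = &base_year + 3 * floor((year - &base_year) / 3);
    period = catx('/', block_start, block_start + 2);
    drop block_start;
run;

proc summary data=have_grouped nway;
    class period;
    var incidence prevalence;
    /* sum= with no names keeps the original variable names */
    output out=want_by_period (drop=_type_ _freq_) sum=;
run;
```

Note that an incomplete final block (for example, a single year 2009) would still be labelled 2009/2011, so the label shows the intended block rather than the years actually present.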

--
Paige Miller

novinosrin · Posted 10-07-2017 01:02 PM

@PaigeMiller You are absolutely right. I would like the OP to clarify which groupings of years he/she wants in the required output. I was merely having fun with SQL and DATA step approaches, as it didn't take more than a couple of minutes. lol

K_S · Posted 10-10-2017 11:37 AM

I am not sure I understand what I have to clarify, so please forgive me if I am off topic.

I have about 10 years' worth of data for 10 different geographic areas. I am trying to get disease prevalence estimates, but instead of yearly estimates, I want to get estimates based on 3-year averages.

I hope this clarifies. Also I need to keep the in-between columns, so they cannot be eliminated.

Thank you!
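
One possible way to keep the in-between columns while grouping the years, offered as a sketch rather than a tested solution: list them as CLASS variables alongside the formatted year. It assumes the data set is called have, the variables are named year, sex, area, status, incidence and prevalence as in the sample, and the year ranges in the format are extended to cover the real data.

```sas
/* Sketch only: data set and variable names are taken from the sample data;
   the year ranges below are placeholders to be extended. */
proc format;
    value yearf 2000-2002 = '2000/2002'
                2003-2005 = '2003/2005';
run;

proc summary data=have nway;
    /* sex, area and status as CLASS variables stay in the output:
       one row per combination of these and the formatted year group */
    class sex area status year;
    format year yearf.;
    var incidence prevalence;
    output out=want (drop=_type_ _freq_) sum=;
run;
```

If three-year averages rather than totals are wanted, sum= can be replaced with mean= in the OUTPUT statement. Rows with a missing value in any CLASS variable are left out unless the MISSING option is added to the PROC SUMMARY statement.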
