BookmarkSubscribeRSS Feed
K_S
Obsidian | Level 7 K_S
Obsidian | Level 7

Hi all, 

I am hoping to get your help with the following:

I have a database that looks like this 

year sexare astatus incidence prevalence 
2000FON05200
2001FON06300
2002FON05350
2003FON05201
2004FON04222
2005FON02333
      


and I want to 'mush' together rows to get this :

 

year sexare astatus incidence prevalence 
2000/2002FON016850
2003/2005FON011756

 

 

Note that I am simply adding the years. I have several thousands of these so doing it by hand in excel is not very practical. 

Any idea of how to do this in SAS? 

Thank you in advance!

7 REPLIES 7
PaigeMiller
Diamond | Level 26

@K_S wrote:

Hi all, 

I am hoping to get your help with the following:

I have a database that looks like this 

year  sex are a status  incidence  prevalence 
2000 F ON 0 5 200
2001 F ON 0 6 300
2002 F ON 0 5 350
2003 F ON 0 5 201
2004 F ON 0 4 222
2005 F ON 0 2 333
           


and I want to 'mush' together rows to get this :

 

year  sex are a status  incidence  prevalence 
2000/2002 F ON 0 16 850
2003/2005 F ON 0 11 756

 

 

Note that I am simply adding the years. I have several thousands of these so doing it by hand in excel is not very practical. 

Any idea of how to do this in SAS? 

Thank you in advance!


"Adding the years" ... do you mean adding the incidence and prevalence within years?

 

You can create a format that groups together the different years, and then run PROC SUMMARY on the formatted years to get the sums.

 

For example:

 

proc format;
    value yearf 2000-2002='2000/2002' 2003-2005='2003/2005';
run;
proc summary nway data=have;
    class year;
    format year yearf.;
    var incidence prevalence;
    output out=want sum=;
run;
--
Paige Miller
novinosrin
Tourmaline | Level 20

data have;

input year      sex $ are_a $    status $   incidence prevalence;

datalines;

2000 F    ON   0    5    200

2001 F    ON   0    6    300

2002 F    ON   0    5    350

2003 F    ON   0    5    201

2004 F    ON   0    4    222

2005 F    ON   0    2    333

;

 

proc sql;

create table want(drop=k) as

select distinct catx('/',min(year),max(year)) as year, sex ,     are_a ,status ,round(year/3) as k, sum(incidence) as sum_incidence, sum(prevalence) as sum_prevalence

from  have

group by k;

quit;

SAS_inquisitive
Lapis Lazuli | Level 10

 

data have;
	input	year 	sex $	area $	status 	incidence 	prevalence;
	cards;
2000	F	ON	0	5	200
2001	F	ON	0	6	300
2002	F	ON	0	5	350
2003	F	ON	0	5	201
2004	F	ON	0	4	222
2005	F	ON	0	2	333
;

proc sql noprint;
	create table want1 as
		select catx('/',min(year),max(year)) as year_,
			sex,
			area,
			status,
			sum(incidence) as sum_incidence1,
			sum(prevalence) as sum_prevalence1

		from have 
			where year in (2000:2002);

	create table want2 as
		select catx('/',min(year),max(year)) as year_,
			sex,
			area,
			status,
			sum(incidence) as sum_incidence2,
			sum(prevalence) as sum_prevalence2

		from have 
			where year in (2003:2005);
quit;

data tmp;
	set want1 (rename = (sum_incidence1 = incidence sum_prevalence1 = prevalence year_ = year)) 
		want2 (rename = (sum_incidence2 = incidence sum_prevalence2 = prevalence year_ = year));
run;

data want;
	set tmp;
	by year notsorted;

	if last.year;
run;
novinosrin
Tourmaline | Level 20

data have;

input year      sex $ are_a $    status $   incidence prevalence;

datalines;

2000 F    ON   0    5    200

2001 F    ON   0    6    300

2002 F    ON   0    5    350

2003 F    ON   0    5    201

2004 F    ON   0    4    222

2005 F    ON   0    2    333

;

 

 

 

data want;

retain _year;

if 0 then set have;

sum_incidence=0;

sum_prevalence=0;

do _n_=1 by 1 until(_n_=3);

set have;

by year;

sum_incidence+incidence;

sum_prevalence+prevalence;

if _n_=1 then temp=year;

else if _n_=3 then do;

_year=catx('/',temp,year);

output;

end;

end;

drop year temp incidence prevalence;

run;

 

PaigeMiller
Diamond | Level 26

@novinosrin wrote:

data have;

input year      sex $ are_a $    status $   incidence prevalence;

datalines;

2000 F    ON   0    5    200

2001 F    ON   0    6    300

2002 F    ON   0    5    350

2003 F    ON   0    5    201

2004 F    ON   0    4    222

2005 F    ON   0    2    333

;

 

 

 

data want;

retain _year;

if 0 then set have;

sum_incidence=0;

sum_prevalence=0;

do _n_=1 by 1 until(_n_=3);

set have;

by year;

sum_incidence+incidence;

sum_prevalence+prevalence;

if _n_=1 then temp=year;

else if _n_=3 then do;

_year=catx('/',temp,year);

output;

end;

end;

drop year temp incidence prevalence;

run;

 


Assuming the year combinations wanted are always three-year combinations, this works fine. It also assumes that the final year is some multiple of 3 years after the initial year.

 

I find it opposite to my way of thinking to perform these types of summations in a data step, you have to create your own looping and end of loop conditions; when PROC MEANS and PROC SUMMARY were built to do this.

 

I don't know what the speed implications are for large data sets (which of course this example is not), however I would imagine that using a PROC would be faster for large data sets, but I am not aware of any study that shows this.

--
Paige Miller
novinosrin
Tourmaline | Level 20

@PaigeMiller  You are absolutely right. I would like the OP to clarify the grouping sets of years that he/she wants in the required output. I was merely having fun with Sql and datastep approaches as it didn't take more than a couple of mins. lol 

K_S
Obsidian | Level 7 K_S
Obsidian | Level 7

I am not sure I understand what I have to clarify, so please forgive me if I am off topic.

I have about 10 years worth of data for 10 different geographic areas. I am trying to get disease prevalence estimates but inctea of yearly estimates, i want to get estimates based on 3 year avearegs.  

I hope this clarifies. Also I need to keep the in-between columns, so they cannot be eliminated.

Thank you!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1475 views
  • 3 likes
  • 4 in conversation