BookmarkSubscribeRSS Feed
newboy1218
Quartz | Level 8

Hi, I have a dataset contain the age group and income info. I want to create a dataset from it, counting the cumulative income level by age group. For example, for each age group, I want to know how many people have income <= 1000, <= 2000, ... etc.

 

I tried using proc sql - group by function (on age group) along with sum - case when on income, but I got all the income level as column instead of rows. Is there an efficient way to do this?

 

HAVE:

IDAge_GroupIncome
121-301000
251-601200
331-401400
451-602020
521-303000
651-603000

 

WANT:

Age_GroupIncomeCumulative_count
21-3010001
21-3020001
21-3030002
31-4010000
31-4020001
31-4030001
51-6010000
51-6020002
51-6030003
3 REPLIES 3
Reeza
Super User

Assuming you know the start/end of the ranges of income you'd like to represent in your data, I would recommend something like this:

 

data income_fmt;
	fmtname='income_fmt';
	type='N';

	do i=1000 to 10000 by 1000;
		start=i;
	end=i+1000 - 1;
	label=i;
	output;
end;
run;

proc format cntlin=income_fmt;
run;

data have;
	infile cards dlm='09'x;
	input ID $ 	Age_Group $	Income;
	cards;
1	21-30	1000
2	51-60	1200
3	31-40	1400
4	51-60	2020
5	21-30	3000
6	51-60	3000
6	51-60	5000
;
	;
	;;
run;

proc sort data=have;
	by age_group;
run;

proc freq data=have noprint;
	by age_group;
	table income/ list outcum sparse out=want;
	format income income_fmt.;
run;

proc print data=want;
run;

CAVEAT: If you're missing a range, ie 3000 across all categories it will not print in the results

average_joe
Obsidian | Level 7

There are some errors in your want output. Anyway to get the correct cumulative counts, use a multilabel format with proc summary.

 

data have;
infile cards dlm='09'x;
input ID	Age_Group $	Income;
cards;
1	21-30	1000
2	51-60	1200
3	31-40	1400
4	51-60	2020
5	21-30	3000
6	51-60	3000
;
run;


proc format;
    value inclvl (multilabel)
        low - 1000 = '<= $1000'
        low - 2000 = '<= $2000'
        low - 3000 = '<= $3000'
        ;
run;


proc summary data=have nway completetypes;
class age_group;
class income / mlf order=formatted;
format income inclvl.;
var id;
output out=want (drop=_:) n=cum_count;
run;

 

 

 

Ksharp
Super User
data have;
infile cards expandtabs;
input ID	Age_Group $	Income;
cards;
1	21-30	1000
2	51-60	1200
3	31-40	1400
4	51-60	2020
5	21-30	3000
6	51-60	3000
;
run;


data level;
do income=1000 to 3000 by 1000;
output;
end;
run;

proc sql;
create table want as
select *,(select count(*) from have where Age_Group=a.Age_Group and income<=a.income) as Cum_count
from (select * from (select distinct Age_Group from have),(select income from level)) as a 
order by 1,2;
quit;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 705 views
  • 4 likes
  • 4 in conversation