BookmarkSubscribeRSS Feed
newboy1218
Quartz | Level 8

Hi, I have a dataset contain the age group and income info. I want to create a dataset from it, counting the cumulative income level by age group. For example, for each age group, I want to know how many people have income <= 1000, <= 2000, ... etc.

 

I tried using proc sql - group by function (on age group) along with sum - case when on income, but I got all the income level as column instead of rows. Is there an efficient way to do this?

 

HAVE:

IDAge_GroupIncome
121-301000
251-601200
331-401400
451-602020
521-303000
651-603000

 

WANT:

Age_GroupIncomeCumulative_count
21-3010001
21-3020001
21-3030002
31-4010000
31-4020001
31-4030001
51-6010000
51-6020002
51-6030003
3 REPLIES 3
Reeza
Super User

Assuming you know the start/end of the ranges of income you'd like to represent in your data, I would recommend something like this:

 

data income_fmt;
	fmtname='income_fmt';
	type='N';

	do i=1000 to 10000 by 1000;
		start=i;
	end=i+1000 - 1;
	label=i;
	output;
end;
run;

proc format cntlin=income_fmt;
run;

data have;
	infile cards dlm='09'x;
	input ID $ 	Age_Group $	Income;
	cards;
1	21-30	1000
2	51-60	1200
3	31-40	1400
4	51-60	2020
5	21-30	3000
6	51-60	3000
6	51-60	5000
;
	;
	;;
run;

proc sort data=have;
	by age_group;
run;

proc freq data=have noprint;
	by age_group;
	table income/ list outcum sparse out=want;
	format income income_fmt.;
run;

proc print data=want;
run;

CAVEAT: If you're missing a range, ie 3000 across all categories it will not print in the results

average_joe
Obsidian | Level 7

There are some errors in your want output. Anyway to get the correct cumulative counts, use a multilabel format with proc summary.

 

data have;
infile cards dlm='09'x;
input ID	Age_Group $	Income;
cards;
1	21-30	1000
2	51-60	1200
3	31-40	1400
4	51-60	2020
5	21-30	3000
6	51-60	3000
;
run;


proc format;
    value inclvl (multilabel)
        low - 1000 = '<= $1000'
        low - 2000 = '<= $2000'
        low - 3000 = '<= $3000'
        ;
run;


proc summary data=have nway completetypes;
class age_group;
class income / mlf order=formatted;
format income inclvl.;
var id;
output out=want (drop=_:) n=cum_count;
run;

 

 

 

Ksharp
Super User
data have;
infile cards expandtabs;
input ID	Age_Group $	Income;
cards;
1	21-30	1000
2	51-60	1200
3	31-40	1400
4	51-60	2020
5	21-30	3000
6	51-60	3000
;
run;


data level;
do income=1000 to 3000 by 1000;
output;
end;
run;

proc sql;
create table want as
select *,(select count(*) from have where Age_Group=a.Age_Group and income<=a.income) as Cum_count
from (select * from (select distinct Age_Group from have),(select income from level)) as a 
order by 1,2;
quit;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 584 views
  • 4 likes
  • 4 in conversation