Count cumulative obs by category

newboy1218 · Posted 04-26-2022 05:14 PM

Hi, I have a dataset contain the age group and income info. I want to create a dataset from it, counting the cumulative income level by age group. For example, for each age group, I want to know how many people have income <= 1000, <= 2000, ... etc.

I tried using proc sql - group by function (on age group) along with sum - case when on income, but I got all the income level as column instead of rows. Is there an efficient way to do this?

HAVE:

ID	Age_Group	Income
1	21-30	1000
2	51-60	1200
3	31-40	1400
4	51-60	2020
5	21-30	3000
6	51-60	3000

WANT:

Age_Group	Income	Cumulative_count
21-30	1000	1
21-30	2000	1
21-30	3000	2
31-40	1000	0
31-40	2000	1
31-40	3000	1
51-60	1000	0
51-60	2000	2
51-60	3000	3

Reeza · Posted 04-26-2022 06:12 PM

Assuming you know the start/end of the ranges of income you'd like to represent in your data, I would recommend something like this:

data income_fmt;
	fmtname='income_fmt';
	type='N';

	do i=1000 to 10000 by 1000;
		start=i;
	end=i+1000 - 1;
	label=i;
	output;
end;
run;

proc format cntlin=income_fmt;
run;

data have;
	infile cards dlm='09'x;
	input ID $ 	Age_Group $	Income;
	cards;
1	21-30	1000
2	51-60	1200
3	31-40	1400
4	51-60	2020
5	21-30	3000
6	51-60	3000
6	51-60	5000
;
	;
	;;
run;

proc sort data=have;
	by age_group;
run;

proc freq data=have noprint;
	by age_group;
	table income/ list outcum sparse out=want;
	format income income_fmt.;
run;

proc print data=want;
run;

CAVEAT: If you're missing a range, ie 3000 across all categories it will not print in the results

average_joe · Posted 04-26-2022 06:17 PM

There are some errors in your want output. Anyway to get the correct cumulative counts, use a multilabel format with proc summary.

data have;
infile cards dlm='09'x;
input ID	Age_Group $	Income;
cards;
1	21-30	1000
2	51-60	1200
3	31-40	1400
4	51-60	2020
5	21-30	3000
6	51-60	3000
;
run;


proc format;
    value inclvl (multilabel)
        low - 1000 = '<= $1000'
        low - 2000 = '<= $2000'
        low - 3000 = '<= $3000'
        ;
run;


proc summary data=have nway completetypes;
class age_group;
class income / mlf order=formatted;
format income inclvl.;
var id;
output out=want (drop=_:) n=cum_count;
run;

Ksharp · Posted 04-27-2022 08:36 AM

data have;
infile cards expandtabs;
input ID	Age_Group $	Income;
cards;
1	21-30	1000
2	51-60	1200
3	31-40	1400
4	51-60	2020
5	21-30	3000
6	51-60	3000
;
run;


data level;
do income=1000 to 3000 by 1000;
output;
end;
run;

proc sql;
create table want as
select *,(select count(*) from have where Age_Group=a.Age_Group and income<=a.income) as Cum_count
from (select * from (select distinct Age_Group from have),(select income from level)) as a 
order by 1,2;
quit;

Count cumulative obs by category

Re: Count cumulative obs by category

Re: Count cumulative obs by category

Re: Count cumulative obs by category

Count cumulative obs by category

Re: Count cumulative obs by category

Re: Count cumulative obs by category

Re: Count cumulative obs by category

Registration is open

SAS Training: Just a Click Away