Solved: Proc tabulate challenge?

dunga · Posted 08-21-2017 09:17 AM

/*Hi SAS Forum,

I have two datasets. One set provides the income of k number of houses in Jun.

Other set provides the income of p number of houses in May.

My question is further down.

*/

/*Jun**/

data Have_Jun;

input Income;

cards;

0

100

105

100

200

300

400

500

1000

;

run;

/*May*/

data Have_May;

input Income;

cards;

0

.

105

100

199

301

302

400

500

1000

125

305

1200

;

run;

/*Q: I want to generate a proc tabulate table like below (in fact, I have to produce 75 proc tabulate tables for huge datasets).

	Income	Missing	0	>0-100	>100-200	>200-300	>300-400	>400-500	>500-1000	>1000	sum
Jun	COUNT		1	2	2	2	1	1	1		10
	PERCENT	0.0%	10.0%	20.0%	20.0%	20.0%	10.0%	10.0%	10.0%
May	COUNT	1	1	1	3		4	1	1	1	13
	PERCENT	7.69%	7.69%	7.69%	23.08%	0.00%	30.77%	7.69%	7.69%	7.69%
Diff		-7.69%	2.31%	12.31%	-3.08%	20.00%	-20.77%	2.31%	2.31%	-7.69%

This table simply presents counts of subjects in each income category for June and May separately.

And it presnets total sums separately.

And it presents corresponding perecentages separately.

And it presents difference in percentages between Jun and May in each income category.

/*I have done up to this but I cannot handle the proc trabulate part. Would appreciate any help?*/

/*created a format*/

proc format ;

value val

.= 'Missing'

low-<0 = '<0'

0 = '0'

0<-100 = '>0-100'

100<-200 = '>100-200'

200<-300 = '>200-300'

300<-400 = '>300-400'

400<-500 = '>400-500'

500<-1000 = '>500-1000'

1000<-high = '>1000';

run;

/*applied the format for Jun*/

data need_jun;

set have_jun;

income_category = put (income, val.);

Flag='Jun';

run;

/*applied the format for May*/

data need_may;

set have_may;

income_category = put (income, val.);

Flag = 'May';

run;

/*concatenate 2 tables together*/

data want;

set need_may need_jun;

run;

/*Apply a proc tabulate*/

PROC TABULATE DATA=want;

VAR income;

class income_category flag;

TABLE flag, income_category, income*N ;

RUN;

Ksharp · Posted 08-21-2017 10:11 AM

If I was right. PROC TABULATE can't do this unstructure summary table.

/*Jun**/
data Have_Jun;
input Income;
cards;
0
100
105
100
200
300
300
400
500
1000
;
run;
 
 
/*May*/
data Have_May;
input Income;
cards;
0
.
105
100
199
301
302
400
500
1000
125
305
1200
;
run;
proc format ;
value val
.= 'Missing'
low-<0 = '<0'
0 = '0'
0<-100 = '>0-100'
100<-200 = '>100-200'
200<-300 = '>200-300'
300<-400 = '>300-400'
400<-500 = '>400-500'
500<-1000 = '>500-1000'
1000<-high = '>1000';

value fmt
 -1<-<1=[percentn7.2];
run;
proc freq data=Have_Jun noprint;
table income/out=jun missing;
format income val.;
run;
proc freq data=Have_May noprint;
table income/out=may missing;
format income val.;
run;

data jun;
 set jun;
 retain month 'Jun    ';
 c_income=put(income,val.);
 percent=percent*0.01;
 format percent percent7.2;
 drop income;
run;
data may;
 set may;
 retain month 'May    ';
 c_income=put(income,val.);
 percent=percent*0.01;
 format percent percent7.2;
 drop income;
run;


options validvarname=any missing=0;
proc transpose data=jun out=t_jun;
by month;
var count percent;
id c_income;
idlabel c_income;
run;
proc transpose data=may out=t_may;
by month;
var count percent;
id c_income;
idlabel c_income;
run;

data dif;
 set  t_may(where=(_NAME_='PERCENT')) t_jun(where=(_NAME_='PERCENT')) end=last;
 retain month 'Dif     ';
 call missing(_name_);
 array x{*} _numeric_;
 do i=1 to dim(x);
  x{i}=sum(x{i},-lag(x{i}));
 end;
 if last;
 drop i;
run;

data want;
retain month
'Missing'n
'<0'n
'0'n
'>0-100'n
'>100-200'n
'>200-300'n
'>300-400'n
'>400-500'n
'>500-1000'n
'>1000'n;
 set t_jun t_may dif;
 format _numeric_ fmt.;
 drop _label_;
run;

View solution in original post

Ksharp · Posted 08-21-2017 10:11 AM

If I was right. PROC TABULATE can't do this unstructure summary table.

/*Jun**/
data Have_Jun;
input Income;
cards;
0
100
105
100
200
300
300
400
500
1000
;
run;
 
 
/*May*/
data Have_May;
input Income;
cards;
0
.
105
100
199
301
302
400
500
1000
125
305
1200
;
run;
proc format ;
value val
.= 'Missing'
low-<0 = '<0'
0 = '0'
0<-100 = '>0-100'
100<-200 = '>100-200'
200<-300 = '>200-300'
300<-400 = '>300-400'
400<-500 = '>400-500'
500<-1000 = '>500-1000'
1000<-high = '>1000';

value fmt
 -1<-<1=[percentn7.2];
run;
proc freq data=Have_Jun noprint;
table income/out=jun missing;
format income val.;
run;
proc freq data=Have_May noprint;
table income/out=may missing;
format income val.;
run;

data jun;
 set jun;
 retain month 'Jun    ';
 c_income=put(income,val.);
 percent=percent*0.01;
 format percent percent7.2;
 drop income;
run;
data may;
 set may;
 retain month 'May    ';
 c_income=put(income,val.);
 percent=percent*0.01;
 format percent percent7.2;
 drop income;
run;


options validvarname=any missing=0;
proc transpose data=jun out=t_jun;
by month;
var count percent;
id c_income;
idlabel c_income;
run;
proc transpose data=may out=t_may;
by month;
var count percent;
id c_income;
idlabel c_income;
run;

data dif;
 set  t_may(where=(_NAME_='PERCENT')) t_jun(where=(_NAME_='PERCENT')) end=last;
 retain month 'Dif     ';
 call missing(_name_);
 array x{*} _numeric_;
 do i=1 to dim(x);
  x{i}=sum(x{i},-lag(x{i}));
 end;
 if last;
 drop i;
run;

data want;
retain month
'Missing'n
'<0'n
'0'n
'>0-100'n
'>100-200'n
'>200-300'n
'>300-400'n
'>400-500'n
'>500-1000'n
'>1000'n;
 set t_jun t_may dif;
 format _numeric_ fmt.;
 drop _label_;
run;

ballardw · Posted 08-21-2017 10:32 AM

Tabulate will not calculate the difference as shown. Tabulate does no "comparison" of any statistic

You would also want to clarify what your multiple tables may need to look like. If your time frame goes past two months what type of difference is to be shown? Every two months? First and last across the entire period?

You should also consider order of appearance of the months. You will run into issues with character values for months sorting and appearing in the correct order: Apr is before May and March in character order. And will your data ever cross calendar years? You may be better off with an actual date value to get those to work and align things.

I see this as requiring some summarization and possibly transposition to get all of the elements.

dunga · Posted 08-21-2017 01:50 PM

Hi Ksharp and ballardw,

Thanks very much.

Ksharp's code is generating the results I expected but in terms of a SAS data set like below.

month	Missing	<0	0	>0-100	>100-200	>200-300	>300-400	>400-500	>500-1000	>1000	_NAME_
Jun	0	0	1	2	2	2	1	1	1	0	COUNT
Jun	0	0	10.00%	20.00%	20.00%	20.00%	10.00%	10.00%	10.00%	0	PERCENT
May	1	0	1	1	3		4	1	1	1	COUNT
May	7.69%	0	7.69%	7.69%	23.10%		30.80%	7.69%	7.69%	7.69%	PERCENT
Jun	-7.69%	0	2.31%	12.30%	-3.08%	20.00%	-20.80%	2.31%	2.31%	-7.69%

To present these resutls I have to export to Excel which is difficult given I have to produce and export so many two months comparison tables.

Would there be any way that I could display the same above SAS dataset resutls as a display (just like proc tabulate generate display tables) without having to export to Excel?

Thanks for your expertise.

Ksharp · Posted 08-22-2017 08:20 AM

one more thing ,change the order of variables to look the same as your report.

retain month
_NAME_
'Missing'n
'<0'n
'0'n
'>0-100'n
'>100-200'n
'>200-300'n
'>300-400'n
'>400-500'n
'>500-1000'n
'>1000'n;

After that PROC EXPORT can export sas table into Excel file.


About your question, make a macro and CALL EXECUTE() it .
Like:


data x;
input mon $;
cards;
jan
feb
mar
apr
....
;


data _null_;
  set x;
lag_mon=lag(mon);
if _n_ ne 1 then call execute('your-macro-here');
run;

Proc tabulate challenge?

Re: Proc tabulate challenge?

Re: Proc tabulate challenge?

Re: Proc tabulate challenge?

Re: Proc tabulate challenge?

Re: Proc tabulate challenge?

Proc tabulate challenge?

Re: Proc tabulate challenge?

Re: Proc tabulate challenge?

Re: Proc tabulate challenge?

Re: Proc tabulate challenge?

Re: Proc tabulate challenge?

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!