/*Hi SAS Forum,
I have two datasets. One set provides the income of k number of houses in Jun.
Other set provides the income of p number of houses in May.
My question is further down.
*/
/*Jun**/
data Have_Jun;
input Income;
cards;
0
100
105
100
200
300
300
400
500
1000
;
run;
/*May*/
data Have_May;
input Income;
cards;
0
.
105
100
199
301
302
400
500
1000
125
305
1200
;
run;
/*Q: I want to generate a proc tabulate table like below (in fact, I have to produce 75 proc tabulate tables for huge datasets).
Income | Missing | 0 | >0-100 | >100-200 | >200-300 | >300-400 | >400-500 | >500-1000 | >1000 | sum | |
Jun | COUNT | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 10 | ||
PERCENT | 0.0% | 10.0% | 20.0% | 20.0% | 20.0% | 10.0% | 10.0% | 10.0% | |||
May | COUNT | 1 | 1 | 1 | 3 | 4 | 1 | 1 | 1 | 13 | |
PERCENT | 7.69% | 7.69% | 7.69% | 23.08% | 0.00% | 30.77% | 7.69% | 7.69% | 7.69% | ||
Diff | -7.69% | 2.31% | 12.31% | -3.08% | 20.00% | -20.77% | 2.31% | 2.31% | -7.69% |
This table simply presents counts of subjects in each income category for June and May separately.
And it presnets total sums separately.
And it presents corresponding perecentages separately.
And it presents difference in percentages between Jun and May in each income category.
/*I have done up to this but I cannot handle the proc trabulate part. Would appreciate any help?*/
/*created a format*/
proc format ;
value val
.= 'Missing'
low-<0 = '<0'
0 = '0'
0<-100 = '>0-100'
100<-200 = '>100-200'
200<-300 = '>200-300'
300<-400 = '>300-400'
400<-500 = '>400-500'
500<-1000 = '>500-1000'
1000<-high = '>1000';
run;
/*applied the format for Jun*/
data need_jun;
set have_jun;
income_category = put (income, val.);
Flag='Jun';
run;
/*applied the format for May*/
data need_may;
set have_may;
income_category = put (income, val.);
Flag = 'May';
run;
/*concatenate 2 tables together*/
data want;
set need_may need_jun;
run;
/*Apply a proc tabulate*/
PROC TABULATE DATA=want;
VAR income;
class income_category flag;
TABLE flag, income_category, income*N ;
RUN;
If I was right. PROC TABULATE can't do this unstructure summary table.
/*Jun**/
data Have_Jun;
input Income;
cards;
0
100
105
100
200
300
300
400
500
1000
;
run;
/*May*/
data Have_May;
input Income;
cards;
0
.
105
100
199
301
302
400
500
1000
125
305
1200
;
run;
proc format ;
value val
.= 'Missing'
low-<0 = '<0'
0 = '0'
0<-100 = '>0-100'
100<-200 = '>100-200'
200<-300 = '>200-300'
300<-400 = '>300-400'
400<-500 = '>400-500'
500<-1000 = '>500-1000'
1000<-high = '>1000';
value fmt
-1<-<1=[percentn7.2];
run;
proc freq data=Have_Jun noprint;
table income/out=jun missing;
format income val.;
run;
proc freq data=Have_May noprint;
table income/out=may missing;
format income val.;
run;
data jun;
set jun;
retain month 'Jun ';
c_income=put(income,val.);
percent=percent*0.01;
format percent percent7.2;
drop income;
run;
data may;
set may;
retain month 'May ';
c_income=put(income,val.);
percent=percent*0.01;
format percent percent7.2;
drop income;
run;
options validvarname=any missing=0;
proc transpose data=jun out=t_jun;
by month;
var count percent;
id c_income;
idlabel c_income;
run;
proc transpose data=may out=t_may;
by month;
var count percent;
id c_income;
idlabel c_income;
run;
data dif;
set t_may(where=(_NAME_='PERCENT')) t_jun(where=(_NAME_='PERCENT')) end=last;
retain month 'Dif ';
call missing(_name_);
array x{*} _numeric_;
do i=1 to dim(x);
x{i}=sum(x{i},-lag(x{i}));
end;
if last;
drop i;
run;
data want;
retain month
'Missing'n
'<0'n
'0'n
'>0-100'n
'>100-200'n
'>200-300'n
'>300-400'n
'>400-500'n
'>500-1000'n
'>1000'n;
set t_jun t_may dif;
format _numeric_ fmt.;
drop _label_;
run;
If I was right. PROC TABULATE can't do this unstructure summary table.
/*Jun**/
data Have_Jun;
input Income;
cards;
0
100
105
100
200
300
300
400
500
1000
;
run;
/*May*/
data Have_May;
input Income;
cards;
0
.
105
100
199
301
302
400
500
1000
125
305
1200
;
run;
proc format ;
value val
.= 'Missing'
low-<0 = '<0'
0 = '0'
0<-100 = '>0-100'
100<-200 = '>100-200'
200<-300 = '>200-300'
300<-400 = '>300-400'
400<-500 = '>400-500'
500<-1000 = '>500-1000'
1000<-high = '>1000';
value fmt
-1<-<1=[percentn7.2];
run;
proc freq data=Have_Jun noprint;
table income/out=jun missing;
format income val.;
run;
proc freq data=Have_May noprint;
table income/out=may missing;
format income val.;
run;
data jun;
set jun;
retain month 'Jun ';
c_income=put(income,val.);
percent=percent*0.01;
format percent percent7.2;
drop income;
run;
data may;
set may;
retain month 'May ';
c_income=put(income,val.);
percent=percent*0.01;
format percent percent7.2;
drop income;
run;
options validvarname=any missing=0;
proc transpose data=jun out=t_jun;
by month;
var count percent;
id c_income;
idlabel c_income;
run;
proc transpose data=may out=t_may;
by month;
var count percent;
id c_income;
idlabel c_income;
run;
data dif;
set t_may(where=(_NAME_='PERCENT')) t_jun(where=(_NAME_='PERCENT')) end=last;
retain month 'Dif ';
call missing(_name_);
array x{*} _numeric_;
do i=1 to dim(x);
x{i}=sum(x{i},-lag(x{i}));
end;
if last;
drop i;
run;
data want;
retain month
'Missing'n
'<0'n
'0'n
'>0-100'n
'>100-200'n
'>200-300'n
'>300-400'n
'>400-500'n
'>500-1000'n
'>1000'n;
set t_jun t_may dif;
format _numeric_ fmt.;
drop _label_;
run;
Tabulate will not calculate the difference as shown. Tabulate does no "comparison" of any statistic
You would also want to clarify what your multiple tables may need to look like. If your time frame goes past two months what type of difference is to be shown? Every two months? First and last across the entire period?
You should also consider order of appearance of the months. You will run into issues with character values for months sorting and appearing in the correct order: Apr is before May and March in character order. And will your data ever cross calendar years? You may be better off with an actual date value to get those to work and align things.
I see this as requiring some summarization and possibly transposition to get all of the elements.
Hi Ksharp and ballardw,
Thanks very much.
Ksharp's code is generating the results I expected but in terms of a SAS data set like below.
month | Missing | <0 | 0 | >0-100 | >100-200 | >200-300 | >300-400 | >400-500 | >500-1000 | >1000 | _NAME_ |
Jun | 0 | 0 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 0 | COUNT |
Jun | 0 | 0 | 10.00% | 20.00% | 20.00% | 20.00% | 10.00% | 10.00% | 10.00% | 0 | PERCENT |
May | 1 | 0 | 1 | 1 | 3 | 4 | 1 | 1 | 1 | COUNT | |
May | 7.69% | 0 | 7.69% | 7.69% | 23.10% | 30.80% | 7.69% | 7.69% | 7.69% | PERCENT | |
Jun | -7.69% | 0 | 2.31% | 12.30% | -3.08% | 20.00% | -20.80% | 2.31% | 2.31% | -7.69% |
To present these resutls I have to export to Excel which is difficult given I have to produce and export so many two months comparison tables.
Would there be any way that I could display the same above SAS dataset resutls as a display (just like proc tabulate generate display tables) without having to export to Excel?
Thanks for your expertise.
one more thing ,change the order of variables to look the same as your report. retain month _NAME_ 'Missing'n '<0'n '0'n '>0-100'n '>100-200'n '>200-300'n '>300-400'n '>400-500'n '>500-1000'n '>1000'n; After that PROC EXPORT can export sas table into Excel file. About your question, make a macro and CALL EXECUTE() it . Like: data x; input mon $; cards; jan feb mar apr .... ; data _null_; set x; lag_mon=lag(mon); if _n_ ne 1 then call execute('your-macro-here'); run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.