Hello
I am using proc tabulate.
As you can see for each date I get another column in the report table.
As you can see in the output there are columns for :1802,1805,1806,1807,1809
However I want also to have columns for: 1803,1804,1808 with zero values...
What is the best clever way to do it please?
data rawdata;
input ID Ddate Ind Y;
cards;
1 1806 1 100
1 1807 1 200
1 1809 1 300
2 1809 1 400
2 1807 1 500
2 1806 2 600
2 1806 2 700
3 1805 3 800
3 1805 3 900
4 1805 3 100
5 1802 3 110
6 1802 3 120
7 1802 3 130
;
run;
PROC TABULATE DATA=rawdata format=comma32.;
var Y;
CLASS Ind / ORDER=UNFORMATTED MISSING;
CLASS Ddate / ORDER=UNFORMATTED MISSING;
TABLE Ind="" ALL={LABEL="Total"},
Ddate=''*Y=''*N=''*f=comma32.1
Y=''*N='Total'
/box='Indicator';
RUN;
Use CLASSDATA= option.
data rawdata;
input ID Ddate Ind Y;
cards;
1 1806 1 100
1 1807 1 200
1 1809 1 300
2 1809 1 400
2 1807 1 500
2 1806 2 600
2 1806 2 700
3 1805 3 800
3 1805 3 900
4 1805 3 100
5 1802 3 110
6 1802 3 120
7 1802 3 130
;
run;
data level;
do ind=1 to 7;
do ddate=1802 to 1809;
output;
end;
end;
run;
PROC TABULATE DATA=rawdata classdata=level format=comma32.;
var Y;
CLASS Ind / ORDER=UNFORMATTED MISSING;
CLASS Ddate / ORDER=UNFORMATTED MISSING;
TABLE Ind="" ALL={LABEL="Total"},
Ddate=''*Y=''*N=''*f=comma32.1
Y=''*N='Total'
/box='Indicator';
RUN;
Well, coding ALL IN UPPERCASE is not a smart way to start as I cannot read that shouting.
The question however has been asked many times, just the other day for instance:
The approach for Tabulate usually involves a Format and assigning the PRELOADFMT option to the appropriate Class variable. The table option PRINTMISS. You may also want to use ORDER=DATA with the class variable. Note that you will have to create the custom format as we do not know all of the actual values that you may want to display.
May you please send a code that is working for dates 1802-1809 (1802,1803,1804,1805,1806,1807,1808,1809)
thanks
Hi @Ronein
Here is a piece of code that does what you want. My personal opinion is that it is a better idea to be content with the possibilities given in standard procedures, because going beyond that takes a lot extra coding that is difficult to reuse, maintain and document, but it can be done.
data rawdata;
input ID Ddate Ind Y;
cards;
1 1806 1 100
1 1807 1 200
1 1809 1 300
2 1809 1 400
2 1807 1 500
2 1806 2 600
2 1806 2 700
3 1805 3 800
3 1805 3 900
4 1805 3 100
5 1802 3 110
6 1802 3 120
7 1802 3 130
;
run;
/* First first value of Ind + min and max Ddates and put them in macro variables */
proc sql noprint;
select min(Ind), min(Ddate), max(Ddate) into :MinInd, :MinDdate, :MaxDdate
from rawdata;
quit;
/* Then create a work dataset for one value of Ind with all Ddates */
data w;
Ind = &MinInd;
do Ddate = &MinDdate to &MaxDdate;
output;
end;
run;
/* Join on rawdata, so first value of Ind has all values of Ddate with missing Y
in Ddates not coming from rawdata */
proc sql;
create table want as
select
coalesce(rawdata.Ind, w.Ind) as Ind,
coalesce(rawdata.Ddate, w.Ddate) as Ddate,
rawdata.Y
from rawdata full outer join w
on
rawdata.Ind = w.Ind
and rawdata.Ddate = w.Ddate;
quit;
/* The we need a format to represent 0 as missing, because Proc Tabulate will
show 0.0 in cells where a Ddate is present even if Y is missing */
proc format;
value zeromiss
0 = .
other = [comma32.1]
;
run;
/* One change to your Proc tabulate - use new format instead of comma32.1 */
PROC TABULATE DATA=want format=comma32.;
var Y;
CLASS Ind / ORDER=UNFORMATTED MISSING;
CLASS Ddate / ORDER=UNFORMATTED MISSING;
TABLE Ind="" ALL={LABEL="Total"},
Ddate=''*Y=''*N=''*f=zeromiss.
Y=''*N='Total'
/box='Indicator';
RUN;
Output:
Hello and thank you so much.
It is a very nice solution but unfortunately I don't get the desired output.
Can you please look on the following example.
For the summary report that sum Y I get correct outcome (Total 210)
But for the summary report that count number of observations I get wrong outcome ( I get total of 19 but I need to get total of 6 )
Data aaa;
input ID ddate Ind Y;
cards;
1 1901 1 10
1 1811 1 20
1 1811 1 30
2 1901 2 40
2 1812 1 50
3 1812 3 60
;
Run
%let MinDdate=1808;
%let MaxDdate=1901;
%put &MinDdate;
%put &MaxDdate;
%let date_MinDdate=%sysfunc(inputn(&MinDdate.,yymmn4.));
%let date_MaxDdate=%sysfunc(inputn(&MaxDdate.,yymmn4.));
%put &date_MinDdate. ;
%put &date_MaxDdate. ;
data _null_;
No=intck('month',&date_MinDdate.,&date_MaxDdate.);
call symput("No",trim(left(No)));
run;
%put &No;
%macro months_Ind_Macro;
%do i=0 %to &No.;
%do j=1 %to 3;
Ddate=put(intnx('month',&date_MinDdate.,&i.),yymmn4.)*1;
ind =&j.;
output;
%end;
%end;
%mend;
data Sheled;
%months_Ind_Macro;
run;
proc sql;
create table want as
select ID ,
coalesce(a.Ddate, b.Ddate) as Ddate,
coalesce(a.Ind, b.Ind) as Ind,
coalesce(a.Y,0) as Y
from aaa as a
full outer join Sheled as b
on a.Ind = b.Ind and a.Ddate = b.Ddate
order by calculated Ddate,calculated Ind
;
quit;
/* proc print data=aaa noobs;run;*/
/* proc print data=want noobs;run;*/
proc format;
value zeromiss
0 = .
other = [comma32.]
;
run;
/*wrong OUTCOME*/
PROC TABULATE DATA=want ;
var Y;
CLASS Ind / ORDER=UNFORMATTED MISSING;
CLASS Ddate / ORDER=UNFORMATTED MISSING;
TABLE Ind="" ALL={LABEL="Total"},
Ddate=''*Y=''*N=''*f=zeromiss.
Y=''*N='Total'
/box='Indicator';
RUN;
/*Correct OUTCOME*/
PROC TABULATE DATA=want ;
var Y;
CLASS Ind / ORDER=UNFORMATTED MISSING;
CLASS Ddate / ORDER=UNFORMATTED MISSING;
TABLE Ind="" ALL={LABEL="Total"},
Ddate=''*Y=''*SUM=''*f=zeromiss.
Y=''*SUM='Total'
/box='Indicator';
RUN;
Use CLASSDATA= option.
data rawdata;
input ID Ddate Ind Y;
cards;
1 1806 1 100
1 1807 1 200
1 1809 1 300
2 1809 1 400
2 1807 1 500
2 1806 2 600
2 1806 2 700
3 1805 3 800
3 1805 3 900
4 1805 3 100
5 1802 3 110
6 1802 3 120
7 1802 3 130
;
run;
data level;
do ind=1 to 7;
do ddate=1802 to 1809;
output;
end;
end;
run;
PROC TABULATE DATA=rawdata classdata=level format=comma32.;
var Y;
CLASS Ind / ORDER=UNFORMATTED MISSING;
CLASS Ddate / ORDER=UNFORMATTED MISSING;
TABLE Ind="" ALL={LABEL="Total"},
Ddate=''*Y=''*N=''*f=comma32.1
Y=''*N='Total'
/box='Indicator';
RUN;
Thank you very much.
In the code that you sent there is force to use al categories of dates from 1802 to 1809.
What will you do if you want also to force Ind values 1 to 4.
Can you use classdata statement for two categorical fields?
@Ronein wrote:
Thank you very much.
In the code that you sent there is force to use al categories of dates from 1802 to 1809.
What will you do if you want also to force Ind values 1 to 4.
Can you use classdata statement for two categorical fields?
specifies a data set that contains the combinations of values of the class variables that must be present in the output. Any combinations of values of the class variables that occur in the CLASSDATA= data set but not in the input data set appear in each table or output data set and have a frequency of zero.
Restriction | The CLASSDATA= data set must contain all class variables. Their data type and format must match the corresponding class variables in the input data set. |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.