- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello
I am using proc tabulate to create a summary table.
I want to display also categories that have no values in the raw data .
In this example I want to see also category "0<-0.5" that has no values in the raw data.
For this task I am using classdata statement.
However, I get an error " Class variable Y_Category was not found in the preload data set or was incompatible with the primary data set variable of the same name.*
Data rawdata;
Input ID Y;
cards;
1 12000
2 0
3 0.6
4 0.7
5 19
6 90
7 200
8 400
9 800
10 2000
11 4000
12 9000
13 0
;
run;
proc format;
value FFmt
0='0'
0<-0.5='0<-0.5'
0.5<-20='0.5<-20'
20<-10000='5000<-10000'
10000-high='10000+';
Run;
data tbl2;
set rawdata;
Y_Category=put(Y,FFmt.);
format Y_Category $20.;
Run;
Data level;
Input Y_Category $20.;
cards;
0
0<-0.5
5000<-10000
10000+
;
Run;
proc sort data=tbl2;by y;run;
proc tabulate data=tbl2 classdata=level ;
var Y;
class Y_Category/ORDER=data MISSING;
table Y_Category='',Y=''*N='Customers'
Y=''*PCTN='PCT from Total Customers'
Y=''*SUM='Sum Y'
Y=''*COLPCTSUM='PCT from total Y' /RTS=25 PRINTMISS BOX='category of Y' CONDENSE;
Run;
/*ERROR: Class variable Y_Category was not found in the preload data set or was incompatible with the primary data set variable of
the same name.*/
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Maxim 3: Know Your Data.
Look at Y_category in tbl2 and in level.
Since the maximum string length in your format is 11, Y_Category is defined in tbl2 with that length (the format does NOT change that!)
Reading it with a length of 20 for level causes the incompatibility.
Optimize your two data steps:
data tbl2;
set rawdata;
Y_Category = put(Y,FFmt.);
run;
data level;
input Y_Category $11.;
cards;
0
0<-0.5
5000<-10000
10000+
;
run;
and the following code will run without problems:
27 data tbl2; 28 set rawdata; 29 Y_Category = put(Y,FFmt.); 30 run; NOTE: There were 13 observations read from the data set WORK.RAWDATA. NOTE: The data set WORK.TBL2 has 13 observations and 3 variables. NOTE: DATA statement used (Total process time): real time 0.03 seconds cpu time 0.01 seconds 31 32 data level; 33 input Y_Category $11.; 34 cards; NOTE: The data set WORK.LEVEL has 4 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.07 seconds cpu time 0.00 seconds 39 ; 40 run; 41 42 proc sort data=tbl2; 43 by y; 44 run; NOTE: There were 13 observations read from the data set WORK.TBL2. 2 Das SAS System 07:43 Monday, March 11, 2019 NOTE: The data set WORK.TBL2 has 13 observations and 3 variables. NOTE: PROZEDUR SORT used (Total process time): real time 0.04 seconds cpu time 0.00 seconds 45 46 proc tabulate data=tbl2 classdata=level ; 47 var Y; 48 class Y_Category / order=data missing; 49 table Y_Category='',Y=''*N='Customers' 50 Y=''*PCTN='PCT from Total Customers' 51 Y=''*SUM='Sum Y' 52 Y=''*COLPCTSUM='PCT from total Y' / rts=25 printmiss box='category of Y' condense; 53 run; NOTE: There were 13 observations read from the data set WORK.TBL2. NOTE: There were 4 observations read from the data set WORK.LEVEL. NOTE: The PROCEDURE TABULATE printed page 1. NOTE: PROZEDUR TABULATE used (Total process time): real time 0.12 seconds cpu time 0.01 seconds
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is the result of your posted code:
----------------------------------------------------------------------------- |category of Y | | PCT from | | | | | | Total | | PCT from | | | Customers | Customers | Sum Y | total Y | |-----------------------+------------+------------+------------+------------| |0 | 2.00| 15.38| 0.00| 0.00| |-----------------------+------------+------------+------------+------------| |0<-0.5 | 0.00| 0.00| .| .| |-----------------------+------------+------------+------------+------------| |5000<-10000 | 7.00| 53.85| 16490.00| 57.84| |-----------------------+------------+------------+------------+------------| |10000+ | 1.00| 7.69| 12000.00| 42.09| |-----------------------+------------+------------+------------+------------| |0.5<-20 | 3.00| 23.08| 20.30| 0.07| -----------------------------------------------------------------------------
Ignoring the strange sort-order (caused by using a alphanumeric variable), what's the problem?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When I run the proc tabulate I get an error "Class variable Y_Category was not found in the preload data set or was incompatible with the primary data set variable of the same name"
proc tabulate data=tbl2 classdata=level ;
var Y;
class Y_Category/ORDER=data MISSING;
table Y_Category='',Y=''*N='Customers'
Y=''*PCTN='PCT from Total Customers'
Y=''*SUM='Sum Y'
Y=''*COLPCTSUM='PCT from total Y' /RTS=25 PRINTMISS BOX='category of Y' CONDENSE;
Run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My fault, i automatically removed the format-statement in the data step creating tbl2.
You don't need classdata at all, just use preloadfmt:
data rawdata;
Input ID Y;
cards;
1 12000
2 0
3 0.6
4 0.7
5 19
6 90
7 200
8 400
9 800
10 2000
11 4000
12 9000
13 0
;
run;
proc format;
value FFmt
0='0'
0<-0.5='0<-0.5'
0.5<-20='0.5<-20'
20<-10000='5000<-10000'
10000-high='10000+';
run;
data work.cloned;
set rawdata;
Y_Category = y;
format y_category ffmt.;
run;
proc tabulate data=cloned missing;
var Y;
class Y_Category / order=data preloadfmt missing ;
table Y_Category='',Y=''*N='Customers'
Y=''*PCTN='PCT from Total Customers'
Y=''*SUM='Sum Y'
Y=''*COLPCTSUM='PCT from total Y' /RTS=25 PRINTMISS BOX='category of Y' CONDENSE;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@andreas_lds wrote:
You don't need classdata at all, just use preloadfmt:
Which is the better option, anyway.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
perfect
I also want to ask Why "Sum Y" and "PCT from total Y" in category "0<-0.5" is getting null value (.) and not 0??
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Ronein wrote:
perfect
I also want to ask Why "Sum Y" and "PCT from total Y" in category "0<-0.5" is getting null value (.) and not 0??
Because no entries are found, the summation ends up as sum(.), and that creates a missing value.
For reference, try this:
data test;
x1 = sum(.);
x2 = sum(0,.);
run;
You need at least 1 non-missing value for the summation to be non-missing. Counts, OTOH, create a zero.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Maxim 3: Know Your Data.
Look at Y_category in tbl2 and in level.
Since the maximum string length in your format is 11, Y_Category is defined in tbl2 with that length (the format does NOT change that!)
Reading it with a length of 20 for level causes the incompatibility.
Optimize your two data steps:
data tbl2;
set rawdata;
Y_Category = put(Y,FFmt.);
run;
data level;
input Y_Category $11.;
cards;
0
0<-0.5
5000<-10000
10000+
;
run;
and the following code will run without problems:
27 data tbl2; 28 set rawdata; 29 Y_Category = put(Y,FFmt.); 30 run; NOTE: There were 13 observations read from the data set WORK.RAWDATA. NOTE: The data set WORK.TBL2 has 13 observations and 3 variables. NOTE: DATA statement used (Total process time): real time 0.03 seconds cpu time 0.01 seconds 31 32 data level; 33 input Y_Category $11.; 34 cards; NOTE: The data set WORK.LEVEL has 4 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.07 seconds cpu time 0.00 seconds 39 ; 40 run; 41 42 proc sort data=tbl2; 43 by y; 44 run; NOTE: There were 13 observations read from the data set WORK.TBL2. 2 Das SAS System 07:43 Monday, March 11, 2019 NOTE: The data set WORK.TBL2 has 13 observations and 3 variables. NOTE: PROZEDUR SORT used (Total process time): real time 0.04 seconds cpu time 0.00 seconds 45 46 proc tabulate data=tbl2 classdata=level ; 47 var Y; 48 class Y_Category / order=data missing; 49 table Y_Category='',Y=''*N='Customers' 50 Y=''*PCTN='PCT from Total Customers' 51 Y=''*SUM='Sum Y' 52 Y=''*COLPCTSUM='PCT from total Y' / rts=25 printmiss box='category of Y' condense; 53 run; NOTE: There were 13 observations read from the data set WORK.TBL2. NOTE: There were 4 observations read from the data set WORK.LEVEL. NOTE: The PROCEDURE TABULATE printed page 1. NOTE: PROZEDUR TABULATE used (Total process time): real time 0.12 seconds cpu time 0.01 seconds
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Perfect!
Thank you.
Two more question please.
I see that the order of the categories in the output is not as I want .
I want it to be in the following order:
0
0<-0.5
5000<-10000
10000+
What should I do in order to get it?
I also want to ask Why "Sum Y" and "PCT from total Y" in category "0<-0.5" is getting null value (.) and not 0??
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Make your dataset LEVEL complete. As posted, it only has 4 observations, while the format has 5 entries.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry. I don't understand what you mean
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Ronein wrote:
Sorry. I don't understand what you mean
Compare your original proc format code (5 entries) with your data step code for dataset level (4 entries). The 5th category from the format which is not found in level is appended at the end of the tabulate output.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Still better: use the preloadfmt option, and have a separate format for your class variable that you use in the preloadfmt:
data rawdata;
input ID Y;
cards;
1 12000
2 0
3 0.6
4 0.7
5 19
6 90
7 200
8 400
9 800
10 2000
11 4000
12 9000
13 0
;
run;
proc format;
value category
0="0"
0<-0.5="1"
0.5<-20="2"
20<-10000="3"
10000-high="4"
;
value FFmt
0='0'
1='0<-0.5'
2='0.5<-20'
3='5000<-10000'
4='10000+'
;
run;
data tbl2;
set rawdata;
Y_Category = input(put(Y,category.),1.);
format y_category ffmt.;
run;
proc tabulate data=tbl2;
var Y;
class Y_Category / order=data preloadfmt;
table
Y_Category='',
Y=''*N='Customers'
Y=''*PCTN='PCT from Total Customers'
Y=''*SUM='Sum Y'
Y=''*COLPCTSUM='PCT from total Y'
/ rts=25 printmiss box='category of Y' condense
;
run;
The first format does the conversion, the second serves to create the complete list.
Result:
----------------------------------------------------------------------------- |category of Y | | PCT from | | | | | | Total | | PCT from | | | Customers | Customers | Sum Y | total Y | |-----------------------+------------+------------+------------+------------| |0 | 2.00| 15.38| 0.00| 0.00| |-----------------------+------------+------------+------------+------------| |0<-0.5 | 0.00| 0.00| .| .| |-----------------------+------------+------------+------------+------------| |0.5<-20 | 3.00| 23.08| 20.30| 0.07| |-----------------------+------------+------------+------------+------------| |5000<-10000 | 7.00| 53.85| 16490.00| 57.84| |-----------------------+------------+------------+------------+------------| |10000+ | 1.00| 7.69| 12000.00| 42.09| -----------------------------------------------------------------------------
Because you use class (and not by), no sorting is needed
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much.