BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

I am using proc tabulate to create a summary table.

I want to display also categories that have no values in the raw data .

In this example I want to see also category "0<-0.5" that has no values in the raw data.

For this task I am using classdata statement.

However, I get an error  " Class variable Y_Category was not found in the preload data set or was incompatible with the primary data set variable of the same name.*


Data rawdata;
Input ID Y;
cards;
1 12000
2 0
3 0.6
4 0.7
5 19
6 90
7 200
8 400
9 800
10 2000
11 4000
12 9000
13 0
;
run;

proc format;
value FFmt
0='0'
0<-0.5='0<-0.5'
0.5<-20='0.5<-20'
20<-10000='5000<-10000'
10000-high='10000+';
Run;

data tbl2;
set rawdata;
Y_Category=put(Y,FFmt.);
format Y_Category $20.;
Run;
 

Data level;
Input Y_Category $20.;
cards;
0
0<-0.5
5000<-10000
10000+
;
Run;

proc sort data=tbl2;by y;run;
 proc tabulate data=tbl2  classdata=level ;
var Y;
class Y_Category/ORDER=data  MISSING;
table Y_Category='',Y=''*N='Customers'
            Y=''*PCTN='PCT from Total Customers'
			Y=''*SUM='Sum Y'
            Y=''*COLPCTSUM='PCT from total Y' /RTS=25 PRINTMISS BOX='category of Y' CONDENSE;
Run;

 /*ERROR: Class variable Y_Category was not found in the preload data set or was incompatible with the primary data set variable of 
       the same name.*/
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Maxim 3: Know Your Data.

 

Look at Y_category in tbl2 and in level.

Since the maximum string length in your format is 11, Y_Category is defined in tbl2 with that length (the format does NOT change that!)

Reading it with a length of 20 for level causes the incompatibility.

Optimize your two data steps:

data tbl2;
set rawdata;
Y_Category = put(Y,FFmt.);
run;
 
data level;
input Y_Category $11.;
cards;
0
0<-0.5
5000<-10000
10000+
;
run;

and the following code will run without problems:

27         data tbl2;
28         set rawdata;
29         Y_Category = put(Y,FFmt.);
30         run;

NOTE: There were 13 observations read from the data set WORK.RAWDATA.
NOTE: The data set WORK.TBL2 has 13 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
      

31         
32         data level;
33         input Y_Category $11.;
34         cards;

NOTE: The data set WORK.LEVEL has 4 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.00 seconds
      
39         ;

40         run;
41         
42         proc sort data=tbl2;
43         by y;
44         run;

NOTE: There were 13 observations read from the data set WORK.TBL2.
2                                                          Das SAS System                               07:43 Monday, March 11, 2019

NOTE: The data set WORK.TBL2 has 13 observations and 3 variables.
NOTE: PROZEDUR SORT used (Total process time):
      real time           0.04 seconds
      cpu time            0.00 seconds
      

45         
46         proc tabulate data=tbl2 classdata=level ;
47         var Y;
48         class Y_Category / order=data missing;
49         table Y_Category='',Y=''*N='Customers'
50               Y=''*PCTN='PCT from Total Customers'
51         			Y=''*SUM='Sum Y'
52               Y=''*COLPCTSUM='PCT from total Y' / rts=25 printmiss box='category of Y' condense;
53         run;

NOTE: There were 13 observations read from the data set WORK.TBL2.
NOTE: There were 4 observations read from the data set WORK.LEVEL.
NOTE: The PROCEDURE TABULATE printed page 1.
NOTE: PROZEDUR TABULATE used (Total process time):
      real time           0.12 seconds
      cpu time            0.01 seconds

View solution in original post

13 REPLIES 13
andreas_lds
Jade | Level 19

This is the result of your posted code:

-----------------------------------------------------------------------------
|category of Y          |            |  PCT from  |            |            |
|                       |            |   Total    |            |  PCT from  |
|                       | Customers  | Customers  |   Sum Y    |  total Y   |
|-----------------------+------------+------------+------------+------------|
|0                      |        2.00|       15.38|        0.00|        0.00|
|-----------------------+------------+------------+------------+------------|
|0<-0.5                 |        0.00|        0.00|           .|           .|
|-----------------------+------------+------------+------------+------------|
|5000<-10000            |        7.00|       53.85|    16490.00|       57.84|
|-----------------------+------------+------------+------------+------------|
|10000+                 |        1.00|        7.69|    12000.00|       42.09|
|-----------------------+------------+------------+------------+------------|
|0.5<-20                |        3.00|       23.08|       20.30|        0.07|
-----------------------------------------------------------------------------

Ignoring the strange sort-order (caused by using a alphanumeric variable), what's the problem?

 

 

Ronein
Meteorite | Level 14

When I run the proc tabulate I get an error "Class variable Y_Category was not found in the preload data set or was incompatible with the primary data set variable of the same name"

 proc tabulate data=tbl2  classdata=level ;
var Y;
class Y_Category/ORDER=data  MISSING;
table Y_Category='',Y=''*N='Customers'
            Y=''*PCTN='PCT from Total Customers'
			Y=''*SUM='Sum Y'
            Y=''*COLPCTSUM='PCT from total Y' /RTS=25 PRINTMISS BOX='category of Y' CONDENSE;
Run;
andreas_lds
Jade | Level 19

My fault, i automatically removed the format-statement in the data step creating tbl2.

 

You don't need classdata at all, just use preloadfmt:

 

data rawdata;
   Input ID Y;
   cards;
1 12000
2 0
3 0.6
4 0.7
5 19
6 90
7 200
8 400
9 800
10 2000
11 4000
12 9000
13 0
;
run;

proc format;
   value FFmt
      0='0'
      0<-0.5='0<-0.5'
      0.5<-20='0.5<-20'
      20<-10000='5000<-10000'
      10000-high='10000+';
run;

data work.cloned;
   set rawdata;

   Y_Category = y;

   format y_category ffmt.;

run;


proc tabulate data=cloned missing;
   var Y;
   class Y_Category / order=data preloadfmt missing ;
   
   table Y_Category='',Y=''*N='Customers'
      Y=''*PCTN='PCT from Total Customers'
      Y=''*SUM='Sum Y'
      Y=''*COLPCTSUM='PCT from total Y' /RTS=25 PRINTMISS BOX='category of Y' CONDENSE;
run;
Ronein
Meteorite | Level 14

perfect

I also want to ask Why "Sum Y" and "PCT from total Y" in category "0<-0.5" is getting null value (.) and not 0??

Kurt_Bremser
Super User

@Ronein wrote:

perfect

I also want to ask Why "Sum Y" and "PCT from total Y" in category "0<-0.5" is getting null value (.) and not 0??


Because no entries are found, the summation ends up as sum(.), and that creates a missing value.

For reference, try this:

data test;
x1 = sum(.);
x2 = sum(0,.);
run;

You need at least 1 non-missing value for the summation to be non-missing. Counts, OTOH, create a zero.

Kurt_Bremser
Super User

Maxim 3: Know Your Data.

 

Look at Y_category in tbl2 and in level.

Since the maximum string length in your format is 11, Y_Category is defined in tbl2 with that length (the format does NOT change that!)

Reading it with a length of 20 for level causes the incompatibility.

Optimize your two data steps:

data tbl2;
set rawdata;
Y_Category = put(Y,FFmt.);
run;
 
data level;
input Y_Category $11.;
cards;
0
0<-0.5
5000<-10000
10000+
;
run;

and the following code will run without problems:

27         data tbl2;
28         set rawdata;
29         Y_Category = put(Y,FFmt.);
30         run;

NOTE: There were 13 observations read from the data set WORK.RAWDATA.
NOTE: The data set WORK.TBL2 has 13 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
      

31         
32         data level;
33         input Y_Category $11.;
34         cards;

NOTE: The data set WORK.LEVEL has 4 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.00 seconds
      
39         ;

40         run;
41         
42         proc sort data=tbl2;
43         by y;
44         run;

NOTE: There were 13 observations read from the data set WORK.TBL2.
2                                                          Das SAS System                               07:43 Monday, March 11, 2019

NOTE: The data set WORK.TBL2 has 13 observations and 3 variables.
NOTE: PROZEDUR SORT used (Total process time):
      real time           0.04 seconds
      cpu time            0.00 seconds
      

45         
46         proc tabulate data=tbl2 classdata=level ;
47         var Y;
48         class Y_Category / order=data missing;
49         table Y_Category='',Y=''*N='Customers'
50               Y=''*PCTN='PCT from Total Customers'
51         			Y=''*SUM='Sum Y'
52               Y=''*COLPCTSUM='PCT from total Y' / rts=25 printmiss box='category of Y' condense;
53         run;

NOTE: There were 13 observations read from the data set WORK.TBL2.
NOTE: There were 4 observations read from the data set WORK.LEVEL.
NOTE: The PROCEDURE TABULATE printed page 1.
NOTE: PROZEDUR TABULATE used (Total process time):
      real time           0.12 seconds
      cpu time            0.01 seconds
Ronein
Meteorite | Level 14

Perfect!

Thank you.

Two more question please.

 

I see that the order of the categories in the output is not as I want .

I want it to be in the following order:

0
0<-0.5
5000<-10000
10000+

What should I do in order to get it?

 

I also want to ask Why "Sum Y" and "PCT from total Y" in category "0<-0.5" is getting null value (.) and not 0??

 

 

Ronein
Meteorite | Level 14

Sorry. I don't understand what you mean

Kurt_Bremser
Super User

@Ronein wrote:

Sorry. I don't understand what you mean


Compare your original proc format code (5 entries) with your data step code for dataset level (4 entries). The 5th category from the format which is not found in level is appended at the end of the tabulate output.

Kurt_Bremser
Super User

Still better: use the preloadfmt option, and have a separate format for your class variable that you use in the preloadfmt:

data rawdata;
input ID Y;
cards;
1 12000
2 0
3 0.6
4 0.7
5 19
6 90
7 200
8 400
9 800
10 2000
11 4000
12 9000
13 0
;
run;

proc format;
value category
  0="0"
  0<-0.5="1"
  0.5<-20="2"
  20<-10000="3"
  10000-high="4"
;
value FFmt
  0='0'
  1='0<-0.5'
  2='0.5<-20'
  3='5000<-10000'
  4='10000+'
;
run;

data tbl2;
set rawdata;
Y_Category = input(put(Y,category.),1.);
format y_category ffmt.;
run;

proc tabulate data=tbl2;
var Y;
class Y_Category / order=data preloadfmt;
table
  Y_Category='',
  Y=''*N='Customers'
  Y=''*PCTN='PCT from Total Customers'
	Y=''*SUM='Sum Y'
  Y=''*COLPCTSUM='PCT from total Y'
  / rts=25 printmiss box='category of Y' condense
;
run;

The first format does the conversion, the second serves to create the complete list.

Result:

-----------------------------------------------------------------------------
|category of Y          |            |  PCT from  |            |            |
|                       |            |   Total    |            |  PCT from  |
|                       | Customers  | Customers  |   Sum Y    |  total Y   |
|-----------------------+------------+------------+------------+------------|
|0                      |        2.00|       15.38|        0.00|        0.00|
|-----------------------+------------+------------+------------+------------|
|0<-0.5                 |        0.00|        0.00|           .|           .|
|-----------------------+------------+------------+------------+------------|
|0.5<-20                |        3.00|       23.08|       20.30|        0.07|
|-----------------------+------------+------------+------------+------------|
|5000<-10000            |        7.00|       53.85|    16490.00|       57.84|
|-----------------------+------------+------------+------------+------------|
|10000+                 |        1.00|        7.69|    12000.00|       42.09|
-----------------------------------------------------------------------------

Because you use class (and not by), no sorting is needed

Ronein
Meteorite | Level 14

Thank you so much.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 3517 views
  • 4 likes
  • 3 in conversation