Hi everyone, I am needing some help creating a custom percentage in proc tabulate. Using sas 9.4. New here so will do my best to describe and please let me know how I can clarify.
I am using record level data with the following columns: individual's id#, year of birth, group#, ethnicity, Degree, Grads.
Degree is categorial- as either MA or BA (I created this based on numerical values I had and could pull in the numerical values if needed- as the grade number completed)
Heres what I am working with and some of a small portion of the data:
data graduation;
input ID Year Group$ ethnicity degree$ grads;
cards;
1 1995 A 1 MA 1
2 1995 B 2 BA 1
3 1995 A 3 MA 1
4 1995 A 1 MA 1
5 1995 A 3 BA 1
6 1995 C 2 BA 1
7 1995 A 2 BA 1
8 1995 B 1 MA 1
9 1995 B 1 MA 1
10 1995 A 1 BA 1
11 1995 B 3 BA 1
12 1995 A 3 BA 1
13 1995 A 3 BA 1
14 1995 B 2 MA 1
15 1995 C 1 BA 1
16 1995 A 2 BA 1
17 1995 A 3 BA 1
18 1995 A 2 MA 1
19 1995 A 2 MA 1
20 1995 A 2 MA 1
21 1995 A 3 BA 1
22 1995 B 2 MA 1
23 1995 C 1 BA 1
24 1995 C 2 BA 1
25 1995 B 3 BA 1
26 1995 C 1 BA 1
27 1995 B 2 BA 1
28 1995 B 2 BA 1
;
proc format;
value ethn
1= 'White'
2= 'Black'
3= 'Hispanic'
;
run;
proc tabulate data= graduation ;
class group ethnicity degree;
var grads;
table group, ethnicity*(degree*(n pctn<grads>) grads);
format ethnicity ethn.;
run;
Below is the table i'm getting. This is the format I want. But the percents are not correct since you can see they are all 100. Instead, I would like the percent as the number with MA degree out of the Grads sum, within each ethnicity, and similarly for BA. For example in the highlighted, it should be (16*100)/70. I have tried to do colpctn and rowpctn but have been unsuccessful.
Thanks in advance.
One of the reasons I said data can be critical. With nothing to test it is hard to get the order of things correct.
Try this:
proc tabulate data= graduation ; class group ethnicity degree; VAR grads; table group*GRADS=' ', ethnicity* (degree*(n PctSUM<grads>='%') sum='Grads'*f=5.) /misstext=' ' row=float; format ethnicity ethn.; run;
Getting the right variables in the right dimension. The "without analysis variable" was because of Group trying to use Pctsum<grads> but grads hadn't be set to work with group.
The bits you may not expect are the =' ' to modify (suppress) variable or statistic labels, the *f=5. set a format that doesn't display .00 for the sum. The Misstext option suppresses cells with . (missing) and row=float cleans up the row labels when the Grad label is suppressed. Otherwise there would be an apparent column of the table with no text.
Pctn has a similar issue about which dimension the denominator appears in.
These are valid but likely not useful: (as well as showing why N may not be a good statistic for Grads, you would get a column per value)
proc tabulate data= graduation ; class group ethnicity degree; class grads; table group, ethnicity* (degree*(n pctn<group>) grads) /misstext=' ' row=float; table group, ethnicity* (degree*(n pctn<degree>) grads) /misstext=' ' row=float; table group, ethnicity* (degree*(n pctn<ethnicity>) grads) /misstext=' ' row=float; format ethnicity ethn.; run;
Data?
Since you have defined GRADS as a VAR variable you may want PCTSUM<grads>.
Maybe. Depends on the values of the GRADS variable. Or possibly you need to make Grads a class variable.
One clue is that the statistic for Grads by itself is SUM, which is the default for Var variables when specific statistics are not requested.
The content of the data is important for calculating percentages.
Edited my post to include some of the data.
So I tried pctsum but am getting errors. I tried leaving grads as var and also tried changing to a class variable. For both I got the following error:
ERROR: Statistic other than N was requested without analysis variable in the following nesting :
Group * ethnicity * degree * PctSum.
/*Pctsum*/
/*leave grads as var variable*/
proc tabulate data= graduation ;
class group ethnicity degree;
var grads;
table group, ethnicity*(degree*(n pctsum<grads>) grads);
format ethnicity ethn.;
run;
/*try grads as class variable*/
proc tabulate data= graduation ;
class group ethnicity degree grads;
table group, ethnicity*(degree*(n pctsum<grads>) grads);
format ethnicity ethn.;
run;
One of the reasons I said data can be critical. With nothing to test it is hard to get the order of things correct.
Try this:
proc tabulate data= graduation ; class group ethnicity degree; VAR grads; table group*GRADS=' ', ethnicity* (degree*(n PctSUM<grads>='%') sum='Grads'*f=5.) /misstext=' ' row=float; format ethnicity ethn.; run;
Getting the right variables in the right dimension. The "without analysis variable" was because of Group trying to use Pctsum<grads> but grads hadn't be set to work with group.
The bits you may not expect are the =' ' to modify (suppress) variable or statistic labels, the *f=5. set a format that doesn't display .00 for the sum. The Misstext option suppresses cells with . (missing) and row=float cleans up the row labels when the Grad label is suppressed. Otherwise there would be an apparent column of the table with no text.
Pctn has a similar issue about which dimension the denominator appears in.
These are valid but likely not useful: (as well as showing why N may not be a good statistic for Grads, you would get a column per value)
proc tabulate data= graduation ; class group ethnicity degree; class grads; table group, ethnicity* (degree*(n pctn<group>) grads) /misstext=' ' row=float; table group, ethnicity* (degree*(n pctn<degree>) grads) /misstext=' ' row=float; table group, ethnicity* (degree*(n pctn<ethnicity>) grads) /misstext=' ' row=float; format ethnicity ethn.; run;
Thank you so much for your help! I really appreciate all the explanations so I can learn why you recommended certain suggestions.
The following is was what ended up providing me with what I need.
proc tabulate data= graduation ;
class group ethnicity degree;
class grads;
table group,
ethnicity* (degree*(n pctn<degree>) grads)
/misstext=' ' row=float;
format ethnicity ethn.;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.