BookmarkSubscribeRSS Feed
Sheril1
Calcite | Level 5

I have a file. Columns of each student's ID card, city of residence, educational institution, column on whether or not he dropped out of school. If we drop then a column gets 1 if we don't drop then a column gets zero.

For example 

Sheril1_0-1727277751505.png

I want to prepare a table with a calculation of how many students there are in each group of cities by type of institution. How many students dropped out by city and type of institution. And what is the percentage of dropouts by city and type of institution. I want to prepare the table using SAS with the TABULATE function
Example of results 

Sheril1_1-1727277779279.png

 

The goal is to calculate the number and percentage of dropouts without also showing the non-dropouts. That is, not to show the complements. I was able to calculate the amounts but not percentages. This is what I tried:

1.

proc tabulate data=a missing;

class City Educational_Institution ;

var Dropped_Out;

table City* Educational_Institution, n*Dropped_Out Dropped_Out*sum    Dropped_Out*pctn<n>

run;

2.

proc tabulate data=a missing;

class City Educational_Institution ;

var Dropped_Out;

table City* Educational_Institution, n*Dropped_Out Dropped_Out*sum    Dropped_Out*pctn<Dropped_Out>

run;

 

The highlighted part does not work.  I also tried all kinds of options with PCTSUM. What am I missing?

I would appreciate help. Thanks

8 REPLIES 8
ballardw
Super User

Doesn't work is awful vague.

Are there errors in the log?: Post the code and log in a code box opened with the "</>" to maintain formatting of error messages.

No output? Post any log in a code box.

Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the "</>" icon or attached as text to show exactly what you have and that we can test code against.

 

Not even going to try to diagnose PCTN or PCTSUM issues in Tabulate without an actual example data set.

 

Note that if you VAR variable that is numeric and has values of 1/0 then the statistic N is the number of non-missing values, SUM is the number of observations where the variable has the value of 1 and Mean is the percentage as a decimal of the 1s.

Sheril1
Calcite | Level 5
Thanks for the answer. I can only do it tomorrow
Sheril1
Calcite | Level 5

Hi, here is a code I tried and log:

 
 71         data a;
 72         INPUT id:4. city: $10. type_inst: $10. drooped;
 73         datalines;
 
 NOTE: The data set WORK.A has 20 observations and 4 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 
 94         ;
 95         run;
 96         /*1*/
 97          proc tabulate data= a      missing s=[just=c] format=comma7.1;
 98         class  city  type_inst    ;
 99         var drooped  ;
 100        table   city=''*type_inst='',  n*drooped drooped*sum  drooped*pctn<n> /row=float;
 101        run;
 
 ERROR:  n is invalid in the following denominator nesting : n.
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE TABULATE used (Total process time):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 101      !     /*ERROR:  n is invalid in the following denominator nesting : n*/
 102        /*2*/
 103        
 
 
 104         proc tabulate data= a      missing s=[just=c] format=comma7.1;
 105        class  city  type_inst    ;
 106        var drooped  ;
 107        table   city=''*type_inst='',  n*drooped drooped*sum  drooped*pctn<drooped> /row=float;
 108        run;
 
 NOTE: There were 20 observations read from the data set WORK.A.
 NOTE: PROCEDURE TABULATE used (Total process time):
       real time           0.04 seconds
       cpu time            0.04 seconds
       
 
 108      !     /*all 100%*/
 109        
 110        /*3*/
 111         proc tabulate data= a      missing s=[just=c] format=comma7.1;
 112        class  city  type_inst    ;
 113        var drooped  ;
 114        table   city=''*type_inst='',  n*drooped drooped*sum  drooped*pctsum<all> /row=float;
 115        run;
 
 WARNING: Invalid denominator nesting element: all.
 NOTE: There were 20 observations read from the data set WORK.A.
 NOTE: PROCEDURE TABULATE used (Total process time):
       real time           0.04 seconds
       cpu time            0.04 seconds
       
 
 115      !     /*all 100%*/
 116        

 

ballardw
Super User

 

Provide the data step reading the datalines as the CODE from the editor. The Log strips out the actual datalines so can't recreate your data.

ballardw
Super User

Here is an example of building an example data set with some random values.

One variable is coded 1/0 to demonstrate how to get nice single summaries.

 

data junk;
   do cat = 'XXX','YYY';
   do subcat='A','B','C';
   /* generate random number of values of each cat value*/
   do i=1 to (rand('integer',15,60));
      /* generate some random 1 and 0 values*/
      x= rand('bernoulli',0.3);
      output;
   end;
   end;
   end;
run;

proc tabulate data=junk;
   class cat subcat;
   var x;
   table cat all='Total',
         x *(n='Number nonmissing' sum='Number of 1'*f=best5.  mean='% of 1'*f=percent8.1)
   ;
      table cat *(subcat all='Cat Total') all='Overall Total',
         x *(n='Number nonmissing' sum='Number of 1'*f=best5.  mean='% of 1'*f=percent8.1)
   ;

run;

The 0.3 in the call to Rand('Bernoulli', 0.3) means that with enough values generated about 0.3 or 30 percent will have a value of 1, the remainder will have 0.

The Rand('integer') used that way returns value between 15 and 60 as integers with uniform distribution(equal probability of any of the values.

This is simple enough that you should be able to eyeball the N and Sum values and verify that the Mean (as percent) has the correct percent for the Subcat.

PaigeMiller
Diamond | Level 26

Percentages and counts are much easier to calculate via PROC FREQ, and after that you can print out the information in whatever format you want.

 

proc sort data=a;
    by educationa_institution city;
run;
proc freq data=a;
    by educationa_institution city;
    tables dropped_out/noprint out=want;
run;  
--
Paige Miller
AhmedAl_Attar
Ammonite | Level 13

Hi @Sheril1 

Have a look at this SESUG paper This Works on So Many Levels: Building Summary Tables in SAS® with Hash Programming

It may provide you with a way to get closer to your goal.

 

Hope this helps,

Ahmed

 

Astounding
PROC Star
You're missing the easy way. For a variable that is always 1 or 0, there's an easy way to calculate the percentage of times it is 1. Simply take the mean of the variable, something which tabulate can do easily.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 723 views
  • 1 like
  • 5 in conversation