BookmarkSubscribeRSS Feed
Barkat
Pyrite | Level 9

How do I create a table as shown below (Table A) using proc tabulate or proc report. I would like to show only the numbers in "Went to School?=Yes" column in the "Completed High School?" column. That means only the "yes" to the :"Went to School?" column will be shown in "Completed High School?" column. I tried a lot but failed. 

 

(Dataset and codes are mentioned after Table B)

 

Table A

  gender went_to_sch completed_hi_sch
F M   no yes no yes
N RowPctN N RowPctN N RowPctN N RowPctN N RowPctN N RowPctN N RowPctN
city 4 40 6 60 . . 2 20 8 80 2 25 6 75
A
B 4 40 6 60 2 20 2 20 6 60 1 17 5 83
C 6 60 4 40 2 20 . . 8 80 3 38 5 63

 

When I run my code (see below) it produces a table like Table B. 

 

Table B

  gender went_to_sch completed_hi_sch
F M   no yes   no yes
N RowPctN N RowPctN N RowPctN N RowPctN N RowPctN N RowPctN N RowPctN N RowPctN
city 4 40 6 60 . . 2 20 8 80 2 20 2 20 6 60
A
B 4 40 6 60 2 20 2 20 6 60 4 40 1 10 5 50
C 6 60 4 40 2 20 . . 8 80 2 20 3 30 5 50

 

Dataset:

Proc sql;
create table have (id char, city char, gender char, went_to_sch char, completed_hi_sch char);
insert into have
values ("1", "A", "M", "yes", "yes")
values ("2", "B", "M", "no", " " )
values ("3", "C", "M", "yes", "no")
values ("4", "A", "F", "yes", "yes")
values ("5", "C", "F", "yes", "yes")
values ("6", "C", "F", " ", " ")
values ("7", "A", "M", "yes", "yes")
values ("8", "B", "M", "yes", "no")
values ("9", "B", "M", "yes", "yes")
values ("10", "A", "F", "no", " " )
values ("11", "B", "F", "yes", "yes")
values ("12", "C", "F", "yes", "yes")
values ("13", "A", "M", "yes", "no")
values ("14", "B", "F", "yes", "yes")
values ("15", "C", "M", " ", " ")
values ("16", "A", "M", "yes", "yes")
values ("17", "B", "M", "no", " ")
values ("18", "C", "M", "yes", "no")
values ("19", "A", "F", "yes", "yes")
values ("20", "C", "F", "yes", "yes")
values ("21", "B", "F", " ", " ")
values ("22", "A", "M", "yes", "yes")
values ("23", "C", "M", "yes", "no")
values ("24", "B", "M", "yes", "yes")
values ("25", "A", "F", "no", " " )
values ("26", "C", "F", "yes", "yes")
values ("27", "C", "F", "yes", "yes")
values ("28", "A", "M", "yes", "no")
values ("29", "B", "F", "yes", "yes")
values ("30", "B", "M", " ", " ");
select * from have;
quit;

 

 

SAS code:

proc tabulate data = have;
class City Gender Went_to_Sch Completed_Hi_Sch/ missing;
tables City, ( Gender Went_to_Sch Completed_Hi_Sch) *(N ROWPCTN);
run;

 

11 REPLIES 11
ballardw
Super User

Data is helpful.

 

Sometimes you have to do something with the data set for truly custom tables.

From you description the easiest might be to make a new data set with a NUMERIC variable that is assigned a value of only when "went to school" is Yes. Values would be 1 (for Yes) and 0 (for No) request the SUM statistic to get number of Yes and Mean for Percent yes using a percent format.

Since "no" would be the complement of yes then just don't show it.

Or use Proc report with the new variable

 

Barkat
Pyrite | Level 9
Good suggestion. Let's wait for other options if someone has any.
ballardw
Super User

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

Example data we can code with is much more likely to get a targeted solution.

Cynthia_sas
SAS Super FREQ

Hi:

  It looks like you've tried PROC TABULATE already. It would be very useful to see ALL your code and have your data. Without seeing the structure of your data, it is nearly impossible to comment. Even if I made an attempt at an answer, the chances are good that any fake data I made up would not follow the same structure as your data and so any code I wrote might work with my fake data, but might not (probably would not) work with your data.

  And it's not clear to me what you want -- do you want RowPctN under the other category variables? I see 3 category variables in your table: Gender, Went to School and Completed High School. The RowPctN value will be off if you limit the table just to the YES columns, so I suspect what you're asking for is a similar table, but with the Percent for the Yes staying the same, but without seeing the NO or the Percent for the NO. On the other hand, that may not be what you want, since you showed us what you are currently getting that you don't want, but you didn't really show us what your desired result is in enough detail to write code. And without data it's hard to write code.

 

Cynthia

Barkat
Pyrite | Level 9
Thanks. I have updated my question.
Cynthia_sas
SAS Super FREQ
Thanks, but without data, no one can understand your data structure or your variables. No one can run your code without making some fake data. For the reason I explained previously, anyone who attempts to make data might get the structure incorrect and render their attempt meaningless or unusable.
Cynthia
Barkat
Pyrite | Level 9

Thanks, I understood your point and have edited my question by mentioning the dataset. Please let me know if that is helpful.

Cynthia_sas
SAS Super FREQ

Hi:

  Yes, it was helpful. I was able to run your code. This is what I got:

Cynthia_sas_0-1622816963413.png

My understanding of your question is that I think you're saying your want to see the columns highlighted in green, but NOT the columns highlighted in yellow? Is that correct? If you eliminate the columns highlighted in yellow, then that will change your ROWPCTN numbers. Is that what you want? If you want the percent numbers to stay the same and just NOT show the columns highlighted in yellow, you may have to pre-calculate the percents in one pass through the data and then create a dataset with the calculated percents based on the whole dataset. Then, you'd only do  report on the YES columns or the columns highlighted in green.

 

  You didn't really show your desired report or explain how the percents should or should not be impacted, so my understanding might be off from what you want.

Cynthia

Barkat
Pyrite | Level 9

I appreciate your help.
I tried to explain what I want. Maybe it's not clear enough. Sorry about that.

Let me try to explain again.
There are 10 persons from City A. 8 of them went to school.
Now see the difference in "went to hi sch" column between Table A and B.
Table A is showing the percentage of persons who completed hi sch among those who went to sch (I want to show this).
Table B is showing the percentage of persons who completed hi sch among all persons in the city.
Table A showing 25% (2 out of 8 ) did not complete hi sch.
Table B showing 20% (2 out of 10), did not complete hi sch.

So, my desired report is Table A.

Thanks so much!
Barkat

Cynthia_sas
SAS Super FREQ

Hi:

  What you want will be hard to do because of the values in the data you show. What I mean is this:

Cynthia_sas_0-1622836321809.png

You have 8 rows (highlighted in green) where Completed_Hi_Sch is missing. Then you have 4 rows (highlighted in cyan) where Went_to_Sch is missing. These 4 rows ALSO have Completed_Hi_Sch missing. So the problem is that you can't turn missing off for one of the variables without having an impact on the other variable.

 

  Your fake data makes 30 rows. If we remove the percent from the table for purposes of the example:

Cynthia_sas_1-1622839071682.png

As you can see, only on Report #2, do you see that all 30 rows are accounted for in the table because of the missing values. The ROWPCTN statistics wants to divide the number in each cell by the total for that row. Note how the row total changes based on whether you have missing or not. If you show the missing rows for Completed_Hi_Sch I don't see how you can avoid also showing the missing rows for Went_to_Sch because of the relationship in the data between the values on at least 4 of the rows.

Cynthia

Barkat
Pyrite | Level 9
Thanks for explaining nicely. I also tried with and w/o missing options and found what you showed.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1117 views
  • 1 like
  • 3 in conversation