Hello,
Please I need some help with data manipulation. I have clean up the data up to this point. Now, I want to create a frequency and percentages for each serv by table. The active and reserve is the count of boy in 2796 table who are active or reserve. i want to find the percentage of these count to create a table below
For example: I want the (%) for Boy in PHA(table) who are Active, (%) for Boy in 2796 who are active and so on. Just like my desired table below. I think I need numerator & denominator but am not sure. Am think to get the percentage for Boy in PHA who are active, my numerator is 41043/706476 (total PHA). I don't know how to code this in SAS. Any help is highly appreciated
ddata test;
length serv $5 table $5 ;
infile datalines dlm=',' dsd;
input serv $ table $ Active Reserve ;
datalines;
Boy,2796,41043,7024
Boy,PHA,274220,95291
Girl,2796,3200,1231
Girl,PHA,121634,36043
Dad,2796,9915,310
TOTAL,2796,77283,11371
TOTAL,PHA,706476,195988
;
run;
desired output
Serv | Table | Active |
| Reserve |
|
|
| N | % | N | % |
Boy | PHA | 274220 | ? | 95291 | ? |
2796 | 41043 | ? | 7024 | ? | |
| |||||
Girl | PHA | ||||
2796 | |||||
Dad | |||||
PHA | |||||
2796 | |||||
TOTAL | |||||
PHA | 706476 | 100 | 195988 | 100 | |
2796 | 77283 | 100 | 11371 | 100 |
data test;
length serv $5 table $5 ;
infile datalines dlm=',' dsd;
input serv $ table $ Active Reserve ;
datalines;
Boy,2796,41043,7024
Boy,PHA,274220,95291
Girl,2796,3200,1231
Girl,PHA,121634,36043
Dad,2796,9915,310
TOTAL,2796,77283,11371
TOTAL,PHA,706476,195988
;
proc sort data=test;
by table;
run;
data test2;
merge test(where=(serv='TOTAL') rename=(active=active_total reserve=reserve_total)) test;
by table;
active_pct=active/active_total;
reserve_pct=reserve/reserve_total;
run;
proc report data=test2;
columns serv table ("Active" active active_pct) ("Reserve" reserve reserve_pct);
define serv/group "Serv";
define table/group "Table";
define active/'N' format=comma10.0;
define active_pct/'%' format=percent8.2;
define reserve/'N' format=comma10.0;
define reserve_pct/'%' format=percent8.2;
run;
The values on the total line do not appear to be the total of the lines above it. So the percents don't add to 100%.
In addition, I again urge you (in fact, I insist) that you test your code by actually running it to make sure it works and gives the right answers. In this case, your code produces errors in the log. Please don't make us waste our time running code that doesn't work, or running code where the data is clearly incorrect. We're trying to help you, but you have to help us as well.
@PaigeMiller Thank you! Thank you! This works however I have one more question. I created a format for service because I want it in alphabetical like but I don't want my results to have 1Boy, 2Dad, 3Girl. How do I remove it or re-format so it does not have it in the final result. In the table below I add more variables so it looks like the original dataset and I wanted it in that exact format.
proc format; value $service 'Boy' = '1Boy' 'Dad'= '2Dad' 'Girl' = '3Girl' ;
Component | |||||||
Service | Table | Active | Reserve | Guard | |||
N | % | N | % | N | % | ||
Boy | PHA | ||||||
2796 | |||||||
Girl | PHA | ||||||
2796 | |||||||
Busy Z | PHA |
|
|
|
|
|
|
| 2796 |
|
|
|
|
|
|
Dad | PHA | ||||||
2796 |
People do add numbers in front of 'Boy','Dad', etc. and I think this is a very poor practice. The resulting table with '1Boy' looks, in my opinion, unprofessional. Fortunately, it is easy to work around this so that PROC REPORT puts things in whatever order you want. Example:
/* Create a numeric value for Boy, Dad, etc. that represents the desired sequence */
data test3;
set test2;
if service='Boy' then seq=1;
else if service='Dad' then seq=2;
else if service='Girl' then seq=3;
else if service='TOTAL' then seq=999;
run;
/* Create table with seq to determine the order */
/* Note: seq will not be displayed in the table because of NOPRINT */
proc report data=test3;
columns seq service table ("Active" active active_pct) ("Reserve" reserve reserve_pct);
define seq/noprint group order order=internal;
define service/group "Service";
define table/group "Table";
define active/'N' format=comma10.0;
define active_pct/'%' format=percent8.2;
define reserve/'N' format=comma10.0;
define reserve_pct/'%' format=percent8.2;
run;
Note: the creation of variable seq is done when creating data set test3, but it could have been placed in the code when creating data set test2 (simplifying the code somewhat).
@PaigeMiller Thank you. This is extremely helpful. I found many people doing it and I follow the steps. However, I tried your method on my data with multiple variables but am not getting it right. I created the sex1, age1 and race1 similar to your "seq" and added it to the proc report. I am suspecting it is because my data has some missingness when I coded it. Here is a snapshot and the warning I am getting. How can I fix this. I used the sashelp.heart to practice but it looks fine. Please can you explain how to use pro report to generate it?
data test1;
set test;
if sex='Male' then sex1=1;
else if sex='Female' then sex1=2;
if age='<20' then age1=1;
else if age='20-24' then age1=2;
else if age='25-29' then age1=3;
else if age='30-34' then age1=4;
else if age='35-39' then age1=5;
else if age='40-44' then age1=6;
else if age='45-49' then age1=7;
else if age='50+' then age1=8;
if Race='White, non-Hispanic' then race1=1;
else if Race ='Black, non-Hispanic' then race1=2;
else if Race = 'Hispanic' then race1=3;
else if Race = 'Other' then race1=4;
run;
Then I transpose it.
But when I use pro report, I got the warning and no output.
34 proc report data=tab_final ;
35 columns sex1 age1 race1 service1 varname Y2023_ percent23 ;
36 define sex1/noprint group order order=internal;
37 define age1/noprint group order order=internal;
38 define race1/noprint group order order=internal;
39 define service1/noprint group order order=internal;
40 define varname/group "Variables";
41 define Y2023_/'N' format=comma10.0;
42 define percent23/'%' format=percent8.2;
43 run;
WARNING: A GROUP, ORDER, or ACROSS variable is missing on every observation.
NOTE: There were 23 observations read from the data set QC.TAB1_FINAL.
NOTE: PROCEDURE REPORT used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds
Why are you transposing? It doesn't seem necessary. Is it to get the columns in a report? Then it certainly is NOT necessary, and in fact it is the wrong thing to do. Show us the report you want — type it in using Excel and show us the screen capture.
Remember, no more transposing to get a report.
Why this:
if sex='Male' then sex1=1;
else if sex='Female' then sex1=2;
Why this:
if age='<20' then age1=1;
else if age='20-24' then age1=2;
else if age='25-29' then age1=3;
else if age='30-34' then age1=4;
else if age='35-39' then age1=5;
else if age='40-44' then age1=6;
else if age='45-49' then age1=7;
else if age='50+' then age1=8;
Is it to control the order in a report? There are other ways (that are much easier) if things are already in order. Your ages are in order. Your sexes are in order. The reason I proposed a different approach earlier was you wanted an order that was not alphabetic.
@PaigeMiller Yes. I want to get the variables in the order for the report. This is the report table.
Just to quickly ask this here @ballardw How do I make my 'Total' to be placed at the top just as it is in this report table with pro report. Thanks @PaigeMiller @ballardw
N | % | |
Total | 100 | |
Sex | ||
Male | ||
Female | ||
Age category | ||
<20 | ||
20-24 | ||
25-29 | ||
30-34 | ||
35-39 | ||
40-44 | ||
45-49 | ||
50+ | ||
Race/Ethnicity | ||
Non-Hispanic White | ||
Non-Hispanic Black | ||
Hispanic | ||
Other |
PROC REPORT is not the right tool for this type of output. Maybe PROC TABULATE will work, but I have almost no knowledge of PROC TABULATE. So maybe @ballardw can step in and show how (if possible) you can get this from PROC TABULATE. I suspect he will need to see a portion of your original (untransposed) data.
PROC REPORT produces rectangular tables where a variable (for example, YEAR) is going across the top and another variable (such as income catergories) goes down the first column. I'm guessing you could probably work very hard and make this table in PROC REPORT but I do not even want to think about working very hard to do this.
If somewhere earlier in your code for this project, you wrote code to create the age category '20-24' in a variable AGE1, and similarly for sex and race, this is also counterproductive and makes more work than the much better method, which is to assign a custom format to the original variable. Then in reporting procedures, SAS is smart enough to be able to know that the values associated with '<20' come before the values associated with '20-24' and then put them in the proper order in the desired table, without you having to force an ordering. The reason we started with a seq variable was because it seemed as if you wanted possibly values in a table that were not in alphabetical order or not in numeric order.
I'm going to insist that we stop here. I'm going to insist that in your first post in a thread, you describe the problem from the beginning all the way to the end, in words, without referring to SAS code at all. Then you need to show us (a portion of) the original data (not the data transformed or transposed after a couple of steps) and show us the desired output. This is the proper way to ask for help and will lead to the most efficient discussions, by providing these items I just described. This is the only way for us to make progress efficiently. You need to provide this for EVERY problem you want us to help with. No jumping into the middle and asking how to rename variable names; no jumping into the middle and wondering why your PROC REPORT has an error.
@CathyVI wrote:
@PaigeMiller Yes. I want to get the variables in the order for the report. This is the report table.
Just to quickly ask this here @ballardw How do I make my 'Total' to be placed at the top just as it is in this report table with pro report. Thanks @PaigeMiller @ballardw
N % Total 100 Sex Male Female Age category <20 20-24 25-29 30-34 35-39 40-44 45-49 50+ Race/Ethnicity Non-Hispanic White Non-Hispanic Black Hispanic Other
If your Total is computed from the values in the data then with Report that would be a RBREAK BEFORE;
If you are forcing a column variable that has the value of total and associated in the data set then I think you are looking either adding an order variable that appears first in your Columns statement and has the NOPRINT options in the define along with order or a custom format and the Preloadfmt option. But from what you've said your data isn't ready.
Your very limited picture of the data shows every single row with one or more missing values for Sex1, Age1, Race1 and Service1. So you are getting the warning because of all that missing data.
From the documentation of proc report in the Define section (emphasis added for any group so show why this is important.
Default If you omit the MISSING option, then PROC REPORT excludes from the report and the output data sets all observations that have a missing value for any group, order, or across variable.
The "fix" would be to add the MISSING option to the DEFINE statements for those group variables. However that will result in sections of the report created to display those missing values.
You don't show the actual Transpose code for use with any of the example sets we have created but it seems moderately clear that however you did it the result is not usable for Proc Report.
Any format based solution to order pretty much means that you have to provide something that handles all the values.
Since your example output does not include the "TOTAL" that was in your previous question and adds a value not there it's pretty hard to demonstrate.
You can use the PRELOADFMT option with ORDER=DATA on the Define. If you create the format with the NOTSORTED option then the values will display in the Format definition order:
This uses a different order to show that it forces to a different order than the data set. Since you did not include a specific place for TOTAL I exclude that from Proc Report with the Where since just demonstrating this format and option set. This uses the Test2 data set created in @PaigeMiller's example.
proc format; value $service (notsorted) 'Girl'='Girl' 'Boy' ='Boy' 'Dad' ='Dad' ; run; proc report data=test2; where serv ne 'TOTAL'; columns serv table ("Active" active active_pct) ("Reserve" reserve reserve_pct); define serv/group "Serv" format=$service. preloadfmt order=data; define table/group "Table"; define active/'N' format=comma10.0; define active_pct/'%' format=percent8.2; define reserve/'N' format=comma10.0; define reserve_pct/'%' format=percent8.2; run;
To place "Busy Z" in a specific order in your data you would add it to the format.
If you want a percentage you really do need to tell us exactly what the numerator and denominator is.
I have a suspicion that there may be more data "missing" in some way, even more than the Missing COMP in your other question because total up all of the Boy, Girl and Dad for either 2796 or PHA and the total of those is much less than the number reported for your "total".
I suspect that you may have manipulated your data so much getting to this part that lost something in context.
How do you answer this question: Is either a row, a column or all percentages or all the percentages within a value of Table supposed to sum to 100? If the answer is Yes, then tell us which one and explain where the not represented numerator values would be.
@PaigeMiller @ballardw Thank you for sharing your knowledge. This means alot to me as I continue to learn SAS. Two last question please.
1. How do I learn data manipulation? Is there any book/material you can recommend.
2. If you see my initial main table, active and reserve are under component. How can I create a proc report that will show that table exactly like that- where active and reserve will be under component like this below. I tried it by adding component to the group in proc report but i did not get the expected outcome.
UNTESTED
columns seq service table ("Component ("Active" active active_pct) ("Reserve" reserve reserve_pct));
How can you learn data manipulation? I don't have a complete set of instructions. The main thing is that you do lots of it. No one who is new at this is going to be able to manipulate data as well as @ballardw and I (well, at least as well as @ballardw ). Here are some other things that I see are important, based upon what I see among the younger data analysts where I work. Please note: THESE ARE MY OPINIONS.
proc sql;
create table one as select mean(myvariable) from dataset;
quit;
proc sql;
create table two as select mean(myvariable) from dataset where product='ABC';
quit;
proc sql;
create table three as select mean(myvariable) from dataset where product='DEF';
quit;
data stats;
set one two three;
run;
Example: (less inefficient)proc sql;
create table one as select mean(myvariable) from dataset;
quit;
proc sql;
create table two as select product,mean(myvariable) from dataset group by product;
quit;
data stats;
set one two;
run;
Example: (efficient)proc summary data=dataset;
class product;
var myvariable;
output out=stats mean=;
quit;
There's actually more code needed to make the above (inefficent) and (less inefficient) code work properly, I left it out because the point I am making should be clear. You get the same numbers as a result, but the inefficient requires 12 lines and the efficient requires 5. And the efficient code will execute a lot faster than the inefficient code. And if there are additional products, such as GHI and JKL and so on, you have to keep adding to your inefficient code in a repetitive fashion. The efficient code requires nothing else to handle ABC DEF GHI and JKL properly.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.