BookmarkSubscribeRSS Feed
CathyVI
Pyrite | Level 9

Hello,

Please I need some help with data manipulation. I have clean up the data up to this point. Now, I want to create a frequency and percentages for each serv by table. The active and reserve is the count of boy in 2796 table who are active or reserve. i want to find the percentage of these count to create a table below

 For example: I want the (%) for Boy in PHA(table) who are Active, (%) for Boy in 2796 who are active and so on.  Just like my desired table below.  I think I need numerator & denominator but am not sure. Am think to get the percentage for Boy in PHA who are active, my numerator is 41043/706476 (total PHA). I don't know how to code this in SAS. Any help is highly appreciated

ddata test;
length serv $5 table $5 ;
infile datalines dlm=',' dsd;
input serv $ table $ Active Reserve ;
datalines;
Boy,2796,41043,7024
Boy,PHA,274220,95291
Girl,2796,3200,1231
Girl,PHA,121634,36043
Dad,2796,9915,310
TOTAL,2796,77283,11371
TOTAL,PHA,706476,195988
;
run;

desired output

Serv

Table

Active

 

Reserve

 

 

 

N

%

N

%

Boy

PHA

274220?95291?
 

2796

41043?7024?

 

     
Girl

PHA

    
 

2796

    

Dad

     
 

PHA

    
 

2796

    

TOTAL

     
 

PHA

706476100195988100
 

2796

7728310011371100
16 REPLIES 16
PaigeMiller
Diamond | Level 26

 

 

data test;
length serv $5 table $5 ;
infile datalines dlm=',' dsd;
input serv $ table $ Active Reserve ;
datalines;
Boy,2796,41043,7024
Boy,PHA,274220,95291
Girl,2796,3200,1231
Girl,PHA,121634,36043
Dad,2796,9915,310
TOTAL,2796,77283,11371
TOTAL,PHA,706476,195988
; 
proc sort data=test;
    by table;
run;
data test2;
    merge test(where=(serv='TOTAL') rename=(active=active_total reserve=reserve_total)) test;
    by table; 
    active_pct=active/active_total;
    reserve_pct=reserve/reserve_total;
run;

proc report data=test2;
    columns serv table ("Active" active active_pct) ("Reserve" reserve reserve_pct);
    define serv/group "Serv";
    define table/group "Table";
    define active/'N' format=comma10.0;
    define active_pct/'%' format=percent8.2;
    define reserve/'N' format=comma10.0;
    define reserve_pct/'%' format=percent8.2;
run;

 

 

The values on the total line do not appear to be the total of the lines above it. So the percents don't add to 100%.

 

In addition, I again urge you (in fact, I insist) that you test your code by actually running it to make sure it works and gives the right answers. In this case, your code produces errors in the log. Please don't make us waste our time running code that doesn't work, or running code where the data is clearly incorrect. We're trying to help you, but you have to help us as well.

--
Paige Miller
CathyVI
Pyrite | Level 9

@PaigeMiller  Thank you! Thank you! This works however I have one more question. I created a format for service because I want it in alphabetical like but I don't want my results to have 1Boy, 2Dad, 3Girl. How do I remove it or re-format so it does not have it in the final result.  In the table below I add more variables so it looks like the original dataset and I wanted it in that exact format.

proc format;
 value $service 
'Boy' = '1Boy'
'Dad'= '2Dad'
'Girl' = '3Girl'
;
  

Component

Service

Table

Active

Reserve

Guard

  

N

%

N

%

N

%

Boy

PHA

      
 

2796

      

Girl

PHA

      
 

2796

      

Busy Z

PHA

 

 

 

 

 

 

 

2796

 

 

 

 

 

 

Dad

PHA

      
 

2796

      
PaigeMiller
Diamond | Level 26

People do add numbers in front of 'Boy','Dad', etc. and I think this is a very poor practice. The resulting table with '1Boy' looks, in my opinion, unprofessional. Fortunately, it is easy to work around this so that PROC REPORT puts things in whatever order you want. Example:

 

/* Create a numeric value for Boy, Dad, etc. that represents the desired sequence */
data test3;
    set test2;
    if service='Boy' then seq=1;
    else if service='Dad' then seq=2;
    else if service='Girl' then seq=3;
    else if service='TOTAL' then seq=999;
run;

/* Create table with seq to determine the order */
/* Note: seq will not be displayed in the table because of NOPRINT */
proc report data=test3;
    columns seq service table ("Active" active active_pct) ("Reserve" reserve reserve_pct);
    define seq/noprint group order order=internal;
    define service/group "Service";
    define table/group "Table";
    define active/'N' format=comma10.0;
    define active_pct/'%' format=percent8.2;
    define reserve/'N' format=comma10.0;
    define reserve_pct/'%' format=percent8.2;
run;

 

 

Note: the creation of variable seq is done when creating data set test3, but it could have been placed in the code when creating data set test2 (simplifying the code somewhat).

--
Paige Miller
CathyVI
Pyrite | Level 9

@PaigeMiller  Thank you. This is extremely helpful. I found many people doing it and I follow the steps.  However, I tried your method on my data with multiple variables but am not getting it right.  I created the sex1, age1 and race1 similar to your "seq" and added it to the proc report. I am suspecting it is because my data has some missingness when I coded it. Here is a snapshot and the warning I am getting. How can I fix this. I used the sashelp.heart to practice but it looks fine.  Please can you explain how to use pro report to generate it?

 

data test1;
set test;
if sex='Male' then sex1=1;
else if sex='Female' then sex1=2;

if age='<20' then age1=1;
else if age='20-24' then age1=2;
else if age='25-29' then age1=3;
else if age='30-34' then age1=4;
else if age='35-39' then age1=5;
else if age='40-44' then age1=6;
else if age='45-49' then age1=7; 
else if age='50+'  then  age1=8;

if Race='White, non-Hispanic' then race1=1;
else if Race ='Black, non-Hispanic' then race1=2;
else if Race = 'Hispanic' then race1=3;
else if Race = 'Other' then race1=4;
run;

Then I transpose it. 

 

CathyVI_0-1713364057641.png

But when I use pro report, I got the warning and no output.

34         proc report data=tab_final ;
35             columns  sex1 age1 race1 service1 varname Y2023_ percent23  ;
36         	define sex1/noprint group order order=internal;
37         	define age1/noprint group order order=internal;
38         	define race1/noprint group order order=internal;
39         	define service1/noprint group order order=internal;
40             define varname/group "Variables";
41             define Y2023_/'N' format=comma10.0;
42             define percent23/'%' format=percent8.2;
43         run;

WARNING: A GROUP, ORDER, or ACROSS variable is missing on every observation.
NOTE: There were 23 observations read from the data set QC.TAB1_FINAL.
NOTE: PROCEDURE REPORT used (Total process time):
      real time           0.04 seconds
      cpu time            0.03 seconds

 

 

PaigeMiller
Diamond | Level 26

Why are you transposing? It doesn't seem necessary. Is it to get the columns in a report? Then it certainly is NOT necessary, and in fact it is the wrong thing to do. Show us the report you want — type it in using Excel and show us the screen capture.

 

Remember, no more transposing to get a report.

 

Why this:

 

if sex='Male' then sex1=1;
else if sex='Female' then sex1=2;

 

Why this:

 

if age='<20' then age1=1;
else if age='20-24' then age1=2;
else if age='25-29' then age1=3;
else if age='30-34' then age1=4;
else if age='35-39' then age1=5;
else if age='40-44' then age1=6;
else if age='45-49' then age1=7; 
else if age='50+'  then  age1=8;

 

 

Is it to control the order in a report? There are other ways (that are much easier) if things are already in order. Your ages are in order. Your sexes are in order. The reason I proposed a different approach earlier was you wanted an order that was not alphabetic.

--
Paige Miller
CathyVI
Pyrite | Level 9

@PaigeMiller  Yes. I want to get the variables in the order for the report. This is the report table.  

 

Just to quickly ask this here @ballardw How do I make my 'Total' to be placed at the top just as it is in this report table with pro report.  Thanks @PaigeMiller @ballardw 

 N%
Total 100
Sex  
Male  
Female  
Age category  
<20  
20-24  
25-29  
30-34  
35-39  
40-44  
45-49  
50+  
Race/Ethnicity  
Non-Hispanic White  
Non-Hispanic Black  
Hispanic  
Other  
PaigeMiller
Diamond | Level 26

Maxim 14 — Use the right tool

 

PROC REPORT is not the right tool for this type of output. Maybe PROC TABULATE will work, but I have almost no knowledge of PROC TABULATE. So maybe @ballardw can step in and show how (if possible) you can get this from PROC TABULATE. I suspect he will need to see a portion of your original (untransposed) data.

 

PROC REPORT produces rectangular tables where a variable (for example, YEAR) is going across the top and another variable (such as income catergories) goes down the first column. I'm guessing you could probably work very hard and make this table in PROC REPORT but I do not even want to think about working very hard to do this.

 

If somewhere earlier in your code for this project, you wrote code to create the age category '20-24' in a variable AGE1, and similarly for sex and race, this is also counterproductive and makes more work than the much better method, which is to assign a custom format to the original variable. Then in reporting procedures, SAS is smart enough to be able to know that the values associated with '<20' come before the values associated with '20-24' and then put them in the proper order in the desired table, without you having to force an ordering. The reason we started with a seq variable was because it seemed as if you wanted possibly values in a table that were not in alphabetical order or not in numeric order.

--
Paige Miller
PaigeMiller
Diamond | Level 26

I'm going to insist that we stop here. I'm going to insist that in your first post in a thread, you describe the problem from the beginning all the way to the end, in words, without referring to SAS code at all. Then you need to show us (a portion of) the original data (not the data transformed or transposed after a couple of steps) and show us the desired output. This is the proper way to ask for help and will lead to the most efficient discussions, by providing these items I just described. This is the only way for us to make progress efficiently. You need to provide this for EVERY problem you want us to help with. No jumping into the middle and asking how to rename variable names; no jumping into the middle and wondering why your PROC REPORT has an error.

--
Paige Miller
ballardw
Super User

@CathyVI wrote:

@PaigeMiller  Yes. I want to get the variables in the order for the report. This is the report table.  

 

Just to quickly ask this here @ballardw How do I make my 'Total' to be placed at the top just as it is in this report table with pro report.  Thanks @PaigeMiller @ballardw 

  N %
Total   100
Sex    
Male    
Female    
Age category    
<20    
20-24    
25-29    
30-34    
35-39    
40-44    
45-49    
50+    
Race/Ethnicity    
Non-Hispanic White    
Non-Hispanic Black    
Hispanic    
Other    

If your Total is computed from the values in the data then with Report that would be a RBREAK BEFORE;

If you are forcing a column variable that has the value of total and associated in the data set then I think you are looking either adding an order variable that appears first in your Columns statement and has the NOPRINT options in the define along with order or a custom format and the Preloadfmt option. But from what you've said your data isn't ready.

ballardw
Super User

Your very limited picture of the data shows every single row with one or more missing values for Sex1, Age1, Race1 and Service1. So you are getting the warning because of all that missing data.

 

From the documentation of proc report in the Define section (emphasis added for any group so show why this is important.

 

Default If you omit the MISSING option, then PROC REPORT excludes from the report and the output data sets all observations that have a missing value for any group, order, or across variable.

The "fix" would be to add the MISSING option to the DEFINE statements for those group variables. However that will result in sections of the report created to display those missing values.

You don't show the actual Transpose code for use with any of the example sets we have created but it seems moderately clear that however you did it the result is not usable for Proc Report.

 

 

ballardw
Super User

Any format based solution to order pretty much means that you have to provide something that handles all the values.

Since your example output does not include the "TOTAL" that was in your previous question and adds a value not there it's pretty hard to demonstrate.

 

You can use the PRELOADFMT option with ORDER=DATA on the Define. If you create the format with the NOTSORTED option then the values will display in the Format definition order:

This uses a different order to show that it forces to a different order than the data set. Since you did not include a specific place for TOTAL I exclude that from Proc Report with the Where since just demonstrating this format and option set. This uses the Test2 data set created in @PaigeMiller's example.

proc format;
value $service (notsorted)
'Girl'='Girl'
'Boy' ='Boy'
'Dad' ='Dad'
;
run;
proc report data=test2;
    where serv ne 'TOTAL';
    columns serv table ("Active" active active_pct) ("Reserve" reserve reserve_pct);
    define serv/group "Serv" format=$service. preloadfmt order=data;
    define table/group "Table";
    define active/'N' format=comma10.0;
    define active_pct/'%' format=percent8.2;
    define reserve/'N' format=comma10.0;
    define reserve_pct/'%' format=percent8.2;
run;

To place "Busy Z" in a specific order in your data you would add it to the format.

 

 

 

ballardw
Super User

If you want a percentage you really do need to tell us exactly what the numerator and denominator is.

I have a suspicion that there may be more data "missing" in some way, even more than the Missing COMP in your other question because total up all of the Boy, Girl and Dad for either 2796 or PHA and the total of those is much less than the number reported for your "total".

 

I suspect that you may have manipulated your data so much getting to this part that lost something in context.

 

How do you answer this question: Is either a row, a column or all percentages or all the percentages within a value of Table supposed to sum to 100? If the answer is Yes, then tell us which one and explain where the not represented numerator values would be.

CathyVI
Pyrite | Level 9

@PaigeMiller @ballardw  Thank you for sharing your knowledge. This means alot to me as I continue to learn SAS. Two last question please.

1. How do I learn data manipulation? Is there any book/material you can recommend.

2. If you see my initial main table, active and reserve are under component.  How can I create a proc report that will show that table exactly like that- where active and reserve will be under component like this below. I tried it by adding component to the group in proc report but i did not get the expected outcome.

CathyVI_0-1713213505137.png

 

PaigeMiller
Diamond | Level 26

UNTESTED

 

columns seq service table ("Component ("Active" active active_pct) ("Reserve" reserve reserve_pct));

 

How can you learn data manipulation? I don't have a complete set of instructions. The main thing is that you do lots of it. No one who is new at this is going to be able to manipulate data as well as @ballardw and I (well, at least as well as @ballardw ). Here are some other things that I see are important, based upon what I see among the younger data analysts where I work. Please note: THESE ARE MY OPINIONS.

 

  • Learn to get really really good at DATA steps, PROC SUMMARY and either PROC REPORT or PROC TABULATE. These are your main calculation engines and display engines. This doesn't mean that you can't use other PROCs, but those should be secondary to the above, and use them when the above can't get you where you are going.
  • Do not use PROC SQL as a data analysis engine. The above PROCs are simply much more powerful. Just because you can do a lot of data manipulation in PROC SQL, that doesn't mean you SHOULD do a lot of data manipulation in SQL. SQL is really good at extracting from databases, and joining tables, and similar operations, that is when you should use SQL. It also can do some things easily that DATA steps can't do as easily. However, SQL is inefficient for doing lots of calculations and creating reports and will require a lot of code; and it will run more slowly. Where I am, I see the younger people write code which has SQL after SQL for dozens/hundreds of lines; the same analysis in PROC SUMMARY would run faster and take less code. Example that I saw today (inefficient):
    proc sql;
        create table one as select mean(myvariable) from dataset;
    quit;
    proc sql;
        create table two as select mean(myvariable) from dataset where product='ABC';
    quit;
    proc sql;
        create table three as select mean(myvariable) from dataset where product='DEF';
    quit;
    
    data stats;
        set one two three;
    run;
    Example: (less inefficient)
    proc sql;
        create table one as select mean(myvariable) from dataset;
    quit;
    proc sql;
        create table two as select product,mean(myvariable) from dataset group by product;
    quit;
    data stats;
        set one two;
    run;
    Example: (efficient)
    proc summary data=dataset;
        class product;
        var myvariable;
        output out=stats mean=;
    quit;
    There's actually more code needed to make the above (inefficent) and (less inefficient) code work properly, I left it out because the point I am making should be clear. You get the same numbers as a result, but the inefficient requires 12 lines and the efficient requires 5. And the efficient code will execute a lot faster than the inefficient code. And if there are additional products, such as GHI and JKL and so on, you have to keep adding to your inefficient code in a repetitive fashion. The efficient code requires nothing else to handle ABC DEF GHI and JKL properly.

    The "less inefficient" has less typing, but still not as little as the efficient code. And it will run slower than the efficient code.

    If you want to do this for two or more different GROUP BY variables, then SUMMARY blows SQL out of the water.

 

 

--
Paige Miller

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 676 views
  • 4 likes
  • 3 in conversation