BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
urban58
Quartz | Level 8

Hello SAS community,

 

I have a large dataset with lots of variables - see a sample below

 

Data have;
input id brain heart liver stomack llung lkidney lovary rlung rkidney rovary ucolon lcolon;
cards
;
1 1 0 1 0 1 0 0 0 1 0 1 1
2 0 0 0 1 0 1 0 0 0 1 0 1
3 0 1 0 0 1 0 1 1 1 0 0 0
4 0 0 0 0 0 1 0 0 0 0 0 1
5 1 1 1 1 1 1 1 1 1 1 1 1
6 0 0 1 0 0 0 0 0 1 0 1 1
7 0 0 1 1 0 0 0 0 0 0 0 0
8 1 1 1 1 0 0 0 1 1 1 0 1
9 0 1 0 1 0 0 0 0 0 0 0 0
10 0 0 0 0 1 1 1 1 0 1 1 0
;
run;

 

desired output
Body part   % with XXX=1
Brain                
Heart               
Liver
Stomack
Lung
Kidney
Ovary
Colon

 

Appreciate any help!

Maggie

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Do you expect the totals for variables like Lkidney and Rkidney as a single value?

You really aren't clear whether you need a data set or report. I am providing a report done one of the many possible:

 

Data have;
input id brain heart liver stomack llung lkidney lovary rlung rkidney rovary ucolon lcolon;
cards
;
1 1 0 1 0 1 0 0 0 1 0 1 1
2 0 0 0 1 0 1 0 0 0 1 0 1
3 0 1 0 0 1 0 1 1 1 0 0 0
4 0 0 0 0 0 1 0 0 0 0 0 1
5 1 1 1 1 1 1 1 1 1 1 1 1
6 0 0 1 0 0 0 0 0 1 0 1 1
7 0 0 1 1 0 0 0 0 0 0 0 0
8 1 1 1 1 0 0 0 1 1 1 0 1
9 0 1 0 1 0 0 0 0 0 0 0 0
10 0 0 0 0 1 1 1 1 0 1 1 0
;
run;

/* assumes data have is sorted by ID
AND that there are no duplicate values of ID
*/
proc transpose data=have out=trans;
   by id;
run;

proc format;
value $bodyparts
'llung','rlung'    ='Lung'
'lkidney','rkidney'='Kidney'
'lovary','rovary'  ='Ovary'
'brain'            ='Brain'             
'heart'            ='Heart'            
'liver'            ='Liver'
'stomack'          ='Stomack'
'ucolon','lcolon'  ='Colon'
;
run;

Proc tabulate data=trans;
   class _name_;
   format _name_ $bodyparts.;
   var col1;
   table _name_=' ',
         col1=' '*(Sum='Count'*f=best6. mean='%'*f=percent8.1)
         ;
run;

Caution: The format assumes the current names of the variables are exactly as you have provided them here (stomach is the more typical English spelling if I recall correctly) and including the case. The defaults of the Proc Transpose create a variable named "_name_" containing the value of the name. You may need to double check spelling, upper and lower case included, to make the format text on the left side of each = to match your data. The default will also return the numeric value in Col1. These names can be changed but for an example like this should suffice.

 

Formats assigning common values to multiples or groups of values create groups that will be honored by most of the SAS report, analysis and graphing procedures. So I do not change any actual values of the _name_ but use the custom format to create the groups (and provide capitalization of the result).

When you have variables that take a value of 1/0 then the SUM statistic gives you the count of 1's and the Mean is a decimal percentage OF the observations with values. Missing values will be excluded for both the Sum and Mean calculations. So if you have 10 observations and one of the bodypart variable value is missing then that part will only have 9 values. If you expect a percentage based on 10 values then you have more work involved defining what the numerator and denominator for the rate should be.

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

Do you want this as a report or a SAS data set?

urban58
Quartz | Level 8

a report would be great

PaigeMiller
Diamond | Level 26
proc summary data=have;
    var brain -- lcolon;
    output out=want mean=;
run;

You can transpose it if you want the information displayed vertically.

--
Paige Miller
urban58
Quartz | Level 8

Thanks PaigeMiller for your reply.

The data has variables brain heart liver stomack llung lkidney lovary rlung rkidney rovary ucolon lcolon

llung, rlung need to be put as lung, same for lkidney, rkidney, etc.

 

How do do that before calculating the frequencies?

 

Thanks,

Maggie

PaigeMiller
Diamond | Level 26

@urban58 wrote:

 

The data has variables brain heart liver stomack llung lkidney lovary rlung rkidney rovary ucolon lcolon

llung, rlung need to be put as lung, same for lkidney, rkidney, etc.


This detail was not mentioned originally. Could you please re-write the problem to include this information and all other relevant information (including other relevant points you have not yet mentioned), so the entire problem is in one problem statement? Please include in your explanation what happens if llung and rlung are both 1, is the frequency = 1 or is the frequency = 2?

--
Paige Miller
ballardw
Super User

Do you expect the totals for variables like Lkidney and Rkidney as a single value?

You really aren't clear whether you need a data set or report. I am providing a report done one of the many possible:

 

Data have;
input id brain heart liver stomack llung lkidney lovary rlung rkidney rovary ucolon lcolon;
cards
;
1 1 0 1 0 1 0 0 0 1 0 1 1
2 0 0 0 1 0 1 0 0 0 1 0 1
3 0 1 0 0 1 0 1 1 1 0 0 0
4 0 0 0 0 0 1 0 0 0 0 0 1
5 1 1 1 1 1 1 1 1 1 1 1 1
6 0 0 1 0 0 0 0 0 1 0 1 1
7 0 0 1 1 0 0 0 0 0 0 0 0
8 1 1 1 1 0 0 0 1 1 1 0 1
9 0 1 0 1 0 0 0 0 0 0 0 0
10 0 0 0 0 1 1 1 1 0 1 1 0
;
run;

/* assumes data have is sorted by ID
AND that there are no duplicate values of ID
*/
proc transpose data=have out=trans;
   by id;
run;

proc format;
value $bodyparts
'llung','rlung'    ='Lung'
'lkidney','rkidney'='Kidney'
'lovary','rovary'  ='Ovary'
'brain'            ='Brain'             
'heart'            ='Heart'            
'liver'            ='Liver'
'stomack'          ='Stomack'
'ucolon','lcolon'  ='Colon'
;
run;

Proc tabulate data=trans;
   class _name_;
   format _name_ $bodyparts.;
   var col1;
   table _name_=' ',
         col1=' '*(Sum='Count'*f=best6. mean='%'*f=percent8.1)
         ;
run;

Caution: The format assumes the current names of the variables are exactly as you have provided them here (stomach is the more typical English spelling if I recall correctly) and including the case. The defaults of the Proc Transpose create a variable named "_name_" containing the value of the name. You may need to double check spelling, upper and lower case included, to make the format text on the left side of each = to match your data. The default will also return the numeric value in Col1. These names can be changed but for an example like this should suffice.

 

Formats assigning common values to multiples or groups of values create groups that will be honored by most of the SAS report, analysis and graphing procedures. So I do not change any actual values of the _name_ but use the custom format to create the groups (and provide capitalization of the result).

When you have variables that take a value of 1/0 then the SUM statistic gives you the count of 1's and the Mean is a decimal percentage OF the observations with values. Missing values will be excluded for both the Sum and Mean calculations. So if you have 10 observations and one of the bodypart variable value is missing then that part will only have 9 values. If you expect a percentage based on 10 values then you have more work involved defining what the numerator and denominator for the rate should be.

urban58
Quartz | Level 8

Thank you ballardw, I will use this code on my data and ask if I don't understand something or accept as the solution your response.

Maggie

urban58
Quartz | Level 8

your code is just great ballardw, it answered my question exactly - sorry for the late response. Is there a way I can display the %s so the largest is 1st, etc.

Maggie

ballardw
Super User

@urban58 wrote:

your code is just great ballardw, it answered my question exactly - sorry for the late response. Is there a way I can display the %s so the largest is 1st, etc.

Maggie


One way is to calculate the numbers needed and sort.

proc summary data=trans nway;
   class _name_;
   format _name_ $bodyparts.;
   var col1;
   output out=summary sum= mean= /autoname;
run;
proc sort data=summary;
   by descending col1_mean;
run;

proc print data=summary noobs label;
   var _name_ col1_sum col1_mean;
   format col1_mean percent8.1;
   label
      col1_sum='Count'
      col1_mean='%'
   ;
run;

Caution: this summary data set only has one value of each of the two part elements and you can't always be sure which one it should be.

Proc summary (or Means) will only keep one of the values of the Class variable(s). The format still applies though. The autoname options on the requested statistics appends the _ and statistic to the variable (assumes the result is less than 32 characters, longer variables will get truncated to fit 32 characters).

Note that by default you will get two additional variables in the result of Proc Summary: _type_ , which indicates which combination(s) of class variables are present, the NWAY option means only the largest type is in the result, and _freq_ which is a raw count of the number of values used in the summary for each row of output.

 

Proc Tabulate does not have a "sort order by statistic" option so a different approach is needed.

urban58
Quartz | Level 8
Thanks for all your help with the code and the caution explanation - much appreciated!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 955 views
  • 2 likes
  • 4 in conversation