BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
fsharif
Fluorite | Level 6

Hello

 

I have data that looks something like this:

Screen Shot 2021-09-30 at 11.55.02 PM.png

 

 

This is just a sample table from a much larger data set. What I want to do is find a sum of tot_votes but for each variable, rep, lib, ind, isp. Then I want to add the result I get of tot_votes of REP and tot_votes of LIB and make this result into one variable, tot_votes of REP+LIB. (I don't know if I can get tot_votes of rep+lib in one step, I assume you have to do two steps).

 

so I would get something like this

IND 19

AS 20

REP/LIB 130

 

I've tried PROC SORT followed by if then statements to delete everything but REP and LIB and then add but wasn't working out for me and I wasn't sure how to execute it properly. I've also tried to use "RETAIN" and n+1 but I think I am using it incorrectly. I prefer not to use PROC SQL as I haven't been using it for the rest of this code.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@fsharif wrote:

Thanks so much for your reply. I am new to stats and sas and appreciate your kindness.

 

I am using data from a big set originally called NewMIT hence the NEWMIT set.

 

DATA PARTYSORT (keep=state county party total_votes);
	set NEWMIT;
run;

*this gives me close to what I want but also gives every single party sum for each state. I want to see rep and lib only eventually and those two added together;
 
proc freq data=partysort;
table state*party; weight total_votes;
run;

*tried this but it just gives total votes per state;
PROC SUMMARY data=partysort;
by state;
var $partyf;
output out=PROCSUM sum=;
run;

PROC PRINT data=procsum;
run;

*I've also tried a proc means by state but it doesn't list sums in the output;
PROC MEANS data=partysort;
class state;
var total_votes;
output out=sumofvotesstate 
sum (total_votes)=sum1;
run;

None of this is giving me exactly what I want, which is really just a way to summarize the fraction of votes going toward rep and lib.


Provide some example data in the form of a data step not a picture and you may get a more direct answer. Or use one of the SAS supplied data sets.

Your requirement is not very clear either but may be way simpler.

Here is another stab at what I think you might mean by "fraction of xx" that uses SASHELP.CLASS, a data set you should have in your install.

This is looking at Age, where I am more interested in the 13 and 14 combined age group compared with everything else. I want to know the "fraction" within each value of the SEX variable.

proc format;
value agegroup
  13,14 = "13 or 14"
  other = "everything else"
;
run;

proc freq data=sashelp.class;
   table sex * age;
   format age agegroup.;
run;

A similar format for you Party may be all you need. I am not sure which "fraction" you want but I suspect the Rowpct may be what you are looking for.

 

 

View solution in original post

11 REPLIES 11
andreas_lds
Jade | Level 19

@fsharif wrote:

Hello

 

I have data that looks something like this:

Screen Shot 2021-09-30 at 11.55.02 PM.png

 

 

This is just a sample table from a much larger data set. What I want to do is find a sum of tot_votes but for each variable, rep, lib, ind, isp. Then I want to add the result I get of tot_votes of REP and tot_votes of LIB and make this result into one variable, tot_votes of REP+LIB. (I don't know if I can get tot_votes of rep+lib in one step, I assume you have to do two steps).

 

so I would get something like this

IND 19

AS 20

REP/LIB 130

 

I've tried PROC SORT followed by if then statements to delete everything but REP and LIB and then add but wasn't working out for me and I wasn't sure how to execute it properly. I've also tried to use "RETAIN" and n+1 but I think I am using it incorrectly. I prefer not to use PROC SQL as I haven't been using it for the rest of this code.


You are mixing terms making it extra-difficult to understand want you have and want. The names in orange are not variables, but values. To better understand your problem, please post an excerpt of the data you have in usable form (data step using datalines statement) and show the expected result of that data .

If you want to sum by values, using proc summary/means is the way to go.

ballardw
Super User

From your example you do not have separate variables but one variable with different values that you want to group.

I think @PaigeMiller is on the right track but you want to use a MULTILABEL format to have the same values appear in separate and combined totals. Only a few procedures will use MULTILABEL formats and there can be a few tricks involved in the order of definition and which options are used in a specific procedure to control result appearance.

 

Below is some code I used in my organization to demonstrate different format creation options and how the options in Proc Tabulate interact to change order. This does make a small data set that involves random numbers so rerunning the data step can create different results.

Any of the styles of format creation for your Party variable would work, the difference is in the output order.

/* To demonstrate how the order of definition affects appearance in
   multilabel formats. Also appearance options to show the spaces to
   get the indent as desired and fix column widths.
   And investigate whether class level format based style overrides work
with MLF in proc tabulate.  Result: NO.
*/
proc format library=work;
value accidentl (multilabel notsorted)
1-5 = 'Accidents'
1-3 = ' Transport accidents'
1 = '   Motor vehicle accidents'
2 = '   Water, air, and space'
3 = '   Other land transport accidents'
4-5 = ' Nontransport accidents'
5 = '   Fishing'
;
value accidentr (multilabel notsorted)
1-5 = 'Accidents'
1-3 = ' Transport accidents'
4-5 = ' Nontransport accidents'
1 = '   Motor vehicle accidents'
2 = '   Water, air, and space'
3 = '   Other land transport accidents'
5 = '   Fishing'
;
value accidents (multilabel notsorted)
1 = '   Motor vehicle accidents'
2 = '   Water, air, and space'
3 = '   Other land transport accidents'
5 = '   Fishing'
1-5 = 'Accidents'
1-3 = ' Transport accidents'
4-5 = ' Nontransport accidents'
;
value mf
1 = "Male"
2 = "Female"
;
value mycolor
1='white'
2='red'
3='green'
4='blue'
5='orange'
6='purple'
;
value accsimple
1 = 'Motor vehicle accidents'
2 = 'Water, air, and space'
3 = 'Other land transport accidents'
4 = 'Nontransport accidents'
5 = 'Fishing'
;
value MyColorl (multilabel notsorted)
1-5 = 'white'
1-3 = 'red'
1 = 'blue'
2 = 'orange'
3 = 'pink'
4-5 = 'purple'
5 = 'black'
;
run;

/* populate a dataset to display */
/* This specifically does NOT generate any data for FISHING above*/
/* to display the behavior of the options below in those cases. */
data junk; 
do i=1 to 50;
type = round(4*ranuni(1234)+.5);
sex = round(2*ranuni(3455)+.5);
output;
end; 
run;


/* Notice that before we get here the data is NOT sorted */
/* in any manner!!!! */
title 'Option Order=Data Preloadfmt Printmiss Misstext=0 format accidentl';
proc tabulate data=junk order=data ;
   class type / mlf PRELOADFMT;
   /*ASIS preserves the leading spaces in the format, Cellwidth here is optional
     for this demo. These will ONLY affect ODS output such as HTML, RTF or PDF
     not the OUTPUT window
  */
   classlev type/ style=[asis=on cellwidth=1.5in];
   class sex;
   /* the formatted values of SEX normally take up different lengths, this fixes
      the column widths to be the same for both headers. May cause interesting
      appearances with large numbers of displayed digits if requested in formats*/
   classlev sex/ style=[cellwidth=.5in];
   /* to get the ALL to have the same width as SEX need to specify this way*/
   table type=' ', all='Total'*{style=[cellwidth=.5in]}*n=' '*f=comma9. sex=' '*n=' '*f=comma8.
   / printmiss misstext='0';
   format type accidentl. sex mf.;
run;title;

title 'Option simple format and classlev background color';
proc tabulate data=junk order=data ;
   class type / ;
   /*ASIS preserves the leading spaces in the format, Cellwidth here is optional
     for this demo. These will ONLY affect ODS output such as HTML, RTF or PDF
     not the OUTPUT window
  */
   classlev type/ style=[asis=on cellwidth=1.5in background=mycolor.];
   class sex;
   /* the formatted values of SEX normally take up different lengths, this fixes
      the column widths to be the same for both headers. May cause interesting
      appearances with large numbers of displayed digits if requested in formats*/
   classlev sex/ style=[cellwidth=.5in];
   /* to get the ALL to have the same width as SEX need to specify this way*/
   table type=' ', all='Total'*{style=[cellwidth=.5in]}*n=' '*f=comma9. sex=' '*n=' '*f=comma8.
   / row=float misstext='0';
   format type accsimple. sex mf.;
run;title;

title 'Option Order=Data Preloadfmt Printmiss Misstext=0 format accidentl classlev background';
proc tabulate data=junk order=data;
   class type / mlf PRELOADFMT;
   classlev type/ style=[asis=on cellwidth=1.5in background=mycolor.];
   class sex;
   classlev sex/ style=[cellwidth=.5in];
   table type=' ', all='Total'*{style=[cellwidth=.5in]}*n=' '*f=comma9. sex=' '*n=' '*f=comma8.
   / printmiss misstext='0';
   format type accidentl. sex mf.;
run;title;
title 'Option Order=Data Preloadfmt Printmiss Misstext=0 format accidents';
proc tabulate data=junk order=data;
   class type / mlf PRELOADFMT;
   classlev type/ style=[asis=on cellwidth=1.5in];
   class sex;
   classlev sex/ style=[cellwidth=.5in];
   table type=' ', all='Total'*{style=[cellwidth=.5in]}*n=' '*f=comma9. sex=' '*n=' '*f=comma8.
   / printmiss misstext='0';
   format type accidents. sex mf.;
run;title;
 

PaigeMiller
Diamond | Level 26

I'm going to have to disagree on this one @ballardw .

 

If you just want to combine two input categories into one output category (which is what I think the original question has asked to do), a non-multilabel format will work, as I provided in my code.

 

If you want a certain category in the input data to appear in multiple output categories in the format (as in your example), then a multilabel format is needed.

--
Paige Miller
ballardw
Super User

@PaigeMiller wrote:

I'm going to have to disagree on this one @ballardw .

 

If you just want to combine two input categories into one output category (which is what I think the original question has asked to do), a non-multilabel format will work, as I provided in my code.

 

If you want a certain category in the input data to appear in multiple output categories in the format (as in your example), then a multilabel format is needed.


I interpret the "Then I want to add the result I get of tot_votes of REP and tot_votes of LIB" as wanting an additional level as that follows " sum of tot_votes but for each variable, rep, lib, ind, isp.".(emphasis added assuming OP meant values of the variable Party)

PaigeMiller
Diamond | Level 26

So what the original question says, and the shown desired output, don't match.

 

By the way, it was my interpretation that the original poster thought it was necessary to compute the individual category totals as an intermediate step before getting the final desired output, but it is not necessary and I bypassed that step.

--
Paige Miller
fsharif
Fluorite | Level 6

Thanks so much for your reply. I am new to stats and sas and appreciate your kindness.

 

I am using data from a big set originally called NewMIT hence the NEWMIT set.

 

DATA PARTYSORT (keep=state county party total_votes);
	set NEWMIT;
run;

*this gives me close to what I want but also gives every single party sum for each state. I want to see rep and lib only eventually and those two added together;
 
proc freq data=partysort;
table state*party; weight total_votes;
run;

*tried this but it just gives total votes per state;
PROC SUMMARY data=partysort;
by state;
var $partyf;
output out=PROCSUM sum=;
run;

PROC PRINT data=procsum;
run;

*I've also tried a proc means by state but it doesn't list sums in the output;
PROC MEANS data=partysort;
class state;
var total_votes;
output out=sumofvotesstate 
sum (total_votes)=sum1;
run;

None of this is giving me exactly what I want, which is really just a way to summarize the fraction of votes going toward rep and lib.

ballardw
Super User

@fsharif wrote:

Thanks so much for your reply. I am new to stats and sas and appreciate your kindness.

 

I am using data from a big set originally called NewMIT hence the NEWMIT set.

 

DATA PARTYSORT (keep=state county party total_votes);
	set NEWMIT;
run;

*this gives me close to what I want but also gives every single party sum for each state. I want to see rep and lib only eventually and those two added together;
 
proc freq data=partysort;
table state*party; weight total_votes;
run;

*tried this but it just gives total votes per state;
PROC SUMMARY data=partysort;
by state;
var $partyf;
output out=PROCSUM sum=;
run;

PROC PRINT data=procsum;
run;

*I've also tried a proc means by state but it doesn't list sums in the output;
PROC MEANS data=partysort;
class state;
var total_votes;
output out=sumofvotesstate 
sum (total_votes)=sum1;
run;

None of this is giving me exactly what I want, which is really just a way to summarize the fraction of votes going toward rep and lib.


Provide some example data in the form of a data step not a picture and you may get a more direct answer. Or use one of the SAS supplied data sets.

Your requirement is not very clear either but may be way simpler.

Here is another stab at what I think you might mean by "fraction of xx" that uses SASHELP.CLASS, a data set you should have in your install.

This is looking at Age, where I am more interested in the 13 and 14 combined age group compared with everything else. I want to know the "fraction" within each value of the SEX variable.

proc format;
value agegroup
  13,14 = "13 or 14"
  other = "everything else"
;
run;

proc freq data=sashelp.class;
   table sex * age;
   format age agegroup.;
run;

A similar format for you Party may be all you need. I am not sure which "fraction" you want but I suspect the Rowpct may be what you are looking for.

 

 

PaigeMiller
Diamond | Level 26

Use a custom format, and PROC SUMMARY

 

UNTESTED CODE

 

proc format;
     value $partyf 'REP','LIB'='REP+LIB';
run;

proc summary nway data=have;
    class party;
    format party $partyf.;
    output out=sum_data sum=;
run;

 

From now on (please consider this mandatory) we need data as SAS data step code (instructions). Screen captures are not acceptable. Excel files or other file attachments are not acceptable. SAS data step code is the only acceptable way to provide data.

--
Paige Miller
fsharif
Fluorite | Level 6

Hi, thanks so much for your help and informing me of proper etiquette as I am still learning.. I tried to use the macro using the instructions you provided to paste share into here but ran into some issues so I am just going to paste what I have here.

 

I tried a few variations of what you you suggested and I was able to get total number of votes for each state. But what if I want so summarize total number of rep+lib votes for each state?

DATA PARTYSORT (keep=state county party total_votes);
	set NEWMIT;
run;

PROC SUMMARY nway data=PARTYSORT;
	class party;
	format $partyf.;
	output out=sum_data sum=;
run;

 I apologize if I'm being a bit difficult.

PaigeMiller
Diamond | Level 26

You didn't include my PROC FORMAT in your code, it won't work without it.

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2643 views
  • 6 likes
  • 5 in conversation