BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

I want to create a summary table that contains categories from different variables.

I am doing it via proc sql .

As you can see in the code it is quite long code.

I want to ask if there is a short way to do it via proc summary (or other proc)

 

thanks

Joe

   data rawdata;
   input  ID  Cat1 $  Cat2 $ Cat3 $  Y; 
   cards;
   1  a  Y AU 100
   2  a  N AU 110
   3  a  Y AU 120
   4  a  N UK 130
   5  b  Y UK 140
   6  b  N UK 150
   7  b  Y USA 160
   8  b  N USA 170
   9  c  Y USA 180
   10 c  N USA 190
   ;
   run;

PROC SQL;
	create table output1 as
	select 	Cat1 as cat,
            count(*)  as no_customers,
			sum(y) as Total_Y 
	from rawdata
	group by Cat1
;
QUIT;

PROC SQL;
	create table output2 as
	select 	Cat2 as cat,
            count(*)  as no_customers,
			sum(y) as Total_Y 
	from rawdata
	group by Cat2
;
QUIT;

PROC SQL;
	create table output3 as
	select 	Cat3 as cat,
            count(*)  as no_customers,
			sum(y) as Total_Y 
	from rawdata
	group by Cat3
;
QUIT;

Data output_all;
set output1 output2  output3;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

@RW9 wrote:

Please read:

"That will give you summary stats for each of the combinations, you can then where filter out what you dont want."

 

You can filter the created dataset to keep only the outputs you want.


You can use the features of MEANS/SUMMARY to get the combinations you want and not need to filter.  More efficient and allows for many more CLASS variables to be processed.  There are two statements for this TYPES and WAYS. For this scenario I think WAYS is most suitable.  You will need a data step to create CAT.  If you have character and numeric categorical variables you will need a couple of other options. 

 

proc summary data=rawdata missing;
   class cat:;
   ways 1;
   var y;
   output out=want n=n sum=sum;
   run;
data want;
   length cat $8;
   set want;
   cat = coalesceC(of cat1-cat3);
   run;
proc print;
   run;

Capture.PNG

 

View solution in original post

12 REPLIES 12
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Something like:

data rawdata;
   input id cat1 $ cat2 $ cat3 $ y; 
cards;
1  a  Y AU 100
2  a  N AU 110
3  a  Y AU 120
4  a  N UK 130
5  b  Y UK 140
6  b  N UK 150
7  b  Y USA 160
8  b  N USA 170
9  c  Y USA 180
10 c  N USA 190
;
run;

proc means data=rawdata;
  class cat:;
  var y;
  output out=want n=n sum=sum;
run;

That will give you summary stats for each of the combinations, you can then where filter out what you dont want.

Ronein
Meteorite | Level 14

It is nice but not what I need.

I need a table that includes  summary statistics per each category and not of combinations

In output I need to get 8 rows (as i get in the example i sent)

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please read:

"That will give you summary stats for each of the combinations, you can then where filter out what you dont want."

 

You can filter the created dataset to keep only the outputs you want.

data_null__
Jade | Level 19

@RW9 wrote:

Please read:

"That will give you summary stats for each of the combinations, you can then where filter out what you dont want."

 

You can filter the created dataset to keep only the outputs you want.


You can use the features of MEANS/SUMMARY to get the combinations you want and not need to filter.  More efficient and allows for many more CLASS variables to be processed.  There are two statements for this TYPES and WAYS. For this scenario I think WAYS is most suitable.  You will need a data step to create CAT.  If you have character and numeric categorical variables you will need a couple of other options. 

 

proc summary data=rawdata missing;
   class cat:;
   ways 1;
   var y;
   output out=want n=n sum=sum;
   run;
data want;
   length cat $8;
   set want;
   cat = coalesceC(of cat1-cat3);
   run;
proc print;
   run;

Capture.PNG

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Didn't think of ways, nice, must remember that.

Ronein
Meteorite | Level 14

Perfect solution!

Is it possible to add to "want table"  another field that is telling us the var name that the category belongs to.

For example:

Cat "a" belong to  var Cat1  so in the new column it will written "Cat1"

Cat "b" belong to var Cat1  so in the new column it will written "Cat1"

Cat "Y" belong to var Cat2  so in the new column it will written "Cat2"

Cat "N" belong to var Cat2  so in the new column it will written "Cat2"

Cat "AU" belong to var Cat3  so in the new column it will written "Cat3"

Cat "UK" belong to var Cat3  so in the new column it will written "Cat3"

Ronein
Meteorite | Level 14

We can do it by proc format.

Thanks!!

data_null__
Jade | Level 19

@Ronein wrote:

Perfect solution!

Is it possible to add to "want table"  another field that is telling us the var name that the category belongs to.

For example:

Cat "a" belong to  var Cat1  so in the new column it will written "Cat1"

Cat "b" belong to var Cat1  so in the new column it will written "Cat1"

Cat "Y" belong to var Cat2  so in the new column it will written "Cat2"

Cat "N" belong to var Cat2  so in the new column it will written "Cat2"

Cat "AU" belong to var Cat3  so in the new column it will written "Cat3"

Cat "UK" belong to var Cat3  so in the new column it will written "Cat3"


This is how I do it.  The option DESCENDTYPES put the rows in the order of the CLASS list.

 

proc summary data=rawdata missing chartype descendtypes;
   class cat:;
   ways 1;
   var y;
   output out=want n=n sum=sum;
   run;
data want;
   length varnum 8 variable $32 cat $8;
   set want;
   array _c[*] cat1-cat3;
   varnum   = findc(_type_,'1');
   variable = vname(_c[varnum]);
   cat = coalesceC(of cat1-cat3);
   run;

Capture.PNG

 

data_null__
Jade | Level 19

If some of the CLASS variables are numeric you can use the MLF option to cause PROC SUMMARY to convert the numeric variables to character and populate the new variable with the FORMATTED value.  In the example I have changed CAT2 to numeric.  Notice that the formatted value of CAT2 uses the default format BEST12 and that it is RIGHT justified.  I usually just LEFT the value of all CLASS variables.  Notice that CAT2 in the output dataset is $12.

 

data rawdata;
   input id cat1 $ cat2 cat3 $ y; 
   cards;
1  a  1 AU 100
2  a  0 AU 110
3  a  1 AU 120
4  a  0 UK 130
5  b  1 UK 140
6  b  0 UK 150
7  b  1 USA 160
8  b  0 USA 170
9  c  1 USA 180
10 c  0 USA 190
;;;;
run;

proc summary data=rawdata missing chartype descendtypes;
   class cat: / mlf;
   ways 1;
   var y;
   output out=want n=n sum=sum;
   run;
proc contents varnum;
   run;
data want;
   length varnum 8 variable $32 rowlabel $16;
   set want;
   array _c[*] cat1-cat3;
   varnum   = findc(_type_,'1');
   variable = vname(_c[varnum]);
   rowlabel = _c[varnum];
   run;
proc print;
   run;
  Variables in Creation Order

 #    Variable    Type    Len

 1    cat1        Char      8
 2    cat2        Char     12
 3    cat3        Char      8
 4    _TYPE_      Char      3
 5    _FREQ_      Num       8
 6    n           Num       8
 7    sum         Num       8

 

Capture.PNG

novinosrin
Tourmaline | Level 20

Morning @data_null__  "There are two statements for this TYPES and WAYS. For this scenario I think WAYS is most suitable. "

 

May i request your comment/a line or two to  explain the distinction between the two. I just had a look at the documentation and not comprehending it well. I get the impression they both does the same. But, when you code I try to peek at it diligently but not grasping it. 

At your own convenience and when you can plz.

 

Thanks & Regards as always

 

data_null__
Jade | Level 19

@novinosrin wrote:

May i request your comment/a line or two to  explain the distinction between the two. I just had a look at the documentation and not comprehending it well. I get the impression they both does the same. But, when you code I try to peek at it diligently but not grasping it. 

 


For the example WAYS 1; would be the same as TYPES CAT1-CAT3.  Same as CLASS list but shorter to type.

 

proc summary data=rawdata missing chartype descendtypes;
   class cat: / mlf;
   types (cat:);
   *ways 1;
   var y;
   output out=want n=n sum=sum;
   run;

 

I rarely use WAYS.  I usually make two way tables with several categorical variables crossed with one other variable (treatment) to be displayed in columns with the other variables levels displayed in rows.  If I used WAYS 2; I would get pair-wise two way tables but I want only treatment crossed with each of other (analysis) categorical variables.  For that TYPES is the solution.

 

For example TYPES TRT*(AGE SEX); produces a table of all variables in parentheses' crossed with TRT.

 

data class;
   set sashelp.class;
   trt = rantbl(456789,.4);
   run;
proc print;
   run;
proc summary data=class chartype descendtypes missing;
   class age sex trt / mlf;
   types trt*(age sex);
   output out=counts;
   run;
proc print;
   run;

Capture.PNG

 

 

Maybe this will be helpful.  Let me know if you have other questions.  I usually do better trying to answer specific questions.  🙂

novinosrin
Tourmaline | Level 20

That's neat!

Thank you @data_null__ so much for your time. Much appreciate it.  

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 4471 views
  • 5 likes
  • 4 in conversation