BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mark4
Obsidian | Level 7

The general question is if it is possible to create a function that takes a list, applies a function to each item in that list, and then returns the results in a usable type (list, dataset, etc).

 

In my specific case, I have a dataset with several categorical variables, and I want to return a dataset that contains various summary statistics (mean, count, etc) as columns and rows are determined by each categorical variable.  I can do with SQL by creating a dataset of a single row for each categorical variable and then iteratively combining these, but it is rather ugly and awkward.  Is there a better way to do this?

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

For categorical variables you would use PROC FREQ to get counts

For numerical/continous variables you would use PROC MEANS. You've received serveral results for PROC MEANS, so I'll show you the PROC FREQ for categorical data.

 

 

And here's a way to list all the possible levels and counts:

https://gist.github.com/statgeek/e0903d269d4a71316a4e

 

Here's a quick way to do a table summary, with a single cross variable:

https://gist.github.com/statgeek/0c4aeec9053cf8050be18a03b842c1b9

 

Another good PROC to keep in your back pocket for summary tables is PROC TABULATE. It's pretty powerful for creating various summaries and probably under utilized.

 


@mark4 wrote:

The general question is if it is possible to create a function that takes a list, applies a function to each item in that list, and then returns the results in a usable type (list, dataset, etc).

 

In my specific case, I have a dataset with several categorical variables, and I want to return a dataset that contains various summary statistics (mean, count, etc) as columns and rows are determined by each categorical variable.  I can do with SQL by creating a dataset of a single row for each categorical variable and then iteratively combining these, but it is rather ugly and awkward.  Is there a better way to do this?


 

 

View solution in original post

7 REPLIES 7
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please provide some example, with test data in the form of a datastep, and what the output should look like.

This: "The general question is if it is possible to create a function that takes a list, applies a function to each item in that list, and then returns the results in a usable type (list, dataset, etc)." - is what a datastep is.

Dataset in with list

Datastep runs function against that

Data out 

mark4
Obsidian | Level 7

I don't see how to reconcile your description of a data step with my problem.  The data step, as far as I know, takes a list of datasets as input and produces a dataset, but that is not what I'm after here:  My list is a set of categorical variables from a dataset and the function takes a variable and produces a dataset.  So,

"Data Step": (dataset_1, ..., dataset_n) -> dataset,

but I want

"Wanted_Function": (variable, dataset) -> dataset

where variable is a column of the input dataset.

 

Since I have a working solution, I'm really interested in knowing if there are built-in functions or other solutions I might be missing that can clean up my code and the amount of datasets produced.

 

For example, if my dataset my_data looked like:

 

Sex    Region   Salary

M       A             10000

M       B             15000

F        A             11000

F        A             14000

 

I would want a dataset want_data that looked like

 

Category   Count   Avg_Salary

M               2          12500  

F               2          12500  

A               3          12500  

B               1          11667  

 

I can create this in SQL, but as I said, it's ugly - it builds a lot of intermediate datasets (here, just two - group_by_Sex, group_by_Region) that I don't want to keep and the union statement gets messy when you start incorporating more and more categorical variables.

 

%macro summary_table(categorical_variable);

  create table group_by_&categorical_variable as select

     &categorical_variable as category, count() as count, avg(Salarly) as Avg_Salary

  from my_data group by  &categorical_varialble;

%mend;

proc sql noprint;

%summary_table(Sex);

%summary_table(Region);

create table want_data as (select * from group_by_Sex) union (select * from group_by_Region);

quit;

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26
data have;
 input Sex $ Region $ Salary;
datalines;
M       A             10000
M       B             15000
F        A             11000
F        A             14000
;
run;

proc summary data=have;
  class sex region;
  var salary;
  output out=want sum=sum;
run;

That gives the sums for sex/region and combinations.  I can't find the post from data_null, but he had presented two options to reduce the number of the output, let me see if I can find it.

 

Ah, here we go:

https://communities.sas.com/t5/SAS-Programming/Summary-table-with-categories-from-multiple-vars/m-p/...

ballardw
Super User

@mark4 wrote:

I don't see how to reconcile your description of a data step with my problem. 


The request for a data step that demonstrates the structure of the data set and names and types of variables is so we can provide examples that match you situation closer. A data step allows us to create a data set that code can be tested with.

It often saves a great many posts when a suggested solution is provided but then the original poster comes back and "says that didn't work", followed by questions determining why not. Then finding out that one of the variables was character and such concepts as "mean" and "standard deviation" don't actually apply. Also, a fair number of questions here data that is poorly structured for a specific task. With an example we can show how to restructure the data as well as get the solution steps.

mark4
Obsidian | Level 7
No, I get that. My intent wasn't so much a solution as general principals to get to a solution so I was trying to be as general as possible. Having the various options discussed below is very helpful for me since a lot of SAS requires you to know which procedures are even out there for you to use.
data_null__
Jade | Level 19

Maybe something like this.  I like the STACKODS output from PROC MEANS for summary stats. 

 

data class;
   set sashelp.class;
   trt = rantbl(456789,.4);
   run;
proc print;
   run;
ods select none;
ods output summary=summary;
proc means data=class chartype descendtypes stackods missing n mean stddev median min max;
   class age sex trt / mlf;
   types trt*(age sex);
   run;
ods select all;
proc print;
   run;

Capture.PNG

 

Reeza
Super User

For categorical variables you would use PROC FREQ to get counts

For numerical/continous variables you would use PROC MEANS. You've received serveral results for PROC MEANS, so I'll show you the PROC FREQ for categorical data.

 

 

And here's a way to list all the possible levels and counts:

https://gist.github.com/statgeek/e0903d269d4a71316a4e

 

Here's a quick way to do a table summary, with a single cross variable:

https://gist.github.com/statgeek/0c4aeec9053cf8050be18a03b842c1b9

 

Another good PROC to keep in your back pocket for summary tables is PROC TABULATE. It's pretty powerful for creating various summaries and probably under utilized.

 


@mark4 wrote:

The general question is if it is possible to create a function that takes a list, applies a function to each item in that list, and then returns the results in a usable type (list, dataset, etc).

 

In my specific case, I have a dataset with several categorical variables, and I want to return a dataset that contains various summary statistics (mean, count, etc) as columns and rows are determined by each categorical variable.  I can do with SQL by creating a dataset of a single row for each categorical variable and then iteratively combining these, but it is rather ugly and awkward.  Is there a better way to do this?


 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1973 views
  • 3 likes
  • 5 in conversation