How to add summary stats to my dataset.

Occasional Contributor
Posts: 5
I have a data set with 109 rows and 40 columns. I want to find the mean, median and std for every row and add those to my new data set. How should I do that?

regards


Accepted Solutions
Solution
3 weeks ago
Super User
Posts: 6,933

Re: How to add summary stats to my dataset.

Posted in reply to jamsher79

While it's difficult to believe that a separate mean for each row would be useful, it's easy to get:

 

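data want;
  set have;
  /* Capture the numeric columns of HAVE before any new variables exist, */
  /* so later statistics do not include row_mean or row_std themselves.  */
  array nums {*} _numeric_;
  row_mean = mean(of nums[*]);
  row_std = std(of nums[*]);
  row_median = median(of nums[*]);
run;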
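
A minimal sketch of the same idea on a tiny made-up dataset, assuming the columns to summarise are named x1-x3 (hypothetical names) and that a numeric id column should be left out. Listing the columns explicitly, instead of using _numeric_, keeps id out of the statistics, since _numeric_ covers every numeric variable defined so far in the step. Missing values are ignored by these functions.

```sas
/* Hypothetical input: a numeric id plus three numeric columns x1-x3. */
data have;
  input id x1 x2 x3;
  datalines;
1 2 4 6
2 5 5 5
3 1 . 9
;
run;

data want;
  set have;
  /* Name the columns explicitly so that id and the new row_* variables */
  /* are not swept into the calculation.                                */
  row_mean   = mean(of x1-x3);
  row_std    = std(of x1-x3);
  row_median = median(of x1-x3);
run;

proc print data=want noobs;
run;
```

With 40 columns named in sequence, the same list syntax would read of x1-x40.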


All Replies
Super User
Posts: 9,840

Re: How to add summary stats to my dataset.

Posted in reply to jamsher79

As you haven't presented any test data (in the form of a datastep) or what you want the output to look like, I am only able to generalise here, but the general scenario would be:

1) run proc means on the data

2) merge the proc means data back onto your dataset

Occasional Contributor
Posts: 5

Re: How to add summary stats to my dataset.

Hey
I have CSV data that I added to my project, and now I want to find the mean, median... but as I'm new I don't know anything.
regards
Super User
Posts: 9,840

Re: How to add summary stats to my dataset.

Posted in reply to jamsher79

"csv data added that in my project" - project?  What software are you using?  How did you add it?  Is it a dataset?  If you're just starting, you need to learn some basics, such as how to import data into your software, before you start analysing it.  You can find videos at:

https://video.sas.com/category/videos/how-to-tutorials
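
For what it's worth, a rough sketch of one common way to read a CSV file into a dataset is PROC IMPORT. The file path and the output name work.have are placeholders, not anything taken from this thread, and the options shown assume a file whose first row holds the column names.

```sas
/* The file path below is a placeholder; replace it with your own. */
proc import datafile="/path/to/mydata.csv"
    out=work.have
    dbms=csv
    replace;
  getnames=yes;      /* first row holds the column names */
  guessingrows=max;  /* scan all rows when guessing column types */
run;

/* Check that the columns came in as numeric before computing statistics. */
proc contents data=work.have;
run;
```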

 

Once you have got your data into a dataset, follow these steps:

1) run proc means on the data - here is a paper on it.

http://www2.sas.com/proceedings/sugi29/240-29.pdf

 

2) merge the proc means data back onto your dataset

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202970.htm

 

I cannot see your computer, so I have none of your data to build an example on.  Instead, here is an example using a built-in dataset:

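/* BY-group processing needs the data sorted by the BY variable. */
proc sort data=sashelp.class out=class;
  by sex;
run;
/* Write the mean and median of weight for each sex to dataset MEANS. */
proc means data=class;
  by sex;
  var weight;
  output out=means mean=mean median=median;
run;
/* Merge the statistics back so every row carries its group's values. */
data class;
  merge class means;
  by sex;
run;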
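
If no grouping variable is involved, a possible variation (a sketch, not necessarily what the original poster needs) is to write the overall statistics to a one-row dataset and attach that row to every observation. The output names overall, class_with_stats, mean_weight, median_weight and std_weight are made up for illustration.

```sas
/* Overall statistics for one column, written to a one-row dataset. */
proc means data=sashelp.class noprint;
  var weight;
  output out=overall(drop=_type_ _freq_)
         mean=mean_weight median=median_weight std=std_weight;
run;

/* Read the one-row dataset once; variables read by SET are retained, */
/* so the statistics appear on every row of the detail data.          */
data class_with_stats;
  if _n_ = 1 then set overall;
  set sashelp.class;
run;
```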
Respected Advisor
Posts: 3,275

Re: How to add summary stats to my dataset.

Posted in reply to jamsher79

@jamsher79, could you please clarify something?

 

I want to find mean, median and std for every row and add those to my new data set

 

Does this mean that you want new columns appended to the side of your existing data set, with the first new column containing the mean, the next the median, and the next the standard deviation?

--
Paige Miller
Occasional Contributor
Posts: 5

Re: How to add summary stats to my dataset.

Posted in reply to PaigeMiller

Hey!

Thanks, I could manage.

Super User
Posts: 13,941

Re: How to add summary stats to my dataset.

Posted in reply to jamsher79

It is frequently a suboptimal approach to include summary values in a data set, as for many further analyses you would then have to remove them.

If you want to display a report with those values, SAS provides a number of report procedures that will either display just the summaries, such as proc means, or display all the values with summary rows, such as proc report.
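
As a rough illustration of both report styles, here is a sketch using the built-in sashelp.class dataset. The choice of variables, statistics and formats is only an assumption for the example.

```sas
/* Summaries only: one row of statistics per analysis variable. */
proc means data=sashelp.class mean median std maxdec=2;
  var height weight;
run;

/* All detail rows, followed by a summary row of means at the end. */
proc report data=sashelp.class;
  column name sex height weight;
  define name   / display;
  define sex    / display;
  define height / analysis mean format=8.2;
  define weight / analysis mean format=8.2;
  rbreak after / summarize;
run;
```

Neither step changes the underlying data, so nothing has to be removed again before further analysis.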

 

On this forum it is a good idea to display some starting data and what you want the final result to look like given that starting data. Include just enough data to show your use cases, such as does this result need to be summarized in groups provided by identification variables, and then the desired results that you can calculate by hand. Mask any "sensitive" values with something like XXX or YYY for different levels of sensitive variables.

Super User
Posts: 6,933

Re: How to add summary stats to my dataset.

@ballardw,

 

In general, I agree with you.  In fact, I have started ignoring questions where I think that the end result is useless or worse.  In this case, I answered the question because I could at least picture a useful scenario.

 

Suppose each row represents a survey respondent and each column represents a question rated 1 to 5 (strongly disagree through strongly agree or some such).  Then you might use the row statistics to determine who is a "high rater" and who is a "low rater" and adjust accordingly.  Or you might examine variation within the row to see who actually took the time to answer and who just filled in the same number across the board.
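
A sketch of that survey scenario, with made-up data and hypothetical names (respondent, q1-q5, straight_liner, rater). The cutoffs of 4 and 2 are arbitrary, and the range is used here as the simplest check for "same number across the board".

```sas
/* Hypothetical survey data: one row per respondent, q1-q5 rated 1 to 5. */
data survey;
  input respondent q1-q5;
  datalines;
1 5 5 5 5 5
2 1 2 1 2 1
3 3 4 2 5 3
;
run;

data flagged;
  set survey;
  length rater $ 7;
  row_mean = mean(of q1-q5);
  row_std  = std(of q1-q5);
  /* A range of zero means the same answer was given to every question. */
  straight_liner = (range(of q1-q5) = 0);
  /* The cutoffs 4 and 2 are arbitrary and only for illustration. */
  if row_mean >= 4 then rater = 'high';
  else if row_mean <= 2 then rater = 'low';
  else rater = 'typical';
run;
```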

 

All FWIW.

Super User
Posts: 13,941

Re: How to add summary stats to my dataset.

Posted in reply to Astounding


Absolutely agree, and I have done this with several surveys, or in many cases taken the max of the variables, because the survey software provides crappy output: the "designer" used a survey product without testing the data output, and we really want the one choice made in a multiple-choice, single-response question...

However, without a more concrete example from the OP, I tend to go with "this is likely a simple overall summary".
