Solved: Re: How to add summary stats to my dataset.

jamsher79 · Posted 07-30-2018 05:29 AM

I have a data set with 109 rows and 40 columns. I want to find the mean, median and std for every row and add those to my new data set. How should I do that?

regards

Astounding · Posted 07-30-2018 09:32 AM

While it's difficult to believe that a separate mean for each row would be useful, it's easy to get:

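data want;
  set have;
  /* Define the list before creating new variables, so that row_mean
     and row_std are not swept into the later _numeric_ lists. */
  array nums {*} _numeric_;
  row_mean = mean(of nums[*]);
  row_std = std(of nums[*]);
  row_median = median(of nums[*]);
run;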
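
For anyone who wants to try this on a small table first, here is a minimal sketch. The data set HAVE and the columns ID and Q1-Q5 are made up for illustration; they are not the original poster's data. Naming the columns explicitly, rather than relying on _numeric_, also keeps any numeric ID or previously computed statistic out of the calculation.

```sas
/* Hypothetical test data: two rows, five numeric columns */
data have;
  input id $ q1-q5;
  datalines;
A 1 2 3 4 5
B 3 3 3 3 3
;
run;

data want;
  set have;
  /* Row-wise statistics over an explicit variable list */
  row_mean   = mean(of q1-q5);
  row_std    = std(of q1-q5);
  row_median = median(of q1-q5);
run;

proc print data=want;
run;
```

Row A should come out with a mean of 3, a median of 3 and a standard deviation of about 1.58; row B has the same mean and median but a standard deviation of 0.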


RW9 · Posted 07-30-2018 05:44 AM

As you haven't presented any test data (in the form of a datastep) or what you want the output to look like, I am only able to generalise here, but the general scenario would be:

1) run proc means on the data

2) merge the proc means data back onto your dataset

jamsher79 · Posted 07-30-2018 06:08 AM

Hey
I have a csv data added that in my project, and now I want to find the mean, median... but as I'm new I don't know anything.
regards

RW9 · Posted 07-30-2018 06:38 AM

"csv data added that in my project" - project? What software are you using? How did you add it? Is it a dataset? If you're just starting, you need to learn some basics, such as how to import data into your software, before you start analysing it. You can find videos at:

https://video.sas.com/category/videos/how-to-tutorials
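
As a rough sketch of the import step, assuming the CSV has a header row: the file path and the output name WORK.HAVE below are placeholders, not anything taken from the original post.

```sas
/* Hypothetical path: replace with the real location of the CSV file */
proc import datafile="/path/to/mydata.csv"
            out=work.have
            dbms=csv
            replace;
  getnames=yes;   /* take variable names from the first row */
run;

/* Check which variables were read and whether they came in as numeric */
proc contents data=work.have;
run;
```

It is worth looking at the PROC CONTENTS output before computing anything, since a column that was read as character will not take part in numeric statistics.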

Once you have got your data into a dataset, then you follow the steps:

1) run proc means on the data - here is a paper on it.

http://www2.sas.com/proceedings/sugi29/240-29.pdf

2) merge the proc means data back onto your dataset

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202970.htm

I cannot see your computer, so I have none of your data to build an example on. Instead, here is an example using a built-in dataset:

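/* Sort a copy of the data so it can be processed by group */
proc sort data=sashelp.class out=class;
  by sex;
run;
/* Compute the mean and median of weight within each group */
proc means data=class;
  by sex;
  var weight;
  output out=means mean=mean median=median;
run;
/* Attach the group statistics to every row of that group */
data class;
  merge class means;
  by sex;
run;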
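
If there is no grouping variable and you simply want one overall set of statistics attached to every row, there is no BY variable to merge on. One common way around that, sketched here with the same built-in dataset, is to read the one-row summary once on the first iteration. The output names (OVERALL, CLASS_WITH_OVERALL, MEAN_WEIGHT and so on) are made up for the example.

```sas
/* One-row summary of weight across the whole table */
proc means data=sashelp.class noprint;
  var weight;
  output out=overall(drop=_type_ _freq_)
         mean=mean_weight median=median_weight std=std_weight;
run;

data class_with_overall;
  set sashelp.class;
  /* Read the single summary row once; variables read by SET are
     retained, so the values carry onto every later row. */
  if _n_ = 1 then set overall;
run;
```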

PaigeMiller · Posted 07-30-2018 07:01 AM

@jamsher79, could you please clarify something?

"I want to find mean, median and std for every row and add those to my new data set"

Does this mean that you want new columns appended to the side of your existing data set, first new column contains the mean, next new column contains the median, next new column contains the standard deviation?

--
Paige Miller

jamsher79 · Posted 07-31-2018 08:39 AM

Hey!

Thanks, I could manage.

ballardw · Posted 07-30-2018 10:33 AM

It is frequently a suboptimal approach to include summary values in a data set, as for many further analyses you would then have to remove them.

If you want to display a report with those values, SAS provides a number of report procedures that will display either just the summaries, such as proc means, or all the values with summary rows, such as proc report.
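
As a rough illustration of both options, here is a sketch using the built-in SASHELP.CLASS dataset (not the original poster's data). The choice of columns and of the mean as the summary statistic is only for the example.

```sas
/* Summaries only: one line per analysis variable */
proc means data=sashelp.class mean median std;
  var height weight;
run;

/* All detail rows, followed by a single summary row at the end */
proc report data=sashelp.class;
  column name sex height weight;
  define name   / display;
  define sex    / display;
  define height / analysis mean format=6.1;
  define weight / analysis mean format=6.1;
  rbreak after / summarize;   /* adds the overall mean row */
run;
```

Neither step changes the underlying data set, so nothing has to be removed again before further analysis.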

On this forum it is a good idea to display some starting data and what you want the final result to look like given that starting data. Include just enough data to show your use cases, such as does this result need to be summarized in groups provided by identification variables, and then the desired results that you can calculate by hand. Mask any "sensitive" values with something like XXX or YYY for different levels of sensitive variables.

Astounding · Posted 07-30-2018 10:49 AM

@ballardw,

In general, I agree with you. In fact, I have started ignoring questions where I think that the end result is useless or worse. In this case, I answered the question because I could at least picture a useful scenario.

Suppose each row represents a survey respondent and each column represents a question rated 1 to 5 (strongly disagree through strongly agree or some such). Then you might use the row statistics to determine who is a "high rater" and who is a "low rater" and adjust accordingly. Or you might examine variation within the row to see who actually took the time to answer and who just filled in the same number across the board.
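
A minimal sketch of that survey scenario, with made-up respondents and an arbitrary cutoff of 4 for a "high rater" (both are assumptions for illustration only):

```sas
/* Hypothetical survey: three respondents, five questions rated 1 to 5 */
data survey;
  input respondent $ q1-q5;
  datalines;
R1 5 5 5 5 5
R2 1 2 4 3 5
R3 4 5 4 5 5
;
run;

data flagged;
  set survey;
  row_mean = mean(of q1-q5);
  row_std  = std(of q1-q5);
  /* Same answer on every question means no variation within the row */
  straight_liner = (row_std = 0);
  /* Arbitrary cutoff, chosen only for this example */
  high_rater = (row_mean >= 4);
run;

proc print data=flagged;
run;
```

Here R1 is flagged as both a straight-liner and a high rater, R3 as a high rater only, and R2 as neither.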

All FWIW.

ballardw · Posted 07-30-2018 11:23 AM

Absolutely agree, and I have done just that with several surveys. In many cases I have instead taken the max of the variables, because the survey software provides a crappy output (the "designer" used a survey product without testing the data output) and we really want the one choice made in a multiple-choice, single-response question.

However, without a more concrete example from the OP, I tend to go with "this is likely a simple overall summary".
