I have a data set with 109 rows and 40 columns. I want to find the mean, median and std for every row and add those to my new data set. How shall I do that?
regards
While it's difficult to believe that a separate mean for each row would be useful, it's easy to get:
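data want;
set have;
/* Capture the original numeric columns before any new variables exist. */
/* Otherwise _numeric_ in the later statements would also pick up row_mean and row_std. */
array vals {*} _numeric_;
row_mean = mean(of vals{*});
row_std = std(of vals{*});
row_median = median(of vals{*});
run;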
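To try this on a small scale, here is a minimal sketch with made-up test data (the data set and the column names x1-x4 are assumptions, not the poster's real data). It names the column range explicitly rather than using _numeric_, because _numeric_ picks up every numeric variable defined so far in the step, including statistics created in earlier statements.

```sas
/* Hypothetical test data: 3 rows, 4 numeric columns (names are made up) */
data have;
   input id $ x1-x4;
   datalines;
A 1 2 3 4
B 5 5 5 5
C 2 . 6 10
;
run;

data want;
   set have;
   /* An explicit range keeps the new statistics out of later calculations */
   row_mean   = mean(of x1-x4);
   row_median = median(of x1-x4);
   row_std    = std(of x1-x4);
run;

proc print data=want noobs;
run;
```

These functions ignore missing values, so row C is summarized over its three non-missing values (mean 6, median 6, std 4).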
As you haven't presented any test data (in the form of a datastep) or what you want the output to look like, I am only able to generalise here, but the general scenario would be:
1) run proc means on the data
2) merge the proc means data back onto your dataset
"csv data added that in my project" - project? What software are you using? How did you add it? Is it a dataset? If you're just starting, you need to learn some basics, such as how to import data in your software, before you start analysing it. You can find videos at:
https://video.sas.com/category/videos/how-to-tutorials
Once you have got your data into a dataset, follow these steps:
1) run proc means on the data - here is a paper on it.
http://www2.sas.com/proceedings/sugi29/240-29.pdf
2) merge the proc means data back onto your dataset
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202970.htm
I have none of your data to base an example on, as I cannot see your computer! So here is an example using a built-in dataset:
proc sort data=sashelp.class out=class;
by sex;
run;

proc means data=class;
by sex;
var weight;
output out=means mean=mean median=median;
run;

data class;
merge class means;
by sex;
run;
@jamsher79, could you please clarify something?
I want to find the mean, median and std for every row and add those to my new data set
Does this mean that you want new columns appended to the side of your existing data set, first new column contains the mean, next new column contains the median, next new column contains the standard deviation?
Hey!
Thanks, I could manage.
It is frequently a suboptimal approach to include summary values in a data set, as for many further analyses you would then have to remove them.
If you want to display a report with those values, SAS provides a number of report procedures: proc means displays just the summaries, while proc report can display all the values along with summary rows.
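As a rough illustration of the two reporting options, the sketch below uses the built-in sashelp.class data set; the choice of variables and formats is an assumption made only for the example.

```sas
/* Summaries only: one line of statistics per analysis variable */
proc means data=sashelp.class mean median std maxdec=2;
   var height weight;
run;

/* Detail rows plus a summary row at the end of the report */
proc report data=sashelp.class;
   column name sex height weight;
   define name   / display;
   define sex    / display;
   define height / analysis mean format=8.2;
   define weight / analysis mean format=8.2;
   rbreak after / summarize;
run;
```

Neither step changes the underlying data set, so nothing has to be removed before further analysis.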
On this forum it is a good idea to display some starting data and what you want the final result to look like given that starting data. Include just enough data to show your use cases, such as does this result need to be summarized in groups provided by identification variables, and then the desired results that you can calculate by hand. Mask any "sensitive" values with something like XXX or YYY for different levels of sensitive variables.
In general, I agree with you. In fact, I have started ignoring questions where I think that the end result is useless or worse. In this case, I answered the question because I could at least picture a useful scenario.
Suppose each row represents a survey respondent and each column represents a question rated 1 to 5 (strongly disagree through strongly agree or some such). Then you might use the row statistics to determine who is a "high rater" and who is a "low rater" and adjust accordingly. Or you might examine variation within the row to see who actually took the time to answer and who just filled in the same number across the board.
All FWIW.
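A minimal sketch of that survey scenario, assuming made-up data with five questions named q1-q5; the cutoffs for "high" and "low" raters are arbitrary and would need to be chosen for the real survey.

```sas
/* Hypothetical survey data: one row per respondent, q1-q5 rated 1 to 5 */
data survey;
   input respondent $ q1-q5;
   datalines;
R1 5 5 4 5 5
R2 3 3 3 3 3
R3 1 2 1 2 1
R4 2 4 3 5 1
;
run;

data flagged;
   set survey;
   row_mean = mean(of q1-q5);
   row_std  = std(of q1-q5);
   /* Cutoffs below are arbitrary assumptions, for illustration only */
   high_rater   = (row_mean >= 4.5);
   low_rater    = (row_mean <= 1.5);
   /* Same answer to every question gives zero within-row variation */
   straightline = (n(of q1-q5) > 1 and row_std = 0);
run;
```

Here R1 would be flagged as a high rater, R3 as a low rater, and R2 as having filled in the same number across the board.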
Absolutely agree, and I have done this with several surveys, or in many cases taken the max of variables, as the survey software provides a crappy output because the "designer" used a survey product without testing the data output and we really want the one choice made in a multiple choice single response...
However, without a more concrete example from the OP, I tend to go with "this is likely a simple overall summary".
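For the max-of-variables case, a sketch under an assumed export layout: one column per answer option (hypothetical names opt1-opt4), holding the option code when selected and missing otherwise. Real survey exports may well be laid out differently.

```sas
/* Hypothetical export: one column per answer option, holding the option
   code when selected and missing otherwise (layout is an assumption) */
data raw;
   input respondent $ opt1-opt4;
   datalines;
R1 . 2 . .
R2 . . . 4
R3 1 . . .
;
run;

data tidy;
   set raw;
   /* MAX ignores missing values, so it returns the single selected code */
   choice = max(of opt1-opt4);
   drop opt1-opt4;
run;
```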