Hi,
I have a dataset with multiple readings for each person. I need just one row per person, so that whenever multiple readings are present, the row holds the average of the readings. Is there a way to do this in a data step?
Thanks in advance!
Accepted Solutions
Yes, you can do it in a data step, but it would be simpler to use PROC SUMMARY. Can you provide more details?
proc summary data=have nway;
class person;
var variable1 variable2 variable3; /* Whatever list of variables you need goes here */
output out=want mean=;
run;
Paige Miller
Hi,
Thank you so much for posting this! This syntax worked out perfectly for me!
@code_blooded wrote:
Hi,
I have a dataset with multiple readings for each person. I need to have just one row for each person in a way that whenever multiple readings are present; I need an average of both the readings. Is there a way to do it in the data step?
Thanks in advance!
Yes, but it can be a lot of work. Why do you want to do it in a data step?
If you have any sort of "person" identifier then proc means/summary would be a better way to go.
Dummy code as you have provided no actual details:
proc sort data=have;
by personid;
run;
proc summary data=have;
by personid;
var _numeric_;
output out=want (drop=_type_) mean= ;
run;
This creates the data set Want containing the mean of every numeric variable, with each mean keeping the original variable name, plus a variable _freq_ that holds the number of rows used for each summary. _Numeric_ is a special list keyword that SAS accepts in some places to mean "use all numeric variables". If you only want some variables, list their names on the Var statement instead.
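As a rough illustration of listing specific variables on the Var statement, here is a sketch. The variable names systolic and diastolic are made up for the example, as are HAVE, WANT and personid; substitute your own. It also drops _freq_ along with _type_, in case the row count is not wanted.

```sas
/* Placeholder names: HAVE, WANT, personid, systolic, diastolic */
proc sort data=have;
  by personid;
run;

proc summary data=have;
  by personid;
  var systolic diastolic;   /* only these variables are averaged */
  output out=want (drop=_type_ _freq_) mean=;
run;
```

Keep _freq_ in the output if you want to check how many readings went into each person's row.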
If you want something different then you need to provide more details.
To use a data step, you need to create variables that hold the sum and the count of each variable you want summarized, taking any missing values into account, and then, when the last record for a person ID is encountered, calculate the mean and output. If you want different statistics, you have to write additional code for each one, and some of it can be quite daunting. Proc means/summary takes care of that for you.
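For comparison, a minimal sketch of the data step approach for a single variable might look like the following. All names here (HAVE, WANT, personid, reading, total, n_readings, mean_reading) are assumptions, since no actual details were provided, and every additional variable would need its own sum and count.

```sas
/* Sketch only: all data set and variable names are placeholders */
proc sort data=have;
  by personid;
run;

data want (keep=personid mean_reading n_readings);
  set have;
  by personid;

  /* Reset the accumulators at the first row of each person */
  if first.personid then do;
    total = 0;
    n_readings = 0;
  end;

  /* Skip missing readings so they affect neither the sum nor the count */
  if not missing(reading) then do;
    total + reading;    /* sum statement: the value is retained across rows */
    n_readings + 1;
  end;

  /* On the last row for the person, compute the mean and write one row */
  if last.personid then do;
    if n_readings > 0 then mean_reading = total / n_readings;
    else mean_reading = .;   /* no non-missing readings for this person */
    output;
  end;
run;
```

This covers one statistic for one variable, which is why proc means/summary is usually the easier route.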
Hi,
Thank you so much for recommending proc summary! This works seamlessly for my problem! 🙂