Solved: Re: Summarize data from long dataset

rfarmenta · Posted 01-07-2019 02:19 PM

I have a long dataset with up to 6000 observations per participant and 121 participants. One of the variables is HR and I need to create average heart rate (variable=HR) for each participant and heart rate in each activity they participated in (Variable=activity with categories 0-5)

I can create the average HR for each participant using the following:

PROC MEANS DATA=mod_data NWAY NOPRINT; CLASS study_id;
VAR HR;
OUTPUT
OUT = SUMS(DROP=_TYPE_ RENAME=(_FREQ_=N_VISITS))
N = N_HR
MEAN = M_HR; RUN;

I am not quite sure how to get the average heart rate for each participant in each activity. SO I would need 6 new variables, one for each activity that gives the participants average heart rate in that activity. Not all participants did all activities and some participants did all of the activities so there are varying amounts of missing data. I need a dataset that looks like the following:

Study_id	HRM_ACT0	HRM_ACT1	HRM_ACT2	HRM_ACT3	HRM_ACT4	HRM_ACT5	M_HR
001	150	.	155	156	.	143	148
002	168	130	160	148	123	134	150
003	130	.	124	.	170	137	155
004	122	130	178	.	.	155	152

I've tried a couple ways to do this unsuccessfully so any help would be appreciated. A sample dataset is attached Thank you in advance.

Reeza · Posted 01-07-2019 07:04 PM

You're missing an ID variable in your TRANSPOSE code, which should be the activity. Because you want that number as the ID.

You sort of kluged that in the second example, except you can have multiple ID variables, you don't have to pre-combine them.

View solution in original post

Reeza · Posted 01-07-2019 02:21 PM

@rfarmenta wrote:

I have a long dataset with up to 6000 observations per participant and 121 participants. One of the variables is HR and I need to create average heart rate (variable=HR) for each participant and heart rate in each activity they participated in (Variable=activity with categories 0-5)

I can create the average HR for each participant using the following:
PROC MEANS DATA=mod_data NWAY NOPRINT; CLASS study_id;
VAR HR;
OUTPUT
OUT = SUMS(DROP=_TYPE_ RENAME=(_FREQ_=N_VISITS))
N = N_HR
MEAN = M_HR; RUN;
I am not quite sure how to get the average heart rate for each participant in each activity. SO I would need 6 new variables, one for each activity that gives the participants average heart rate in that activity. Not all participants did all activities and some participants did all of the activities so there are varying amounts of missing data. I need a dataset that looks like the following:

Study_id

HRM_ACT0

HRM_ACT1

HRM_ACT2

HRM_ACT3

HRM_ACT4

HRM_ACT5

M_HR

001

150

.

155

156

.

143
148

002

168

130

160

148

123

134

150

003

130

.

124

.

170

137

155

004

122

130
178
.

.

155

152

I've tried a couple ways to do this unsuccessfully so any help would be appreciated. A sample dataset is attached Thank you in advance.

Just add Activity to a CLASS or BY statement to get data by Activity.

PROC MEANS is a foundational tool in SAS, I would strongly recommend you read up on it and learn how to use it properly.

PROC MEANS and FREQ should be the most frequently used tools in your toolkit for statistical analysis.

PaigeMiller · Posted 01-07-2019 02:43 PM

PROC MEANS is a foundational tool in SAS, I would strongly recommend you read up on it and learn how to use it properly.

If I was going to create a list of maxims, this would be one of them!

And I would mention PROC MEANS has an identical twin sister named PROC SUMMARY.

--
Paige Miller

Reeza · Posted 01-07-2019 02:23 PM

Essentially, these are the same questions, summarize and transpose.

https://communities.sas.com/t5/SAS-Enterprise-Guide/Summing-values-in-a-column-dependent-on-dates-an...

ballardw · Posted 01-07-2019 04:01 PM

Do you need a data set (further programming uses it) or a report (people read it)?

For a data set add what ever variable you have that indicates the activity to the CLASS statement.

For a report either Proc Tabulate or Report and use the activity as an across variable.

rfarmenta · Posted 01-07-2019 07:01 PM

Thank you! I need data and I was able to figure this out using proc means and proc transpose using the following code:

PROC MEANS DATA=mod_data NWAY NOPRINT; 
CLASS study_id activity;
VAR HR;
OUTPUT
OUT = SUMS
MEAN = M_HR; RUN;



proc transpose data=SUMS_test(drop=_type_ _freq_)
               out=transposed_hrm(drop=activity _label_)
               name=activity prefix=ACT
               suffix=HRM;
  by study_id;
  var M_HR;
run;

However, I am running into one issue that I've run into in the past which is why I had initially asked this questions. The transpose creates 6 new variables (act1hrm-act6hrm). When I transpose the data is puts each participants information in consecutive variables. For example, if a participant has HR data for activity 1, 3, and 5, the transpose puts the data in the act1hrm, act2hrm, and act3hrm instead of act1hrm, act3hrm, and act5hrm where the data should be placed. I hope this makes sense.

Previously, when I was doing something similar I used the solution below as suggested by another user to concatenate variables before transposing. However, this does not seem to work when modifying the code to use the HR variable, which is continuous, instead of the hrzone variable, which only has 3 categories. I am likely not doing something correctly and I know I need to read up on proc means more, which I am doing. But I can't seem to figure this out. Any help would be greatly appreciated.

proc means data=mod_data n completetypes;
  var indicator;
  class study_id activity hrzone;
  types study_id*activity*hrzone;
  output out=acthr_summary n=acthr;
  title 'Time in activity by hrzone level for each participant';
run;

proc contents data=acthr_summary;
run;

* concatenate 'ACTHR', activity, hrzone, and 'T' to use when 
* naming variable after transpose ;

data acthr_summary_with_combo;
  length combo $ 8;
  set acthr_summary(drop=_type_ _freq_);
  combo=cat('ACTHR',activity,hrzone,'T');
run;

proc contents data=acthr_summary_with_combo;
run;

proc transpose data=acthr_summary_with_combo(drop=activity hrzone)
               out=transposed_acthr(drop=_name_);
  by study_id;
  var acthr;
  id combo;
run;

Reeza · Posted 01-07-2019 07:04 PM

You're missing an ID variable in your TRANSPOSE code, which should be the activity. Because you want that number as the ID.

You sort of kluged that in the second example, except you can have multiple ID variables, you don't have to pre-combine them.

rfarmenta · Posted 01-07-2019 07:09 PM

I figured it was going to be something simple, that worked! Thank you!

Registration is open

SAS Training: Just a Click Away