turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- total visits and average score in a datastep

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-08-2017 03:20 PM

Hi all,

I need to sum encounter by ID and then average the answers to the Q3 (will always be 1 or 0). Looking for an easy way...preferable in a data step. I can do in many separate steps...but end up having to use sql, proc tabulate etc. and then join it all together. Is there an easier way in a data step?

Want a table that looks like this:

ID | Q3 | Total | Q3_Avg |

A | 1 | 5 | 0.6 |

A | 1 | 5 | 0.6 |

A | 0 | 5 | 0.6 |

A | 1 | 5 | 0.6 |

A | 0 | 5 | 0.6 |

B | 1 | 4 | 2 |

B | 1 | 4 | 2 |

B | 0 | 4 | 2 |

B | 0 | 4 | 2 |

Help is appreciated!!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to jenim514

11-08-2017 03:31 PM

What does your data start out looking like?

Please post data in a data step and the corresponding output expected. Anything you've tried is helpful too, so we don't suggest things you've already attempted.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

11-08-2017 03:35 PM

It basically starts out as:

ID | Q3 |

A | 1 |

A | 1 |

A | 0 |

A | 1 |

A | 0 |

B | 1 |

B | 1 |

B | 0 |

B | 0 |

So need to add two new variables Total and Average. I did this to get the total encounters by ID, but I'm trying to avoid all these micro steps i know how to do and just do one datastep.

proc sql;

select ID,

Count(ID) as Total

from diary004_4

group by ID;

quit;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to jenim514

11-08-2017 03:59 PM

Do you have a data step process and you want to build this calculation into that process?

Because you're merging the data with a summary statistic you must have two passes of the data. The most efficient can vary, perhaps a DoW loop is the best in your circumstance, but another approach would be to use a single SQL step or to do the calculation separately via PROC MEANS/SQL and merge the data in.

```
******************************************************;
*Add average value to a dataset - with grouping variables;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint nway;
class sex;
output out=avg_values mean(height)=avg_height;
run;
*sort data before merge;
proc sort data=sashelp.class out=class;
by sex;
run;
data class_data;
merge class avg_values;
by sex;
run;
proc print data=class_data;
run;
*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class
group by sex;
quit;
```

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to jenim514

11-08-2017 04:06 PM

@Reeza mentioned a DOW loop in passing. That would be my approach. For example, assuming your data set is sorted:

data want;

total = 0;

q3_avg = 0;

do until (last.id);

set have;

by id;

total + 1;

q3_avg + q3;

end;

q3_avg = q3_avg / total;

do until (last.id);

set have;

by id;

output;

end;

run;

It's a little complex, but gets you there in one step.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Astounding

11-08-2017 04:29 PM

This is so close. I think I miss communicated a little. I need the Q3_avg to the the SUM of Q3/Total. Right now, the output for on ID looks like:

Total=8

Q3_avg=1

when all answers to q3 is '1'...so the Q3_Ave should be 8. Make sense?

Total=8

Q3_avg=1

when all answers to q3 is '1'...so the Q3_Ave should be 8. Make sense?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to jenim514

11-08-2017 04:40 PM

Have you looked at PROC FREQ output?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to jenim514

11-08-2017 04:40 PM

Not sure if that makes sense ... when all the Q3 are 1, the sum of Q3 is 8. And total is 8. So Q3_avg would be 8 / 8 = 1.

I generalized the results for the first ID where the average is 0.6. (The Q3_avg for the second ID didn't really make sense to me.)

If that's not right, maybe you could show 3 IDs in your example, all with the right Q3_avg values.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to jenim514

11-08-2017 04:10 PM

you can join these two tables:

create first table;

proc sql;

create table tablo as

count(ID) as total,

avg(q3) as aver

from tablo t1

group by

total,

aver;

*********

then inner join between fist table and given above.

(proc summarize also do this but i didnt remember)

create first table;

proc sql;

create table tablo as

count(ID) as total,

avg(q3) as aver

from tablo t1

group by

total,

aver;

*********

then inner join between fist table and given above.

(proc summarize also do this but i didnt remember)

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to jenim514

11-08-2017 04:14 PM

Hi,

You can achieve this in many simple ways:

one way is :

proc sql;

create table need as

select id,Q3,count(id) as total,sum(Q3)/count(id) as Avg

from have

group by id;

quit;

Thanks,

Suryakiran

Suryakiran

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to jenim514

11-08-2017 06:24 PM

jenim514 wrote:

Hi all,

I need to sum encounter by ID and then average the answers to the Q3 (will always be 1 or 0). Looking for an easy way...preferable in a data step. I can do in many separate steps...but end up having to use sql, proc tabulate etc. and then join it all together. Is there an easier way in a data step?

Want a table that looks like this:

ID Q3 Total Q3_Avg A 1 5 0.6 A 1 5 0.6 A 0 5 0.6 A 1 5 0.6 A 0 5 0.6 B 1 4 2 B 1 4 2 B 0 4 2 B 0 4 2

Help is appreciated!!

Please explain how you get a total of **5** for ID=A. This looks a lot more like a COUNT of non-missing values than a SUM. So the SUM=3 / Count=5 does yield 0.6 for Id A. But then for B you would have Sum=2 / Count=4 = 0.5 not 2. So there is something missing in your description.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ballardw

11-09-2017 12:10 PM

ok, let me think about this. you are all right...I am looking at a table on my end that I need to duplicate, and it seemed like my explanation was right, but clearly its not. i will look at it again and re-frame my question. thanks all!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to jenim514

11-09-2017 12:59 PM

jenim514 wrote:

Been there, got the headaches.

You might ask if there is a weighting variable involved they forgot to mention. If the weight changes between A and B that could account for different results with the same calculations I proposed. Since a variable named Q3 seem like a candidate for "question 3" and questions often occur in surveys and surveys often weight data ....

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ballardw

11-09-2017 04:00 PM

@jenim514 This is just for fun. I wanted to experiment how a hash alternative would work at my college lab this afternoon.

1. I concur the PROC readymeals approaches

2. "* Double *DOW" is slick

3. Reemerging stats in Proc sql is great

If you are a learner like me, just play

**data** have;

input ID $ Q3;

datalines;

A 1

A 1

A 0

A 1

A 0

B 1

B 1

B 0

B 0

;

**data** _null_;

if _n_=**1** then do;

if **0** then set have;

length total q3_avg **8**;

dcl hash h(multidata: 'y',ordered: 'y');

h.definekey('id');

h.definedata('id', 'q3','total','q3_avg');

h.definedone();

call missing (total, q3_avg);

end;

_total = **0**;

_q3_avg = **0**;

do until(last.id);

set have end=last;

by id;

_total + **1**;

_q3_avg + q3;

h.add();

if last.id then

do;

temp = _q3_avg/_total;

rc = h.find();

q3_avg = temp;

total=_total;

if (rc = **0**) then

do;

rc = h.replacedup();

rc = h.find_next();

q3_avg = temp;

total=_total;

do while(rc = **0**);

rc = h.replacedup();

rc = h.find_next();

q3_avg = temp;

total=_total;

end;

end;

end;

end;

if last then h.output(dataset:'want');

**run**;

Cheers!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to novinosrin

11-09-2017 04:17 PM - edited 11-09-2017 04:17 PM

This is just for fun. I wanted to experiment how a hash alternative would work at my college lab this afternoon.

@novinosrin If there was any uncertainty about your 'geek' status or SAS expertise consider it certain now