Hello, I'm very to new to SAS and I'm struggling with reshaping this data. I've created accumulating columns and counters for the first five columns (which are ratings collected from a survey). Now I need to create a summary table with just two variables: the food item and the average rating. For example, the new data set would have the observation arugula, and the final observation's value of tArug/nArug. What are the methods to do this?
First 5 observations:
Last observations:
In your data step you just want to keep the last observation.
An easy way to do that is with the END= option on the SET statement.
By adding an explicit OUTPUT statement only that final observation will be written to WANT.
data want;
set have end=eof;
.... do your accumulation logic ...
if eof then do;
... Now do the division to make rate out of SUM and NOBS ...
output;
end;
run;
@blurple wrote:
For example, the new data set would have the observation arugula, and the final observation's value of tArug/nArug. What are the methods to do this?
So no PineNut or Squash in the output data set?
The observations in the output data set will have each food item, and the average rating. So 5 rows, 2 columns. (by final observation I'm referring to the input data set, since the final row has the values that can be used for calculating all of the average ratings).
Show me the exact output data set you want for those five rows you presented originally.
Also, since we can't write code to use data in a screen capture, please provide (a portion of) the data as working SAS data step code which you can type in yourself, or follow these instructions. Do not provide the data in other forms.
data work.toppings;
infile datalines delimiter=',' dsd missover;
input Arugula PineNut;
datalines;
1,3
5,4
4,2
5,3
3,5
,4
2,
;
run;
data work.accum;
set work.toppings;
tArug+Arugula;
tPine+PineNut;
if Arugula ne '' then nArug+1;
if PineNut ne '' then nPine+1;
run;
(side note: can I do the above two in one data step? the accumulation code seems to be the problem when I put those lines in the first data step)
This is how I want the summarized output to look:
Just use PROC MEANS. Why re-invent the wheel?
data toppings;
input Arugula PineNut;
datalines;
1 3
5 4
4 2
5 3
3 5
. 4
2 .
;
proc means data=toppings N Nmiss Sum Mean Min Max ;
var Arugula PineNut;
run;
Thanks, that makes sense and it's a lot easier.
However I'd still like to learn other method(s).
In your data step you just want to keep the last observation.
An easy way to do that is with the END= option on the SET statement.
By adding an explicit OUTPUT statement only that final observation will be written to WANT.
data want;
set have end=eof;
.... do your accumulation logic ...
if eof then do;
... Now do the division to make rate out of SUM and NOBS ...
output;
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.