BookmarkSubscribeRSS Feed
peatjohnston
Calcite | Level 5
Hello,

I have a set of data given to me by a student who entered the data in the following format:

sub$ d v
a 5 2
a 19 3
a 20 2
a 21 5
a 23 2
a 18 3
a 9 .
b 6 2
b 18 1
b 19 3
b 18 4
b 11 2
b 3 .

I need to take the average of two consecutive numbers in column 'd' and put them in a new column so that I can multiply them by column 'v' for each of the subjects (column 'sub$').
The result of taking the average (ave) should look like this:

sub$ d v ave
a 5 2 12
a 19 3 19.5
a 20 2 20.5
a 21 5 22
a 23 2 20.5
a 18 3 13.5
a 9 . 0
b 6 2 12
b 18 1 18.5
b 19 3 18.5
b 18 4 14.5
b 11 2 7
b 3 . 0

My first thought was to transpose the data (I have other variables that can be used to transpose each row into a column for all subjects), and apply the operations on the newly created variables. But if there is an easier way, I would be interested to learn; I have spent a day looking through online documentation.

Thanks in advance.
4 REPLIES 4
Peter_C
Rhodochrosite | Level 12
here is a simple merge-without-by
data t1 ;
input sub $ d v ;
list; datalines;
a 5 2
a 19 3
a 20 2
a 21 5
a 23 2
a 18 3
a 9 .
b 6 2
b 18 1
b 19 3
b 18 4
b 11 2
b 3 .
;
data aves ;
merge t1 t1( firstobs=2 keep= sub d rename=( sub=sub2 d=d2 )) ;
* notice, no By statement ;
if sub ne sub2 then do ;
* handle last for SUB ;
ave = 0 ;
end ;
else
ave = (d + d2 ) /2 ;
drop sub2 d2 ;
put ( sub d v ave )( $3. 3*best6. );
run ;
which puts this report into the log[pre]a 5 2 12
a 19 3 19.5
a 20 2 20.5
a 21 5 22
a 23 2 20.5
a 18 3 13.5
a 9 . 0
b 6 2 12
b 18 1 18.5
b 19 3 18.5
b 18 4 14.5
b 11 2 7
b 3 . 0
NOTE: There were 12 observations read from the data set WORK.T1.
NOTE: There were 13 observations read from the data set WORK.T1.
NOTE: The data set WORK.AVES has 13 observations and 4 variables.
NOTE: DATA statement used[/pre]Without the BY statement the data step merges the "next" row columns SUB and D - renamed to ensure the step has next D in a different name. With no BY statement there is no LAST.SUB available so SUB2 (the next SUB) clarifies the end of a SUB when these differ.

The next stage AVE * V could be done in the same step, but you want to work on that yourself...... 😉
Ksharp
Super User
Hi.
Peter.C 's code is great.
Peter ,I have not think 'merge' statement can be used in that way. 🙂


[pre]
data temp;
input sub$ d v;
datalines;
a 5 2
a 19 3
a 20 2
a 21 5
a 23 2
a 18 3
a 9 .
b 6 2
b 18 1
b 19 3
b 18 4
b 11 2
b 3 .
;
run;
data result;
set temp;
by sub;
ave=mean(lag(d),d);
if not first.sub then do;
obs=_n_-1;
set temp point=obs;
output;
end;
if last.sub then do;
ave=0;
set temp point=_n_;
output;
end;
run;

[/pre]



Ksharp Message was edited by: Ksharp
Peter_C
Rhodochrosite | Level 12
thank you kSharp
it is the classic "read-ahead"
should be around the archives in more than one posting
frequently brought out when a poster wants LEAD instead of LAG

peterC
Ksharp
Super User
Hi.Peter.
I know the usage of 'merge' statement without 'by' statement,But I forget the dataset option 'firstobs' which can do it. It enlight my brain, and give me one more approach to process these data.Thanks.


Best Regards.
Ksharp

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2211 views
  • 0 likes
  • 3 in conversation