Your SAS programs, embedded in web apps and elsewhere

RE: Reduction techniques for SAS

Reply
N/A
Posts: 0

RE: Reduction techniques for SAS

I am not sure if this is the right forum to ask this question but here it goes.

I am trying to write code for reducing data. The data is from an activity study that was conducted by my mentor at the University of Utah. I have read in the data to a SAS data set and now get to manipulate it. The data is currently in long columns that are about 27000 cases/observations long. Each observation/case represents 30 seconds of data and I want to reduce to 60 seconds of data or one minute per case/observation. Is there a function in SAS to aggregate two observations/cases in a sequential fashion and then drop even or odd time values without aggregating them? Ideas on this? I have been toying with the idea of transposing the data and using an array for this. Thanks for any help or direction to the right place to discuss this issue.



--Jonathan
SAS Super FREQ
Posts: 8,739

Re: RE: Reduction techniques for SAS

Hi:
This forum is not the right place for your question. You may not need PROC TRANSPOSE. You may be able to reduce your data by using a PROC MEANS/PROC SUMMARY step. What if your data looked like this:
[pre]

Obs case date time hour minute second measure

1 BLUEFISH 01/01/2007 0:00:00 0 0 0 512.982
2 BLUEFISH 01/01/2007 0:00:30 0 0 30 96.545
3 BLUEFISH 01/01/2007 0:01:00 0 1 0 584.263
4 BLUEFISH 01/01/2007 0:01:30 0 1 30 551.806
5 BLUEFISH 01/01/2007 0:02:00 0 2 0 747.991
6 BLUEFISH 01/01/2007 0:02:30 0 2 30 285.242
7 BLUEFISH 01/01/2007 0:03:00 0 3 0 357.630
8 BLUEFISH 01/01/2007 0:03:30 0 3 30 928.783
9 BLUEFISH 01/01/2007 0:04:00 0 4 0 21.974
10 BLUEFISH 01/01/2007 0:04:30 0 4 30 454.571
11 REDFISH 01/01/2007 0:00:00 0 0 0 252.436
12 REDFISH 01/01/2007 0:00:30 0 0 30 783.863
13 REDFISH 01/01/2007 0:01:00 0 1 0 151.454
14 REDFISH 01/01/2007 0:01:30 0 1 30 323.465
15 REDFISH 01/01/2007 0:02:00 0 2 0 814.048
16 REDFISH 01/01/2007 0:02:30 0 2 30 241.901
17 REDFISH 01/01/2007 0:03:00 0 3 0 220.134
18 REDFISH 01/01/2007 0:03:30 0 3 30 228.564
19 REDFISH 01/01/2007 0:04:00 0 4 0 926.583
20 REDFISH 01/01/2007 0:04:30 0 4 30 122.393

[/pre]

you could run a PROC MEANS by CASE, DATE, HOUR and MINUTE as the CLASS variables, which would reduce the data in half like this:
[pre]
sum_
Obs case date hour minute minute

1 BLUEFISH 01/01/2007 0 0 609.53
2 BLUEFISH 01/01/2007 0 1 1136.07
3 BLUEFISH 01/01/2007 0 2 1033.23
4 BLUEFISH 01/01/2007 0 3 1286.41
5 BLUEFISH 01/01/2007 0 4 476.55
6 REDFISH 01/01/2007 0 0 1036.30
7 REDFISH 01/01/2007 0 1 474.92
8 REDFISH 01/01/2007 0 2 1055.95
9 REDFISH 01/01/2007 0 3 448.70
10 REDFISH 01/01/2007 0 4 1048.98

[/pre]

In the above output, you can see that the 609.53, the value of sum_minute for the first observation above, is the sum of the measure value for the 2 obs for minute 0 for BLUEFISH on 01/01/2007. This was done with the following PROC MEANS code:
[pre]

proc means data=fishes sum nway noprint;
var measure;
class case date hour minute;
output out=fishedout sum=sum_minute;
run;

[/pre]

If, however, you did not want the sum of the 2 obs (is this what you meant by "aggregating"), then you could probably use a DATA step program to delete every other observation without doing any aggregation. Or, they could help you tweak the PROC MEANS program to get the output data set that you wanted.

Tech Support would be the best place to take this question. To contact Tech Support, go to http://support.sas.com and on the left hand navigation pane, click on the link that says "Submit a Problem"

cynthia
N/A
Posts: 0

Re: RE: Reduction techniques for SAS

Thank you for the tips. I did think about proc transpose but I ended up developing a more elegant solution.

What I did was use the lag function to duplicate the activity counts column and added my original activity counts with the new column created by the lag function (of course I had to set missing values to zero). I created an index variable by using my datetime variable (a sas date of both date and time to the second) and subtracted my starttime (which was a macro). Then I used a do-loop to incremented by 2 and compared my index variable to the do-loop index multipled by my epoch (which was a macro equaling 30seconds). The result is that my do loop would only keep even observations with the use of an explicit output statement.

my code looked like this:

data Bbb.sub215_actmin_1 ;
set Bbb.sub215_actmin ;
actlag=lag(activity) ;
pedolag=lag(pedometer) ;
Act=activity+actlag ;
Ped=pedometer+pedolag ;
if Act=. then Act=0 ;
if Ped=. then Ped=0 ;

retain datetime ;
index=(datetime - &sdt) ;
run ;
data Bbb.sub215_actmin_2 ;
set Bbb.sub215_actmin_1 ;
do i= 2 to 100000 by 2 ;
if index = i * &epo then output Bbb.sub215_actmin_2 ;
end ;
keep pedometer activity ped act datetime index ;
run ;
Ask a Question
Discussion stats
  • 2 replies
  • 189 views
  • 0 likes
  • 2 in conversation