- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For every PHASE in every TYPE I'm trying to calculate the AVG and STD based on the last 2 quarters. This is my current code:
data have;
infile datalines truncover;
input type $ quarter phase counter percent;
datalines;
BB-B 202001 1 94 13
BB-B 202001 2 91 2
BB-B 202001 3 50 2
BB-B 202002 1 79 11
BB-B 202002 2 20 4
BB-B 202002 3 88 4
BB-B 202003 1 17 1
BB-B 202003 2 2 3
BB-B 202003 3 100 3
AA-C 202001 1 67 6
AA-C 202001 2 49 4
AA-C 202001 3 68 4
AA-C 202002 1 23 6
AA-C 202002 2 39 4
AA-C 202003 3 98 4
AA-C 202003 1 7 6
AA-C 202003 2 9 4
AA-C 202003 3 48 4
;
data want;
set have;
by type phase notsorted;
lcounter=lag(counter);
if last.phase then do
avg=mean(lcounter,counter);
std=std(lcounter,counter);
output;
end;
keep type phase avg std;
run;
This is my current output:
This is my desired output:
type phase AVG SDeviation
BB-B 1 (79+17)/2 (Sdeviation here)
BB-B 2 (20+2)/2 (Sdeviation here)
BB-B 3 (88/100)/2 (Sdeviation here)
AA-C 1 (23+7)/2 (Sdeviation here)
AA-C 2 (39+9)/2 (Sdeviation here)
AA-C 3 (98+48)/2 (Sdeviation here)
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
infile datalines truncover;
input type $ quarter phase counter percent;
datalines;
BB-B 202001 1 94 13
BB-B 202001 2 91 2
BB-B 202001 3 50 2
BB-B 202002 1 79 11
BB-B 202002 2 20 4
BB-B 202002 3 88 4
BB-B 202003 1 17 1
BB-B 202003 2 2 3
BB-B 202003 3 100 3
AA-C 202001 1 67 6
AA-C 202001 2 49 4
AA-C 202001 3 68 4
AA-C 202002 1 23 6
AA-C 202002 2 39 4
AA-C 202003 3 98 4
AA-C 202003 1 7 6
AA-C 202003 2 9 4
AA-C 202003 3 48 4
;
proc sql;
create table want as
select type,phase,mean(counter) as mean,std(counter) as std
from (
select * from have group by type having quarter=max(quarter) or quarter=max(quarter)-1
)
group by type,phase;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your program is written to expect all the records for a given type/phase to be consecutive (even if not in ascending or descending sort order). But your data is not so grouped. Sort your data by type/phase first. Then run your code.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
infile datalines truncover;
input type $ quarter phase counter percent;
datalines;
BB-B 202001 1 94 13
BB-B 202001 2 91 2
BB-B 202001 3 50 2
BB-B 202002 1 79 11
BB-B 202002 2 20 4
BB-B 202002 3 88 4
BB-B 202003 1 17 1
BB-B 202003 2 2 3
BB-B 202003 3 100 3
AA-C 202001 1 67 6
AA-C 202001 2 49 4
AA-C 202001 3 68 4
AA-C 202002 1 23 6
AA-C 202002 2 39 4
AA-C 202003 3 98 4
AA-C 202003 1 7 6
AA-C 202003 2 9 4
AA-C 202003 3 48 4
;
proc sql;
create table want as
select type,phase,mean(counter) as mean,std(counter) as std
from (
select * from have group by type having quarter=max(quarter) or quarter=max(quarter)-1
)
group by type,phase;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
quarter=max(quarter) or quarter=max(quarter)-1
into
quarter=max(quarter) or quarter=max(quarter)-1 or quarter=max(quarter)-2 or quarter=max(quarter)-3
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Ksharp I used below code but it still only grabs the last 2 quarters instead of the last 4:
proc sql;
create table want as
select type,phase,mean(counter) as mean,std(counter) as std
from (
select * from outputdatasetname group by type having quarter=max(quarter) or quarter=max(quarter)-1 or quarter=max(quarter)-2 or quarter=max(quarter)-3
)
group by type,phase;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do yourself a favor from now on ... don't try to write your own mean and standard deviation (and median and minimum and maximum and a whole lot of other things) code in a DATA step (except for extremely unusual circumstances, which this is not). SAS has already programmed these calculations, tested it, debugged it, and proved that they work in a gazillion real world situations. You are essentially re-inventing the wheel. You (or your company or university) is paying for all this work that SAS has already done, so use that work SAS has already done.
Use PROC SUMMARY or PROC SQL (I strongly prefer PROC SUMMARY)
proc summary data=have(where=(quarter>=202002)) nway;
class type phase;
var counter;
output out=want mean=avg stddev=std;
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Andalusia wrote:
What if I want to do it for the last 4 quarters instead of the last 2 as I have right now?
Extract the last X observations for every type/phase group first, then run PROC SUMMARY:
%let n_quarters = 2;
proc sort data=have;
by type phase quarter;
run;
data extract;
counta = 0;
do until (last.phase);
set have;
by type phase;
counta + 1;
end;
countb = 0;
do until (last.phase);
set have;
by type phase;
countb + 1;
if countb gt (counta - &n_quarters.) then output;
end;
drop counta countb;
run;
This is a slight expansion of the solution I provided here.