I have the following dataset:
data have;
infile datalines truncover;
input type $ quarter phase counter percent;
datalines;
BB-B 202001 1 94 13
BB-B 202002 1 79 11
BB-B 202003 1 17 1
BB-B 202004 1 80 1
BB-B 202001 2 91 2
BB-B 202002 2 20 3
BB-B 202003 2 2 3
BB-B 202004 2 85 3
AA-C 202001 1 67 6
AA-C 202002 1 23 6
AA-C 202003 1 7 6
AA-C 202004 1 94 4
AA-C 202001 2 49 4
AA-C 202002 2 39 4
AA-C 202003 2 9 4
AA-C 202004 2 98 1
;
For every PHASE in every TYPE I want to calculate the AVERAGE and STANDARDDEVIATION of the last 2 QUARTERS.
My desired output is:
type phase AVG SDeviation
BB-B 1 (17+80)/2 (Sdeviation here)
BB-B 2 (85+2)/2 (Sdeviation here)
AA-C 1 (94+7)/2 (Sdeviation here)
AA-C 2 (98+9)/2 (Sdeviation here)
@Andalusia wrote:
I have the following dataset:
data have; infile datalines truncover; input type $ quarter phase counter percent; datalines; BB-B 202001 1 94 13 BB-B 202002 1 79 11 BB-B 202003 1 17 1 BB-B 202004 1 80 1 BB-B 202001 2 91 2 BB-B 202002 2 20 3 BB-B 202003 2 2 3 BB-B 202004 2 85 3 AA-C 202001 1 67 6 AA-C 202002 1 23 6 AA-C 202003 1 7 6 AA-C 202004 1 94 4 AA-C 202001 2 49 4 AA-C 202002 2 39 4 AA-C 202003 2 9 4 AA-C 202004 2 98 1 ;
For every PHASE in every TYPE I want to calculate the AVERAGE and STANDARDDEVIATION of the last 2 QUARTERS.
My desired output is:type phase AVG SDeviation BB-B 1 (17+80)/2 (Sdeviation here) BB-B 2 (85+2)/2 (Sdeviation here) AA-C 1 (94+7)/2 (Sdeviation here) AA-C 2 (98+9)/2 (Sdeviation here)
This seems to do what is requested for the given data but two points for a standard deviation is a bit odd.
data want; set have; by type phase notsorted; lcounter=lag(counter); if last.phase then do avg=mean(lcounter,counter); std=std(lcounter,counter); output; end; keep type phase avg std; run;
So you have multiple observations per type and quarter?
Please take care to set up your example data sufficiently so that it really illustrates your issue.
data have;
infile datalines truncover;
input type $ quarter phase counter percent;
datalines;
BB-B 202001 1 94 13
BB-B 202001 2 91 2
BB-B 202001 3 50 2
BB-B 202002 1 79 11
BB-B 202002 2 20 4
BB-B 202002 3 88 4
BB-B 202003 1 17 1
BB-B 202003 2 2 3
BB-B 202003 3 100 3
AA-C 202001 1 67 6
AA-C 202001 2 49 4
AA-C 202001 3 68 4
AA-C 202002 1 23 6
AA-C 202002 2 39 4
AA-C 202003 3 98 4
AA-C 202003 1 7 6
AA-C 202003 2 9 4
AA-C 202003 3 48 4
;
proc sql;
create table want as
select type,phase,mean(counter) as mean,std(counter) as std
from (
select * from have group by type having quarter=max(quarter) or quarter=max(quarter)-1
)
group by type,phase;
quit;
Do you have a SAS/ETS license? Check using the following code:
proc product_status;run;
Check your log for SAS/ETS.
Check the log for SAS/ETS listed.
@Andalusia wrote:
@ballardw in my real dataset I have much more points. Just a question. What if I want to calculate the AVG and STD of based on the last 4 quarters instead of 2?
The Lag approach I suggested for two cases can be extended back to 100 records. However looking back that way adds in potential issues with looking back into a previous phase or type.
You are adding in some complications that raise questions of 1) will there always going to be at least 4 quarters? 2) if there are fewer than 4 quarters but you want 4 what to do, 3) Are your "4 quarters" consecutive, if not how to treat gaps?
Also is "last" actually best description? Quarters usually relate to calendar dates and do you mean "after a given date" instead? (This is much easier to do once actual dates are supplied in data).
Make example data that demonstrates more of the actual data and replicating and problem cases such a fewer "quarters", missing quarters if they ever occur in the data (you have looked haven't you).
Then show the desired values for that example data.
Your "example" data might imply that 4 quarters means a calendar year. In which case construct a calendar year variable and use proc means or summary with type, year and phase as class variables and request the statistics for the variable as a VAR variable.
Example:
data need; set have; year = int(quarter/100); run; proc summary data=need nway; class type year phase; var counter; output out=work.summary mean= std= /autoname; run;
If your last 4 quarters crosses a calendar year boundary then you need to show example data and possibly provide a cleaner description.
BTW, consider making your quarter variable a date value because then SAS provides a number of tools that are easier to work with than pseudo-date values that need parsing for every step.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.