BookmarkSubscribeRSS Feed
Andalusia
Obsidian | Level 7

I have the following dataset:

data have;
  infile datalines truncover;
  input type $ quarter phase counter percent;
  datalines;
BB-B 202001 1 94 13
BB-B 202002 1 79 11
BB-B 202003 1 17 1
BB-B 202004 1 80 1
BB-B 202001 2 91 2
BB-B 202002 2 20 3
BB-B 202003 2 2 3
BB-B 202004 2 85 3
AA-C 202001 1 67 6
AA-C 202002 1 23 6
AA-C 202003 1 7 6
AA-C 202004 1 94 4
AA-C 202001 2 49 4
AA-C 202002 2 39 4
AA-C 202003 2 9 4
AA-C 202004 2 98 1
;

For every PHASE in every TYPE I want to calculate the AVERAGE and STANDARDDEVIATION of the last 2 QUARTERS.
My desired output is:

type   phase    AVG        SDeviation
BB-B   1        (17+80)/2  (Sdeviation here)
BB-B   2        (85+2)/2   (Sdeviation here)
AA-C   1        (94+7)/2   (Sdeviation here)
AA-C   2        (98+9)/2   (Sdeviation here)




8 REPLIES 8
Reeza
Super User
You want the standard deviation of two data points?
ballardw
Super User

@Andalusia wrote:

I have the following dataset:

data have;
  infile datalines truncover;
  input type $ quarter phase counter percent;
  datalines;
BB-B 202001 1 94 13
BB-B 202002 1 79 11
BB-B 202003 1 17 1
BB-B 202004 1 80 1
BB-B 202001 2 91 2
BB-B 202002 2 20 3
BB-B 202003 2 2 3
BB-B 202004 2 85 3
AA-C 202001 1 67 6
AA-C 202002 1 23 6
AA-C 202003 1 7 6
AA-C 202004 1 94 4
AA-C 202001 2 49 4
AA-C 202002 2 39 4
AA-C 202003 2 9 4
AA-C 202004 2 98 1
;

For every PHASE in every TYPE I want to calculate the AVERAGE and STANDARDDEVIATION of the last 2 QUARTERS.
My desired output is:

type   phase    AVG        SDeviation
BB-B   1        (17+80)/2  (Sdeviation here)
BB-B   2        (85+2)/2   (Sdeviation here)
AA-C   1        (94+7)/2   (Sdeviation here)
AA-C   2        (98+9)/2   (Sdeviation here)





This seems to do what is requested for the given data but two points for a standard deviation is a bit odd.

data want;
   set have;
   by  type phase notsorted;
   lcounter=lag(counter);
   if last.phase then do
      avg=mean(lcounter,counter);
      std=std(lcounter,counter);
      output;
   end;
   keep type phase avg std;
run;
Andalusia
Obsidian | Level 7
@ballardw in my real dataset I have much more points. Just a question. What if I want to calculate the AVG and STD of based on the last 4 quarters instead of 2?
Andalusia
Obsidian | Level 7
@Kurt_Bremser Check out my latest question, its almost the same as this one but with more sufficient example data.
Ksharp
Super User
data have;
  infile datalines truncover;
  input type $ quarter phase counter percent;
  datalines;
BB-B 202001 1 94 13
BB-B 202001 2 91 2
BB-B 202001 3 50 2
BB-B 202002 1 79 11
BB-B 202002 2 20 4
BB-B 202002 3 88 4
BB-B 202003 1 17 1
BB-B 202003 2 2 3
BB-B 202003 3 100 3
AA-C 202001 1 67 6
AA-C 202001 2 49 4
AA-C 202001 3 68 4
AA-C 202002 1 23 6
AA-C 202002 2 39 4
AA-C 202003 3 98 4
AA-C 202003 1 7 6
AA-C 202003 2 9 4
AA-C 202003 3 48 4
;

proc sql;
create table want as
select  type,phase,mean(counter) as mean,std(counter) as std
from (
select * from have group by type having quarter=max(quarter) or quarter=max(quarter)-1
)
group by type,phase;
quit;
Reeza
Super User

Do you have a SAS/ETS license? Check using the following code:

proc product_status;run;

Check your log for SAS/ETS.

Check the log for SAS/ETS listed.

ballardw
Super User

@Andalusia wrote:
@ballardw in my real dataset I have much more points. Just a question. What if I want to calculate the AVG and STD of based on the last 4 quarters instead of 2?

The Lag approach I suggested for two cases can be extended back to 100 records. However looking back that way adds in potential issues with looking back into a previous phase or type.

 

You are adding in some complications that raise questions of 1) will there always going to be at least 4 quarters? 2) if there are fewer than 4 quarters but you want 4 what to do, 3) Are your "4 quarters" consecutive, if not how to treat gaps?

Also is "last" actually best description? Quarters usually relate to calendar dates and do you mean "after a given date" instead? (This is much easier to do once actual dates are supplied in data).

 

Make example data that demonstrates more of the actual data and replicating and problem cases such a fewer "quarters", missing quarters if they ever occur in the data (you have looked haven't you).

Then show the desired values for that example data.

Your "example" data might imply that 4 quarters means a calendar year. In which case construct a calendar year variable and use proc means or summary with type, year and phase as class variables and request the statistics for the variable as a VAR variable.

Example:

data need;
   set have;
   year = int(quarter/100);
run;

proc summary data=need nway;
   class type year phase;
   var counter;
   output out=work.summary mean= std= /autoname;
run;

 

If your last 4 quarters crosses a calendar year boundary then you need to show example data and possibly provide a cleaner description.

BTW, consider making your quarter variable a date value because then SAS provides a number of tools that are easier to work with than pseudo-date values that need parsing for every step.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 689 views
  • 2 likes
  • 5 in conversation