Hello All,
I have the following question: I have a table that contains the utilisation of a server per day. This table contains data for several servers:
Date | Server | Usage |
04.09.2013 | server1 | 104 |
05.09.2013 | server1 | 120 |
06.09.2013 | server1 | 114 |
07.09.2013 | server1 | 100 |
08.09.2013 | server1 | 99 |
09.09.2013 | server1 | 100 |
10.09.2013 | server1 | 103 |
11.09.2013 | server1 | 103 |
12.09.2013 | server1 | 100 |
04.09.2013 | server2 | 40 |
05.09.2013 | server2 | 41 |
06.09.2013 | server2 | 44 |
07.09.2013 | server2 | 39 |
08.09.2013 | server2 | 40 |
09.09.2013 | server2 | 40 |
10.09.2013 | server2 | 49 |
11.09.2013 | server2 | 41 |
12.09.2013 | server2 | 43 |
Now I want to find out the difference between the utilisation of each server per day in %. See this example:
Date | Server | Usage | % | |
04.09.2013 | server1 | 104 | ||
05.09.2013 | server1 | 120 | 15,38% | |
06.09.2013 | server1 | 114 | -5,00% | |
07.09.2013 | server1 | 100 | -12,28% | |
08.09.2013 | server1 | 99 | -1,00% | |
09.09.2013 | server1 | 100 | 1,01% | |
10.09.2013 | server1 | 103 | 3,00% | |
11.09.2013 | server1 | 103 | 0,00% | |
12.09.2013 | server1 | 100 | -2,91% | |
04.09.2013 | server2 | 40 | ||
05.09.2013 | server2 | 41 | 2,50% | |
06.09.2013 | server2 | 44 | 7,32% | |
07.09.2013 | server2 | 39 | -11,36% | |
08.09.2013 | server2 | 40 | 2,56% | |
09.09.2013 | server2 | 40 | 0,00% | |
10.09.2013 | server2 | 49 | 22,50% | |
11.09.2013 | server2 | 41 | -16,33% | |
12.09.2013 | server2 | 43 | 4,88% |
Is there any Procedure or something else that can help me with this question?
Background is, that I want to make an analysis about all the servers that have an increase/decrease in usage of more that 10% compared with the day before.
Thanks in advance, best regards
Christobal
I think you could use PROC EXPAND but I don't have the proper SAS bits that includes PROC EXPAND. It is easy to do with data step.
The first thing that comes to my mind is to sort by date/server, then use the data step with something like the lag function.
So for example, something like:
PROC SORT data=yourdataset;
BY ASCENDING date server; /* hope I've got that ascending syntax right */
run;
Data yourresults;
set yourdataset;
BY date server;
/* formula for a percentage as per your example...just format it into your required percentage format */
percentage = (usage/lag(usage))-1;
/* just a throwaway conditional to make the first observation missing rather than calculate the percentage change from the previous server when you're on the first observation of a new server */
/* I can't actually remember what happens when using the lag function on the first observation, so you might have to change all this, but it should be a good working framework solution */
if first.server then call missing(percentage);
run;
Then you can filter on those observations showing more than a 10% movement if you want to do further analysis...
Edit: There's a SAS global forum paper on the lag function that appears to be doing almost exactly what you want to do here.
Edit2: Forgot the BY statement, woops!
Consider using a SAS Hash Object. You can define your server list as a hash object. Read in the same list. For each server read from the SET, find the corresponding Hash record based on day-1 then calculate your percent difference and save the observation.
I think you could use PROC EXPAND but I don't have the proper SAS bits that includes PROC EXPAND. It is easy to do with data step.
You can use Lag function for that. proc sort data=cpu; by server date; run; data test (drop=pc); set cpu; by server date; u_lag=lag(usage); if first.server then u_lag = .; if u_lag ne . and usage ne . then pct=((usage/u_lag)-1)*100; if pct ne . then pc=abs(pct); if pc ge 10; run;
If there is no skipping in dates, the following Proc sQL seems to work:
data have;
input Date :ddmmyy10. Server$ Usage;
format date ddmmyy10.;
cards;
04.09.2013 server1 104
05.09.2013 server1 120
06.09.2013 server1 114
07.09.2013 server1 100
08.09.2013 server1 99
09.09.2013 server1 100
10.09.2013 server1 103
11.09.2013 server1 103
12.09.2013 server1 100
04.09.2013 server2 40
05.09.2013 server2 41
06.09.2013 server2 44
07.09.2013 server2 39
08.09.2013 server2 40
09.09.2013 server2 40
10.09.2013 server2 49
11.09.2013 server2 41
12.09.2013 server2 43
;
/*no skipping in dates*/
proc sql;
create table _10percent as
select a.*, (a.usage-b.usage)/b.usage as Percent format=percent10.2 from have a
left join have b
on a.server=b.server
and a.date-b.date=1
/* and abs(calculated percent) >0.1*/ /*add this line if only want those larger than 10%*/
;
quit;
If there is skipping in dates, and you want to compare between rows, then you would need an additional step or monotonic() to put in an index for your data before using Proc SQL.
Hello all,
thanks a lot for your answers and sorry for not answering earlier. I will try out your suggestions and give a feedback then.
Best regards
Christobal
Ok, now I tried this solution:
data cpu2;
set cpu;
by server date;
dif = dif(usage);
lag = lag(usage);
if first.server then call missing(dif,lag);
pchg = dif/lag;
format pchg percentn10.2;
run;
This is exactly what I was looking for. Thanks a lot again to all of you for your answers!!!!!!!!!!
Best regards
Christbal
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.