Solved: Re: Looking for a Procedure that helps me to compare data

Christobal · Posted 09-12-2013 08:22 AM

Hello All,

I have the following question: I have a table that contains the utilisation of a server per day. This table contains data for several servers:

Date	Server	Usage
04.09.2013	server1	104
05.09.2013	server1	120
06.09.2013	server1	114
07.09.2013	server1	100
08.09.2013	server1	99
09.09.2013	server1	100
10.09.2013	server1	103
11.09.2013	server1	103
12.09.2013	server1	100
04.09.2013	server2	40
05.09.2013	server2	41
06.09.2013	server2	44
07.09.2013	server2	39
08.09.2013	server2	40
09.09.2013	server2	40
10.09.2013	server2	49
11.09.2013	server2	41
12.09.2013	server2	43

Now I want to find out the difference between the utilisation of each server per day in %. See this example:

Date	Server	Usage	%
04.09.2013	server1	104
05.09.2013	server1	120	15,38%
06.09.2013	server1	114	-5,00%
07.09.2013	server1	100	-12,28%
08.09.2013	server1	99	-1,00%
09.09.2013	server1	100	1,01%
10.09.2013	server1	103	3,00%
11.09.2013	server1	103	0,00%
12.09.2013	server1	100	-2,91%
04.09.2013	server2	40
05.09.2013	server2	41	2,50%
06.09.2013	server2	44	7,32%
07.09.2013	server2	39	-11,36%
08.09.2013	server2	40	2,56%
09.09.2013	server2	40	0,00%
10.09.2013	server2	49	22,50%
11.09.2013	server2	41	-16,33%
12.09.2013	server2	43	4,88%

Is there any Procedure or something else that can help me with this question?

Background is, that I want to make an analysis about all the servers that have an increase/decrease in usage of more that 10% compared with the day before.

Thanks in advance, best regards

Christobal

data_null__ · Posted 09-12-2013 11:33 AM

I think you could use PROC EXPAND but I don't have the proper SAS bits that includes PROC EXPAND. It is easy to do with data step.

data cpu;
   input Date:ddmmyy10. Server $ Usage;
   format date ddmmyyp10.; 
   cards; 
04.09.2013 server1 104
05.09.2013 server1 120
06.09.2013 server1 114
07.09.2013 server1 100
08.09.2013 server1 99
09.09.2013 server1 100
10.09.2013 server1 103
11.09.2013 server1 103
12.09.2013 server1 100
04.09.2013 server2 40
05.09.2013 server2 41
06.09.2013 server2 44
07.09.2013 server2 39
08.09.2013 server2 40
09.09.2013 server2 40
10.09.2013 server2 49
11.09.2013 server2 41
12.09.2013 server2 43
;;;;
   run; 
proc print; 
   run; 
data cpu2;
   set cpu;
   by server date;
   dif = dif(usage);
   lag = lag(usage);
   if first.server then call missing(dif,lag);
   pchg = dif/lag;
   format pchg percentn10.2; 
   run; 
proc print; 
   run;

View solution in original post

DJM_SAS · Posted 09-12-2013 08:57 AM

The first thing that comes to my mind is to sort by date/server, then use the data step with something like the lag function.

So for example, something like:

PROC SORT data=yourdataset;

BY ASCENDING date server; /* hope I've got that ascending syntax right */

run;

Data yourresults;

set yourdataset;

BY date server;

/* formula for a percentage as per your example...just format it into your required percentage format */

percentage = (usage/lag(usage))-1;

/* just a throwaway conditional to make the first observation missing rather than calculate the percentage change from the previous server when you're on the first observation of a new server */

/* I can't actually remember what happens when using the lag function on the first observation, so you might have to change all this, but it should be a good working framework solution */

if first.server then call missing(percentage);

run;

Then you can filter on those observations showing more than a 10% movement if you want to do further analysis...

Edit: There's a SAS global forum paper on the lag function that appears to be doing almost exactly what you want to do here.

Edit2: Forgot the BY statement, woops!

DanK · Posted 09-12-2013 10:12 AM

Consider using a SAS Hash Object. You can define your server list as a hash object. Read in the same list. For each server read from the SET, find the corresponding Hash record based on day-1 then calculate your percent difference and save the observation.

data_null__ · Posted 09-12-2013 11:33 AM

I think you could use PROC EXPAND but I don't have the proper SAS bits that includes PROC EXPAND. It is easy to do with data step.

data cpu;
   input Date:ddmmyy10. Server $ Usage;
   format date ddmmyyp10.; 
   cards; 
04.09.2013 server1 104
05.09.2013 server1 120
06.09.2013 server1 114
07.09.2013 server1 100
08.09.2013 server1 99
09.09.2013 server1 100
10.09.2013 server1 103
11.09.2013 server1 103
12.09.2013 server1 100
04.09.2013 server2 40
05.09.2013 server2 41
06.09.2013 server2 44
07.09.2013 server2 39
08.09.2013 server2 40
09.09.2013 server2 40
10.09.2013 server2 49
11.09.2013 server2 41
12.09.2013 server2 43
;;;;
   run; 
proc print; 
   run; 
data cpu2;
   set cpu;
   by server date;
   dif = dif(usage);
   lag = lag(usage);
   if first.server then call missing(dif,lag);
   pchg = dif/lag;
   format pchg percentn10.2; 
   run; 
proc print; 
   run;

P_Sharma · Posted 09-12-2013 12:08 PM

You can use Lag function for that. proc sort data=cpu; by server date; run; data test (drop=pc); set cpu; by server date; u_lag=lag(usage); if first.server then u_lag = .; if u_lag ne . and usage ne . then pct=((usage/u_lag)-1)*100; if pct ne . then pc=abs(pct); if pc ge 10; run;

Haikuo · Posted 09-12-2013 12:37 PM

If there is no skipping in dates, the following Proc sQL seems to work:

data have;

input Date :ddmmyy10. Server$ Usage;

format date ddmmyy10.;

cards;

04.09.2013 server1 104

05.09.2013 server1 120

06.09.2013 server1 114

07.09.2013 server1 100

08.09.2013 server1 99

09.09.2013 server1 100

10.09.2013 server1 103

11.09.2013 server1 103

12.09.2013 server1 100

04.09.2013 server2 40

05.09.2013 server2 41

06.09.2013 server2 44

07.09.2013 server2 39

08.09.2013 server2 40

09.09.2013 server2 40

10.09.2013 server2 49

11.09.2013 server2 41

12.09.2013 server2 43

;

/*no skipping in dates*/

proc sql;

create table _10percent as

select a.*, (a.usage-b.usage)/b.usage as Percent format=percent10.2 from have a

left join have b

on a.server=b.server

and a.date-b.date=1

/* and abs(calculated percent) >0.1*/ /*add this line if only want those larger than 10%*/

;

quit;

If there is skipping in dates, and you want to compare between rows, then you would need an additional step or monotonic() to put in an index for your data before using Proc SQL.

Christobal · Posted 09-13-2013 12:42 AM

Hello all,

thanks a lot for your answers and sorry for not answering earlier. I will try out your suggestions and give a feedback then.

Best regards

Christobal

Christobal · Posted 09-13-2013 01:12 AM

Ok, now I tried this solution:

data cpu2;
   set cpu;
   by server date;
   dif = dif(usage);
   lag = lag(usage);
   if first.server then call missing(dif,lag);
   pchg = dif/lag;
   format pchg percentn10.2;
   run;

This is exactly what I was looking for. Thanks a lot again to all of you for your answers!!!!!!!!!!

Best regards

Christbal

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away