BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DonnaDomovic
Calcite | Level 5

Hello,

This is my first time posting but I'm hoping you can help with a problem I'm having.

 

I have a SAS program where I'm trying to do the following:

  • Reading a file of mainframe SMF data that has durations stored in TIME12.2 format for each transaction.  These fields are ELAPSTM and CLDIOTM.
  • Calculate the difference between ELAPSTM and CLDIODM to create a variable called RUNTIME to store the duration between those 2 times.
  • Assign each record to an interval based on the RUNTIME (RUNINT) and ELAPSTM (LIFEINT).  For example, if the calculated RUNTIME is less than 0.25 seconds, RUNINT is set to "INT1".  If RUNTIME is between 0.26 and 0.50 seconds RUNINT is set to "INT2", etc.  Similar for LIFEINT but that is based on the ELAPSTM variable value.

Code I'm using to assign values to the RUNIT and LIFEINT values is:

SELECT;
   WHEN (RUNTIME LE 0.25) RUNINT='INT1';
   WHEN (RUNTIME GT 0.25 AND RUNTIME LE 0.50) RUNINT='INT2';
   WHEN (RUNTIME GT 0.50 AND RUNTIME LE 0.50) RUNINT='INT3';
   WHEN (RUNTIME GT 0.75 AND RUNTIME LE 0.75) RUNINT='INT4';
   WHEN (RUNTIME GT 1.00 AND RUNTIME LE 1.00) RUNINT='INT5';
   WHEN (RUNTIME GT 1.25 AND RUNTIME LE 1.25) RUNINT='INT6';
   WHEN (RUNTIME GT 1.50 AND RUNTIME LE 1.50) RUNINT='INT7';
   WHEN (RUNTIME GT 1.75 AND RUNTIME LE 1.75) RUNINT='INT8';
   WHEN (RUNTIME GT 2.00 AND RUNTIME LE 2.00) RUNINT='INT9';
   WHEN (RUNTIME GT 2.25 AND RUNTIME LE 2.25) RUNINT='INT10';
   WHEN (RUNTIME GT 2.50 AND RUNTIME LE 2.50) RUNINT='INT11';
   WHEN (RUNTIME GT 2.75 AND RUNTIME LE 2.75) RUNINT='INT12';
   WHEN (RUNTIME GT 3.00 AND RUNTIME LE 3.00) RUNINT='INT13';
   WHEN (RUNTIME GT 4.00 AND RUNTIME LE 4.00) RUNINT='INT14';
   WHEN (RUNTIME GT 5.00 AND RUNTIME LE 5.00) RUNINT='INT15';
   WHEN (RUNTIME GT 6.00 AND RUNTIME LE 6.00) RUNINT='INT16';
   OTHERWISE RUNINT='INT17';
END;

SELECT;
   WHEN (ELAPSTM LE 0.25) LIFEINT='INT1';
   WHEN (ELAPSTM GT 0.25 AND ELAPSTM LE 0.50) LIFEINT='INT2';
   WHEN (ELAPSTM GT 0.50 AND ELAPSTM LE 0.75) LIFEINT='INT3';
   WHEN (ELAPSTM GT 0.75 AND ELAPSTM LE 1.00) LIFEINT='INT4';
   WHEN (ELAPSTM GT 1.00 AND ELAPSTM LE 1.25) LIFEINT='INT5';
   WHEN (ELAPSTM GT 1.25 AND ELAPSTM LE 1.50) LIFEINT='INT6';
   WHEN (ELAPSTM GT 1.50 AND ELAPSTM LE 1.75) LIFEINT='INT7';
   WHEN (ELAPSTM GT 1.75 AND ELAPSTM LE 2.00) LIFEINT='INT8';
   WHEN (ELAPSTM GT 2.00 AND ELAPSTM LE 2.25) LIFEINT='INT9';
   WHEN (ELAPSTM GT 2.25 AND ELAPSTM LE 2.50) LIFEINT='INT10';
   WHEN (ELAPSTM GT 2.50 AND ELAPSTM LE 2.75) LIFEINT='INT11';
   WHEN (ELAPSTM GT 2.75 AND ELAPSTM LE 3.00) LIFEINT='INT12';
   WHEN (ELAPSTM GT 3.00 AND ELAPSTM LE 4.00) LIFEINT='INT13';
   WHEN (ELAPSTM GT 4.00 AND ELAPSTM LE 5.00) LIFEINT='INT14';
   WHEN (ELAPSTM GT 5.00 AND ELAPSTM LE 6.00) LIFEINT='INT15';
   WHEN (ELAPSTM GT 6.00 AND ELAPSTM LE 7.00) LIFEINT='INT16';
   OTHERWISE LIFEINT='INT17';
END;

Unfortunately this isn't having the desired results.  Things are ending up in RUNINT=INT1 and LIFEINT=INT1 when they should not be.  Below is a sample of what I'm getting:

 

TRANNAME         ELAPSTM        TCLDIOTM    LIFEINT    RUNTIME    RUNINT

  EWEB        0:00:36.02      0:00:00.00     INT1      36.0164     INT1
  CKOM        0:01:15.87      0:00:44.43     INT1      31.4486     INT1
  CKOM        0:00:30.80      0:00:00.00     INT1      30.7976     INT1
  OMSE        0:00:29.05      0:00:00.00     INT1      29.0541     INT1
  OMSE        0:00:26.54      0:00:00.00     INT1      26.5420     INT1
  OMSE        0:00:26.50      0:00:00.00     INT1      26.5014     INT1
  EWEB        0:00:26.45      0:00:00.00     INT1      26.4546     INT1

I and several people have been looking at this and trying different formats of the comparison for a few weeks but we haven't been able to figure this out.  I'm hoping someone will see what I'm missing and what's causing these to show up in INT1 for both interval variables.

 

The goal of doing this is to do other processing based on the 2 interval variables include getting counts by interval.  I open to whatever will allow us to do that.

 

I'm attaching a copy of the program as well as a partial report. 

 

Thanks in advance for any advice.

 

Donna Domovic

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I suspect that lifeint and runint are created with only 4 characters long, so that when they are assigned the "INT17" default value they are chopped to "INT1". Try adding the statement

 

length lifeint runint $5;

 

to your code.

PG

View solution in original post

8 REPLIES 8
DonnaDomovic
Calcite | Level 5

Sorry.  Looks like the files didn't get attached.  Here they are.

PGStats
Opal | Level 21

I suspect that lifeint and runint are created with only 4 characters long, so that when they are assigned the "INT17" default value they are chopped to "INT1". Try adding the statement

 

length lifeint runint $5;

 

to your code.

PG
FreelanceReinh
Jade | Level 19

Hello @DonnaDomovic (and welcome to the SAS Support Communities :-)),

 

In addition to the truncation issue pointed out by PGStats, most of the WHEN conditions for RUNTIME are of the (contradictory) form

RUNTIME GT x AND RUNTIME LE x

and hence will never be met.

 

Assuming that the conditions for RUNTIME should actually be the same as for ELAPSTM, it might be useful to define a format:

proc format;
value timeint
0     - 0.25 = 'INT1'
0.25< - 0.50 = 'INT2'
0.50< - 0.75 = 'INT3'
...
6.00< - 7.00 = 'INT16'
other        = 'INT17';
run;

(Note that this would label missing or negative values as "INT17" unlike your SELECT statements, which would assign the value "INT1".)

 

You can then define the two new variables based on the same format:

runint  = put(runtime, timeint.);
lifeint = put(elapstm, timeint.);

Or maybe you don't even need the new variables anymore because you can just use RUNTIME and ELAPSTM, formatted with timeint. (e.g. in PROC REPORT using aliases).

 

Any future changes to the conditions or format labels would be applied to the single format definition rather than two blocks of SELECT/WHEN statements, thus ensuring consistency.

DonnaDomovic
Calcite | Level 5

Thanks to both of you.  The first option to add the length resolved the issue but I actually added the PROC FORMAT and that helped make the program much cleaner.  I'm still going to look at eliminating the RUNINT and LIFEINT as was suggested.

 

In case you're interested, I'm attaching a copy of the updated code along with a sample of the results.  So that you know, I did change the ranges to better reflect the data.  The new ranges are:

 

PROC FORMAT;
   VALUE TIMEINT
      0     - 0.25 = 'INT1'
      0.25< - 0.50 = 'INT2'
      0.50< - 0.75 = 'INT3'
      0.75< - 1.00 = 'INT4'
      1.00< - 1.25 = 'INT5'
      1.25< - 1.50 = 'INT6'
      1.50< - 2.00 = 'INT7'
      2.00< - 3.00 = 'INT8'
      3.00< - 5.00 = 'INT9'
      5.00< - 10.00 = 'INT10'
      10.00< - 20.00 = 'INT11'
      20.00< - 30.00 = 'INT12'
      30.00< - 60.00 = 'INT13'
      60.00< - 120.00 = 'INT14'
      120.00< - 180.00 = 'INT15'
      OTHER        = 'INT16';
RUN;
FreelanceReinh
Jade | Level 19

You're welcome. Good to see the improvements.

 

For a report including both the original time values and the categories "INTn" it's probably convenient to have the additional character variables (LIFEINT and RUNINT), although with a bit more coding effort PROC REPORT (unlike PROC PRINT) could produce those columns using only ELAPSTM, RUNTIME and format timeint.

 

There are other types of output, however, where additional character variables were really unnecessary. For example, the frequency distribution produced by

proc freq data=temp;
tables runint;
run;

could be obtained easily with

proc freq data=temp;
format runtime timeint.;
tables runtime;
run;

as well -- in fact even better because now (thanks to the default ORDER=INTERNAL) the categories would be sorted by ascending values of RUNTIME, not alphabetically (INT1, INT10, ..., INT16, INT2, ...).

DonnaDomovic
Calcite | Level 5

The ultimate goal is to create a report of the count of transactions by interval broken down by phase of a test (3 phases) and transaction.  The report would look something like the following (I'm only showing 2 of 4 transactions with different intervals) 

 

IIntinSec	  CKOM P1 Cnt 	CKOM P1 1 %	 CKOM P2 Cnt  CKOM P2 %	 CKOM P3 Cnt  CKOM P3 %	 ECOM P1 Cnt  ECOM P1 %	 ECOM P2 Cnt  ECOM P2 %	 ECOM P3 Cnt  ECOM P3 %
0.00 to 0.25	 2,519 	      41%	       5,669 	    57%	      6,248 	     44%	   48,216 	    81%	      48,191 	   75%	       37,104 	    72%
0.26 to 0.50	 3,048 	      49%	       2,879 	    29%	      5,779 	     41%	    7,829 	    13%	      11,377 	   18%	        7,081 	    14%
0.51 to 0.75	 261 	       4%	       341 	         3%	      669 	          5%	      881 	     1%	       1,698 	    3%	        1,963 	     4%
...
1.76 to 2.00	 35 	       1%	       163 	         2%	      167 	          1%	       96 	     0%	         241 	    0%	         373 	     1%
2.01 to 2.25	 27 	       0%	       154 	         2%	      144 	          1%	      107 	     0%	         227 	    0%	         287 	     1%
Total	         6,164 	     100%	       9,947 	   100%	      14,052 	    100%	   59,608 	   100%	      64,633 	  100%	      51,213 	   100%

 I was able to get what I want (minus the percentages by creating a number of variables and basically counting doing it once for the RUNINT variable and another for the LIFEINT variable.  Below is the code for the RUNINT variable:

 

PROC SORT DATA=TEMP1; BY SYSTEM RUNINT TRANNAME PHASE;
DATA RUNCNTS (KEEP = SYSTEM RUNINT EWEBP1R EWEBP2R EWEBP3R ECOMP1R
                     ECOMP2R ECOMP3R OMSEP1R OMSEP2R OMSEP3R CKOMP1R
                     CKOMP2R CKOMP3R);
SET TEMP1; BY SYSTEM RUNINT TRANNAME PHASE;
RETAIN CKOMP1R;
RETAIN CKOMP2R;
RETAIN CKOMP3R;
RETAIN ECOMP1R;
RETAIN ECOMP2R;
RETAIN ECOMP3R;
RETAIN EWEBP1R;
RETAIN EWEBP2R;
RETAIN EWEBP3R;
RETAIN OMSEP1R;
RETAIN OMSEP2R;
RETAIN OMSEP3R;
IF FIRST.RUNINT THEN DO;
   CKOMP1R=0;
   CKOMP2R=0;
   CKOMP3R=0;
   ECOMP1R=0;
   ECOMP2R=0;
   ECOMP3R=0;
   EWEBP1R=0;
   EWEBP2R=0;
   EWEBP3R=0;
   OMSEP1R=0;
   OMSEP2R=0;
   OMSEP3R=0;
END;
SELECT;
   WHEN (TRANNAME='CKOM' AND PHASE = 'P1') CKOMP1R=CKOMP1R+1;
   WHEN (TRANNAME='CKOM' AND PHASE = 'P2') CKOMP2R=CKOMP2R+1;
   WHEN (TRANNAME='CKOM' AND PHASE = 'P3') CKOMP3R=CKOMP3R+1;
   WHEN (TRANNAME='ECOM' AND PHASE = 'P1') ECOMP1R=ECOMP1R+1;
   WHEN (TRANNAME='ECOM' AND PHASE = 'P2') ECOMP2R=ECOMP2R+1;
   WHEN (TRANNAME='ECOM' AND PHASE = 'P3') ECOMP3R=ECOMP3R+1;
   WHEN (TRANNAME='EWEB' AND PHASE = 'P1') EWEBP1R=EWEBP1R+1;
   WHEN (TRANNAME='EWEB' AND PHASE = 'P2') EWEBP2R=EWEBP2R+1;
   WHEN (TRANNAME='EWEB' AND PHASE = 'P3') EWEBP3R=EWEBP3R+1;
   WHEN (TRANNAME='OMSE' AND PHASE = 'P1') OMSEP1R=OMSEP1R+1;
   WHEN (TRANNAME='OMSE' AND PHASE = 'P2') OMSEP2R=OMSEP2R+1;
   WHEN (TRANNAME='OMSE' AND PHASE = 'P3') OMSEP3R=OMSEP3R+1;
END;
IF LAST.RUNINT THEN OUTPUT;

This resulted in the following:

 

RUNINT  CKOMP1R  CKOMP2R  CKOMP3R  ECOMP1R  ECOMP2R  ECOMP3R  EWEBP1R  EWEBP2R  EWEBP3R  OMSEP1R  OMSEP2R  OMSEP3R

INT01      698     3137     8168    69084    55368    60915    110779    96143    83831    242      504      147
INT02     1514     5061    12693    18664    11366    17821     10114     4444     4196    145      187       37
INT03      447      600     1523     2582     1056     4333      2275     1070     1391     54       61       25
INT04      234      244      669      868      556     2524      1230      616      636     29       40        8
INT05      178      141      384      974      651     1091       902      480      476     74      703      274
INT06      148      145      263      440      264      550       757      362      355     34      102       28
INT07      225      153      377      690      214      475      1220      488      596     38       44        7
INT08      368      208      468      852      418      578      1752      909      773     70       39        3
INT09      576      236      320     1617      488      560      2847      920      662    114       44        2
INT10      960      144       87     1984      374      139      3352      752      309    175       26        0
INT11      589        0        0      937      215        0      1716      301        0    124        0        0
INT12      165        0        0      242       38        7       277       68        0     36        0        0
INT13       48        0        0      213       23       21        94        0        0     24        0        0
INT14        1        0        0        0        0        0         1        0        0      0        0        0

For now, I can put this in Excel and add the percentages but I'll continue playing with the code and will look at the PROC SQL or PROC FREQ to see if it helps streamline this.  This code will be reused for future tests which occur a few times a year so having it create the complete report with as little manipulation as possible will help.

 

Besides, playing with SAS code and learning new options is fun.  🙂2

 

For those interested, I'm attaching the latest version of the code and report.

FreelanceReinh
Jade | Level 19

My first approach for this would probably center around a PROC FREQ step producing the counts and percentages of the cross tabulation. This would be preceded by the preparation of the input dataset and followed by post-processing of the output dataset (i.e. ODS table CrossTabFreqs) in order to create the final report with PROC REPORT. Other programmers might consider PROC TABULATE or other procedures.

 

Your "manual" data step approach would still be useful for validation purposes. It could be simplified by using the sum statement (e.g., WHEN(...) CKOMP1R+1;), which would render the RETAIN statements redundant.

DonnaDomovic
Calcite | Level 5

Thanks once again.  I changed the RETAIN to the sum as you suggested and it worked wonderfully.  I'll take a look at the PROC FREQ later or this week and see if that will help simplify the program as well. If nothing else, I'd like to try it so I have an idea of how to use it.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 917 views
  • 1 like
  • 3 in conversation