Hello,
This is my first time posting but I'm hoping you can help with a problem I'm having.
I have a SAS program where I'm trying to do the following:
Code I'm using to assign values to the RUNIT and LIFEINT values is:
SELECT;
WHEN (RUNTIME LE 0.25) RUNINT='INT1';
WHEN (RUNTIME GT 0.25 AND RUNTIME LE 0.50) RUNINT='INT2';
WHEN (RUNTIME GT 0.50 AND RUNTIME LE 0.50) RUNINT='INT3';
WHEN (RUNTIME GT 0.75 AND RUNTIME LE 0.75) RUNINT='INT4';
WHEN (RUNTIME GT 1.00 AND RUNTIME LE 1.00) RUNINT='INT5';
WHEN (RUNTIME GT 1.25 AND RUNTIME LE 1.25) RUNINT='INT6';
WHEN (RUNTIME GT 1.50 AND RUNTIME LE 1.50) RUNINT='INT7';
WHEN (RUNTIME GT 1.75 AND RUNTIME LE 1.75) RUNINT='INT8';
WHEN (RUNTIME GT 2.00 AND RUNTIME LE 2.00) RUNINT='INT9';
WHEN (RUNTIME GT 2.25 AND RUNTIME LE 2.25) RUNINT='INT10';
WHEN (RUNTIME GT 2.50 AND RUNTIME LE 2.50) RUNINT='INT11';
WHEN (RUNTIME GT 2.75 AND RUNTIME LE 2.75) RUNINT='INT12';
WHEN (RUNTIME GT 3.00 AND RUNTIME LE 3.00) RUNINT='INT13';
WHEN (RUNTIME GT 4.00 AND RUNTIME LE 4.00) RUNINT='INT14';
WHEN (RUNTIME GT 5.00 AND RUNTIME LE 5.00) RUNINT='INT15';
WHEN (RUNTIME GT 6.00 AND RUNTIME LE 6.00) RUNINT='INT16';
OTHERWISE RUNINT='INT17';
END;
SELECT;
WHEN (ELAPSTM LE 0.25) LIFEINT='INT1';
WHEN (ELAPSTM GT 0.25 AND ELAPSTM LE 0.50) LIFEINT='INT2';
WHEN (ELAPSTM GT 0.50 AND ELAPSTM LE 0.75) LIFEINT='INT3';
WHEN (ELAPSTM GT 0.75 AND ELAPSTM LE 1.00) LIFEINT='INT4';
WHEN (ELAPSTM GT 1.00 AND ELAPSTM LE 1.25) LIFEINT='INT5';
WHEN (ELAPSTM GT 1.25 AND ELAPSTM LE 1.50) LIFEINT='INT6';
WHEN (ELAPSTM GT 1.50 AND ELAPSTM LE 1.75) LIFEINT='INT7';
WHEN (ELAPSTM GT 1.75 AND ELAPSTM LE 2.00) LIFEINT='INT8';
WHEN (ELAPSTM GT 2.00 AND ELAPSTM LE 2.25) LIFEINT='INT9';
WHEN (ELAPSTM GT 2.25 AND ELAPSTM LE 2.50) LIFEINT='INT10';
WHEN (ELAPSTM GT 2.50 AND ELAPSTM LE 2.75) LIFEINT='INT11';
WHEN (ELAPSTM GT 2.75 AND ELAPSTM LE 3.00) LIFEINT='INT12';
WHEN (ELAPSTM GT 3.00 AND ELAPSTM LE 4.00) LIFEINT='INT13';
WHEN (ELAPSTM GT 4.00 AND ELAPSTM LE 5.00) LIFEINT='INT14';
WHEN (ELAPSTM GT 5.00 AND ELAPSTM LE 6.00) LIFEINT='INT15';
WHEN (ELAPSTM GT 6.00 AND ELAPSTM LE 7.00) LIFEINT='INT16';
OTHERWISE LIFEINT='INT17';
END;
Unfortunately this isn't having the desired results. Things are ending up in RUNINT=INT1 and LIFEINT=INT1 when they should not be. Below is a sample of what I'm getting:
TRANNAME ELAPSTM TCLDIOTM LIFEINT RUNTIME RUNINT
EWEB 0:00:36.02 0:00:00.00 INT1 36.0164 INT1
CKOM 0:01:15.87 0:00:44.43 INT1 31.4486 INT1
CKOM 0:00:30.80 0:00:00.00 INT1 30.7976 INT1
OMSE 0:00:29.05 0:00:00.00 INT1 29.0541 INT1
OMSE 0:00:26.54 0:00:00.00 INT1 26.5420 INT1
OMSE 0:00:26.50 0:00:00.00 INT1 26.5014 INT1
EWEB 0:00:26.45 0:00:00.00 INT1 26.4546 INT1
I and several people have been looking at this and trying different formats of the comparison for a few weeks but we haven't been able to figure this out. I'm hoping someone will see what I'm missing and what's causing these to show up in INT1 for both interval variables.
The goal of doing this is to do other processing based on the 2 interval variables include getting counts by interval. I open to whatever will allow us to do that.
I'm attaching a copy of the program as well as a partial report.
Thanks in advance for any advice.
Donna Domovic
I suspect that lifeint and runint are created with only 4 characters long, so that when they are assigned the "INT17" default value they are chopped to "INT1". Try adding the statement
length lifeint runint $5;
to your code.
Sorry. Looks like the files didn't get attached. Here they are.
I suspect that lifeint and runint are created with only 4 characters long, so that when they are assigned the "INT17" default value they are chopped to "INT1". Try adding the statement
length lifeint runint $5;
to your code.
Hello @DonnaDomovic (and welcome to the SAS Support Communities :-)),
In addition to the truncation issue pointed out by PGStats, most of the WHEN conditions for RUNTIME are of the (contradictory) form
RUNTIME GT x AND RUNTIME LE x
and hence will never be met.
Assuming that the conditions for RUNTIME should actually be the same as for ELAPSTM, it might be useful to define a format:
proc format;
value timeint
0 - 0.25 = 'INT1'
0.25< - 0.50 = 'INT2'
0.50< - 0.75 = 'INT3'
...
6.00< - 7.00 = 'INT16'
other = 'INT17';
run;
(Note that this would label missing or negative values as "INT17" unlike your SELECT statements, which would assign the value "INT1".)
You can then define the two new variables based on the same format:
runint = put(runtime, timeint.);
lifeint = put(elapstm, timeint.);
Or maybe you don't even need the new variables anymore because you can just use RUNTIME and ELAPSTM, formatted with timeint. (e.g. in PROC REPORT using aliases).
Any future changes to the conditions or format labels would be applied to the single format definition rather than two blocks of SELECT/WHEN statements, thus ensuring consistency.
Thanks to both of you. The first option to add the length resolved the issue but I actually added the PROC FORMAT and that helped make the program much cleaner. I'm still going to look at eliminating the RUNINT and LIFEINT as was suggested.
In case you're interested, I'm attaching a copy of the updated code along with a sample of the results. So that you know, I did change the ranges to better reflect the data. The new ranges are:
PROC FORMAT;
VALUE TIMEINT
0 - 0.25 = 'INT1'
0.25< - 0.50 = 'INT2'
0.50< - 0.75 = 'INT3'
0.75< - 1.00 = 'INT4'
1.00< - 1.25 = 'INT5'
1.25< - 1.50 = 'INT6'
1.50< - 2.00 = 'INT7'
2.00< - 3.00 = 'INT8'
3.00< - 5.00 = 'INT9'
5.00< - 10.00 = 'INT10'
10.00< - 20.00 = 'INT11'
20.00< - 30.00 = 'INT12'
30.00< - 60.00 = 'INT13'
60.00< - 120.00 = 'INT14'
120.00< - 180.00 = 'INT15'
OTHER = 'INT16';
RUN;
You're welcome. Good to see the improvements.
For a report including both the original time values and the categories "INTn" it's probably convenient to have the additional character variables (LIFEINT and RUNINT), although with a bit more coding effort PROC REPORT (unlike PROC PRINT) could produce those columns using only ELAPSTM, RUNTIME and format timeint.
There are other types of output, however, where additional character variables were really unnecessary. For example, the frequency distribution produced by
proc freq data=temp;
tables runint;
run;
could be obtained easily with
proc freq data=temp;
format runtime timeint.;
tables runtime;
run;
as well -- in fact even better because now (thanks to the default ORDER=INTERNAL) the categories would be sorted by ascending values of RUNTIME, not alphabetically (INT1, INT10, ..., INT16, INT2, ...).
The ultimate goal is to create a report of the count of transactions by interval broken down by phase of a test (3 phases) and transaction. The report would look something like the following (I'm only showing 2 of 4 transactions with different intervals)
IIntinSec CKOM P1 Cnt CKOM P1 1 % CKOM P2 Cnt CKOM P2 % CKOM P3 Cnt CKOM P3 % ECOM P1 Cnt ECOM P1 % ECOM P2 Cnt ECOM P2 % ECOM P3 Cnt ECOM P3 %
0.00 to 0.25 2,519 41% 5,669 57% 6,248 44% 48,216 81% 48,191 75% 37,104 72%
0.26 to 0.50 3,048 49% 2,879 29% 5,779 41% 7,829 13% 11,377 18% 7,081 14%
0.51 to 0.75 261 4% 341 3% 669 5% 881 1% 1,698 3% 1,963 4%
...
1.76 to 2.00 35 1% 163 2% 167 1% 96 0% 241 0% 373 1%
2.01 to 2.25 27 0% 154 2% 144 1% 107 0% 227 0% 287 1%
Total 6,164 100% 9,947 100% 14,052 100% 59,608 100% 64,633 100% 51,213 100%
I was able to get what I want (minus the percentages by creating a number of variables and basically counting doing it once for the RUNINT variable and another for the LIFEINT variable. Below is the code for the RUNINT variable:
PROC SORT DATA=TEMP1; BY SYSTEM RUNINT TRANNAME PHASE;
DATA RUNCNTS (KEEP = SYSTEM RUNINT EWEBP1R EWEBP2R EWEBP3R ECOMP1R
ECOMP2R ECOMP3R OMSEP1R OMSEP2R OMSEP3R CKOMP1R
CKOMP2R CKOMP3R);
SET TEMP1; BY SYSTEM RUNINT TRANNAME PHASE;
RETAIN CKOMP1R;
RETAIN CKOMP2R;
RETAIN CKOMP3R;
RETAIN ECOMP1R;
RETAIN ECOMP2R;
RETAIN ECOMP3R;
RETAIN EWEBP1R;
RETAIN EWEBP2R;
RETAIN EWEBP3R;
RETAIN OMSEP1R;
RETAIN OMSEP2R;
RETAIN OMSEP3R;
IF FIRST.RUNINT THEN DO;
CKOMP1R=0;
CKOMP2R=0;
CKOMP3R=0;
ECOMP1R=0;
ECOMP2R=0;
ECOMP3R=0;
EWEBP1R=0;
EWEBP2R=0;
EWEBP3R=0;
OMSEP1R=0;
OMSEP2R=0;
OMSEP3R=0;
END;
SELECT;
WHEN (TRANNAME='CKOM' AND PHASE = 'P1') CKOMP1R=CKOMP1R+1;
WHEN (TRANNAME='CKOM' AND PHASE = 'P2') CKOMP2R=CKOMP2R+1;
WHEN (TRANNAME='CKOM' AND PHASE = 'P3') CKOMP3R=CKOMP3R+1;
WHEN (TRANNAME='ECOM' AND PHASE = 'P1') ECOMP1R=ECOMP1R+1;
WHEN (TRANNAME='ECOM' AND PHASE = 'P2') ECOMP2R=ECOMP2R+1;
WHEN (TRANNAME='ECOM' AND PHASE = 'P3') ECOMP3R=ECOMP3R+1;
WHEN (TRANNAME='EWEB' AND PHASE = 'P1') EWEBP1R=EWEBP1R+1;
WHEN (TRANNAME='EWEB' AND PHASE = 'P2') EWEBP2R=EWEBP2R+1;
WHEN (TRANNAME='EWEB' AND PHASE = 'P3') EWEBP3R=EWEBP3R+1;
WHEN (TRANNAME='OMSE' AND PHASE = 'P1') OMSEP1R=OMSEP1R+1;
WHEN (TRANNAME='OMSE' AND PHASE = 'P2') OMSEP2R=OMSEP2R+1;
WHEN (TRANNAME='OMSE' AND PHASE = 'P3') OMSEP3R=OMSEP3R+1;
END;
IF LAST.RUNINT THEN OUTPUT;
This resulted in the following:
RUNINT CKOMP1R CKOMP2R CKOMP3R ECOMP1R ECOMP2R ECOMP3R EWEBP1R EWEBP2R EWEBP3R OMSEP1R OMSEP2R OMSEP3R
INT01 698 3137 8168 69084 55368 60915 110779 96143 83831 242 504 147
INT02 1514 5061 12693 18664 11366 17821 10114 4444 4196 145 187 37
INT03 447 600 1523 2582 1056 4333 2275 1070 1391 54 61 25
INT04 234 244 669 868 556 2524 1230 616 636 29 40 8
INT05 178 141 384 974 651 1091 902 480 476 74 703 274
INT06 148 145 263 440 264 550 757 362 355 34 102 28
INT07 225 153 377 690 214 475 1220 488 596 38 44 7
INT08 368 208 468 852 418 578 1752 909 773 70 39 3
INT09 576 236 320 1617 488 560 2847 920 662 114 44 2
INT10 960 144 87 1984 374 139 3352 752 309 175 26 0
INT11 589 0 0 937 215 0 1716 301 0 124 0 0
INT12 165 0 0 242 38 7 277 68 0 36 0 0
INT13 48 0 0 213 23 21 94 0 0 24 0 0
INT14 1 0 0 0 0 0 1 0 0 0 0 0
For now, I can put this in Excel and add the percentages but I'll continue playing with the code and will look at the PROC SQL or PROC FREQ to see if it helps streamline this. This code will be reused for future tests which occur a few times a year so having it create the complete report with as little manipulation as possible will help.
Besides, playing with SAS code and learning new options is fun. 🙂2
For those interested, I'm attaching the latest version of the code and report.
My first approach for this would probably center around a PROC FREQ step producing the counts and percentages of the cross tabulation. This would be preceded by the preparation of the input dataset and followed by post-processing of the output dataset (i.e. ODS table CrossTabFreqs) in order to create the final report with PROC REPORT. Other programmers might consider PROC TABULATE or other procedures.
Your "manual" data step approach would still be useful for validation purposes. It could be simplified by using the sum statement (e.g., WHEN(...) CKOMP1R+1;), which would render the RETAIN statements redundant.
Thanks once again. I changed the RETAIN to the sum as you suggested and it worked wonderfully. I'll take a look at the PROC FREQ later or this week and see if that will help simplify the program as well. If nothing else, I'd like to try it so I have an idea of how to use it.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.