- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am creating summary analysis for records with a particular event denoted as 'X1'. I want create one data set with records that have consecutive X1 events as well as a second data set with records that have a X1 occurrence but in non-consecutive instances which would serve as a total record set that captures all X1 events. I created three counter fields: the first for the record iteration, the second counter starts and stops with each X1 event and the third counter counts each X1 event but reverts to zero when no X1 event is present. When I attempt to group the consecutive and non-consecutive events separately, the output makes no sense since the totals for the first instance of X1 (which should be the same whether they are consecutive or non-consecutive) are different between the two data sets. Here is the code for the three counter fields:
/*Creating a counter field to track the number of records for each individual account*/
data Step2;
set Step1;
by accountnumber;
if first.accountnumber
then acct_int = 1;
else acct_int + 1; /*iterative statement*/
run;
/*Creating a second counter field that starts and stops with each X1 event - Consecutive*/
data Step3 (drop=_n:);
set Step2;
by accountnumber;
if first.accountnumber then _n=0;
_n+event_type='X1';
event_int_1=ifn(event_type='X1',_n,0);
run;
/*Creating a third counter field that resets to zero for each non-consecutive X1*/
data Step4 (rename=(_n=event_int_2));
set Step3;
by accountnumber;
if first.accountnumber then _n=0;
if event_type='Non-X1' then _n=0;
else _n +1;
run;
The output will have this layout:
accountnumber event_type acct_int event_int_1 event_int_2
001 Non-X1 1 0 0
001 X1 2 1 1
001 Non-X1 3 1 0
001 X1 4 2 1
001 X1 5 3 2
002 X1 1 1 1
002 X1 2 2 2
002 Non-X1 3 2 0
002 X1 4 3 1
003 Non-X1 1 0 0
004 X1 1 1 1
005 X1 1 1 1
005 Non-X1 2 1 0
006 Non-X1 1 0 0
006 Non-X1 2 0 0
006 X1 3 1 1
When I use the GROUP BY statement using the event_1 and event_2 variables in PROC SQL, the totals don't make sense. Am I properly identifying the consecutive and non-consecutive records with my counters? Any assistance would be greatly appreciated in this issue. Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can do this simply in a single data step, using what sas calls SUM statements, where the statement
x+y;
means
- Sum the values of X and Y, and store the result in X. (i.e. X=sum(x,y));
- Retain the resulting value for the next iteration of the data step.
For example, (untested):
data want;
set have;
by accountnumber ;
if first.accountnumber then call missing (acct_int,event_int_1,event_int_2);
acct_int+1;
event_int_1+(event_type='X');
if event_type^='X' then event_int_2=0;
else event_int_2+1;
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I rewrote my code and created a data step to include a sum statement:
DATA master;
INPUT accountnumber event_type $ acct_int event_int_1 event_int_2 ;
DATALINES ;
1001 Non-X1 1 0 0
1001 X1 2 1 1
1001 Non-X1 3 1 0
1001 X1 4 2 1
1001 X1 5 3 2
1002 X1 1 1 1
1002 X1 2 2 2
1002 Non-X1 3 2 0
1002 X1 4 3 1
1003 Non-X1 1 0 0
1004 X1 1 1 1
1005 X1 1 1 1
1005 Non-X1 2 1 0
1006 Non-X1 1 0 0
1006 Non-X1 2 0 0
1006 X1 3 1 1; RUN;
data step1;
set master;
X=sum(event_int_1,event_int_2);
run;
data step2;
set step1;
by accountnumber ;
if first.accountnumber then call missing (acct_int,event_int_1,event_int_2);
acct_int+1;
event_int_1+(event_type='X');
if event_type^='X' then event_int_2=0;
else event_int_2+1;
run;
Unfortunately, the 2 value for the 2nd iteration in the acct_int field has disappeared when using the acct_int+1 statement. I don't want that field value to change. Here is my output when producing the step2 data set:
accountnumber event_type acct_int event_int_1 event_int_2 X
1001 Non-X1 1 0 0 0
1001 X1 3 1 0 2
1001 Non-X1 4 1 0 0
1001 X1 5 2 0 3
1001 X1 6 3 0 5
1002 X1 1 0 0 2
1002 X1 3 2 0 4
1002 Non-X1 4 2 0 2
1002 X1 5 3 0 4
1003 Non-X1 1 0 0 0
1004 X1 1 0 0 2
1005 X1 1 0 0 2
1005 Non-X1 3 1 0 1
1006 Non-X1 1 0 0 0
1006 Non-X1 3 0 0 0
1006 X1 4 1 0 2
Let me know what I'm doing wrong. Thanks.