SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
greg6363
Obsidian | Level 7

I am creating summary analysis for records with a particular event denoted as 'X1'.  I want create one data set with records that have consecutive X1 events as well as a second data set with records that have a X1 occurrence but in non-consecutive instances which would serve as a total record set that captures all X1 events.  I created three counter fields: the first for the record iteration, the second counter starts and stops with each X1 event and the third counter counts each X1 event but reverts to zero when no X1 event is present.  When I attempt to group the consecutive and non-consecutive events separately, the output makes no sense since the totals for the first instance of X1 (which should be the same whether they are consecutive or non-consecutive) are different between the two data sets.  Here is the code for the three counter fields:

 

/*Creating a counter field to track the number of records for each individual account*/
data Step2;
set Step1;
by accountnumber;
if first.accountnumber
then acct_int = 1;
else acct_int + 1; /*iterative statement*/
run;

 

/*Creating a second counter field that starts and stops with each X1 event - Consecutive*/
data Step3 (drop=_n:);
set Step2;
by accountnumber;
if first.accountnumber then _n=0;
_n+event_type='X1';
event_int_1=ifn(event_type='X1',_n,0);
run;

 

/*Creating a third counter field that resets to zero for each non-consecutive X1*/
data Step4 (rename=(_n=event_int_2));
set Step3;
by accountnumber;
if first.accountnumber then _n=0;
if event_type='Non-X1' then _n=0;
else _n +1;
run;

 

The output will have this layout:

 

accountnumber  event_type    acct_int  event_int_1  event_int_2

         001                Non-X1           1              0                   0

         001                     X1              2              1                   1

         001                Non-X1           3               1                  0

         001                     X1              4               2                  1

         001                     X1              5               3                  2

         002                     X1              1               1                  1

         002                     X1              2               2                  2

         002                Non-X1           3               2                  0

         002                     X1              4               3                  1

         003                Non-X1           1               0                  0     

         004                     X1              1               1                  1 

         005                     X1              1               1                  1

         005                Non-X1           2                1                 0

         006                Non-X1           1                0                 0

         006                Non-X1           2                0                 0

         006                    X1               3                1                 1

 

When I use the GROUP BY statement using the event_1 and event_2 variables in PROC SQL, the totals don't make sense.  Am I properly identifying the consecutive and non-consecutive records with my counters?  Any assistance would be greatly appreciated in this issue.  Thanks.

2 REPLIES 2
mkeintz
PROC Star

You can do this simply in a single data step, using what sas calls SUM statements, where the statement

   x+y;

means

  1. Sum the values of X and Y, and store the result in X.  (i.e. X=sum(x,y));
  2. Retain the resulting value for the next iteration of the data step.

For example, (untested):

data want;
  set have;
  by accountnumber ;
  if first.accountnumber then call missing (acct_int,event_int_1,event_int_2);

  acct_int+1;

  event_int_1+(event_type='X');

  if event_type^='X' then event_int_2=0;
  else event_int_2+1;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
greg6363
Obsidian | Level 7

I rewrote my code and created a data step to include a sum statement:

 

DATA master;
INPUT accountnumber event_type $ acct_int event_int_1 event_int_2 ;
DATALINES ;
1001 Non-X1 1 0 0
1001 X1 2 1 1
1001 Non-X1 3 1 0
1001 X1 4 2 1
1001 X1 5 3 2
1002 X1 1 1 1
1002 X1 2 2 2
1002 Non-X1 3 2 0
1002 X1 4 3 1
1003 Non-X1 1 0 0
1004 X1 1 1 1
1005 X1 1 1 1
1005 Non-X1 2 1 0
1006 Non-X1 1 0 0
1006 Non-X1 2 0 0
1006 X1 3 1 1; RUN;

data step1;
set master;
X=sum(event_int_1,event_int_2);
run;

 

data step2;
set step1;
by accountnumber ;
if first.accountnumber then call missing (acct_int,event_int_1,event_int_2);

acct_int+1;

event_int_1+(event_type='X');

if event_type^='X' then event_int_2=0;
else event_int_2+1;
run;

 

Unfortunately, the 2 value for the 2nd iteration in the acct_int field has disappeared when using the acct_int+1 statement.  I don't want that field value to change.  Here is my output when producing the step2 data set:

 

accountnumber  event_type  acct_int  event_int_1  event_int_2  X

1001 Non-X1 1 0 0 0 
1001 X1 3 1 0 2 
1001 Non-X1 4 1 0 0 
1001 X1 5 2 0 3  
1001 X1 6 3 0 5 
1002 X1 1 0 0 2  
1002 X1 3 2 0 4 
1002 Non-X1 4 2 0 2
1002 X1 5 3 0 4 
1003 Non-X1 1 0 0 0  
1004 X1 1 0 0 2 
1005 X1 1 0 0 2  
1005 Non-X1 3 1 0 1 
1006 Non-X1 1 0 0 0 
1006 Non-X1 3 0 0 0 
1006 X1 4 1 0 2 

 

Let me know what I'm doing wrong.  Thanks.

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 951 views
  • 0 likes
  • 2 in conversation