BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Nikos
Fluorite | Level 6

Dear all,

I have a data set grouped by ID and Transaction_Quarter as follows:

ID   Trns_QTR   Val_A     Val_B

100   2010Q1       1456     345

100   2010Q2       2500       0

100   2010Q3       2500       0

100   2010Q4       2500       0

200   2010Q4          0       1200

200   2011Q1       3100       0

200   2011Q2       3200       0

200   2011Q3       3100       0

200   2011Q4       3300       0

200   2012Q1       2500    12876

300   2019Q4       2390       0

300   2010Q1       4300       0

300   2010Q2       2300       678

300   2010Q3       5600       0

300   2010Q4       5600       0

300   2011Q1       2500       0

300   2011Q2       3200       0

300   2011Q3       3100       0

300   2011Q4       3300     12876

300   2012Q1       3200       0

300   2012Q1       3100       0

300   2012Q2       3300       0

I would like to find:

1).-  The number(s) of consecutive rows, in chronological order, within a by-group that Val_B = 0 (e.g. for ID 300 I will have three counts  2 , 5, 3) along with their respective and the beginning and ending Trns_QTR

      e.g.  2,   2019Q4  - 2010Q1      

              5,   2010Q3 -  2011Q3

              3,   2012Q1 -  2012Q2

         

2).-  Create a Flag that will show whether the last Trns_QTR equals a given value i.e. 2012Q2 , of the latest "subgroup" (of consecutive zeros).

3).- Count the number of ID's that have count of consecutive Val_B,  0's  GT 3 and its Trns_QTR equals a given value i.e. 2012Q2

Any feedback would be much appreciated

Thanking you in advance.

Best regards

Nikos

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
Data A;
  Input ID Trns_QTR:YYQ. Val_A Val_B;
  format Trns_Qtr YYQs6.;
  Datalines;
100 2010Q1 1456 345
100 2010Q2 2500 0
100 2010Q3 2500 0
100 2010Q4 2500 0
200 2010Q4 0 1200
200 2011Q1 3100 0
200 2011Q2 3200 0
200 2011Q3 3100 0
200 2011Q4 3300 0
200 2012Q1 2500 12876
300 2009Q4 2390 0
300 2010Q1 4300 0
300 2010Q2 2300 678
300 2010Q3 5600 0
300 2010Q4 5600 0
300 2011Q1 2500 0 
300 2011Q2 3200 0
300 2011Q3 3100 0
300 2011Q4 3300 12876
300 2012Q1 3200 0
300 2012Q1 3100 0
300 2012Q2 3300 0
;
Run;
data one;
 set a;
 by id Val_B notsorted;
 length range $ 20;
 retain first;
 if first.Val_B then do;first=Trns_QTR;call missing(Trns_QTR);count=0; end;
 count+1;
 if last.Val_B and Val_B=0 then do;range=catx('-',put(first,yyq.),put(Trns_QTR,yyq.));output;end;
 keep id range count;
run;
data two;
 set one;
 by id;
 if last.id and scan(range,-1,'-')='2012Q2' then flag=1;
  else flag=0;
run;
proc sql;
 create table three as
  select count(distinct id) as n
   from two
    where count gt 3 and  scan(range,-1,'-') eq '2012Q2';
quit;

Xia Keshan

View solution in original post

5 REPLIES 5
user24feb
Barite | Level 11

I think this requires a multi-step approach. One answer to 1) could be:


Data A;
  Input ID Trns_QTR:YYQ. Val_A Val_B;
  format Trns_Qtr YYQs6.;
  Datalines;
100 2010Q1 1456 345
100 2010Q2 2500 0
100 2010Q3 2500 0
100 2010Q4 2500 0
200 2010Q4 0 1200
200 2011Q1 3100 0
200 2011Q2 3200 0
200 2011Q3 3100 0
200 2011Q4 3300 0
200 2012Q1 2500 12876
300 2019Q4 2390 0
300 2010Q1 4300 0
300 2010Q2 2300 678
300 2010Q3 5600 0
300 2010Q4 5600 0
300 2011Q1 2500 0
300 2011Q2 3200 0
300 2011Q3 3100 0
300 2011Q4 3300 12876
300 2012Q1 3200 0
300 2012Q1 3100 0
300 2012Q2 3300 0
;
Run;

* create new ID;
Data B ;
  Length ID_dummy $100.;
  Set A;
  Retain dummy;
  By ID;
  If First.ID Then dummy=IfN(Val_B eq 0,1,0);
  Else Do;
    If Val_B eq 0 & dummy eq 0 Then dummy=1;
    Else If Val_B gt 0 Then dummy+1;
  End;
  If Val_B ne 0 Then Nr=0; Else Nr=dummy;
  If Nr ne 0;
  ID_dummy=Catt(Put(ID,Z3.),'_',Put(Nr,Z2.));
  One=1;
Run;

* get length;
Data C (Keep=ID Trns_Qtr Total ID_dummy);
  Do i=1 By 1 Until (Last.ID_dummy);
    Set B;
By ID_dummy;
Total=Sum(One,Total);
  End;
  Do Until (Last.ID_dummy);
    Set B;
By ID_dummy;
Output;
  End;
Run;

* save first and last;
Data D;
  Length Col $10.;
  Set C;
  By ID_dummy;
  If First.ID_dummy Then Do;
    Col="Start"; Output;
  End;
  Else If Last.ID_dummy Then Do;
    Col="End"; Output;
  End;
Run;

* transpose;
Proc Transpose Data=D Out=E (Drop=ID_dummy _NAME_);
  By ID_dummy ID Total;
  Var Trns_Qtr;
  ID Col;
Run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Your not really asking a question there are you, more of a write my program for me Smiley Happy

I note there is a logic error in your test data, the first record of 300 shows 2019 which sequentially would be after all the others??

For 1)

data have;
  infile datalines;
  input ID Trns_QTR yyq9. Val_A Val_B;
datalines;
300   2019Q4       2390       0
300   2010Q1       4300       0
300   2010Q2       2300       678
300   2010Q3       5600       0
300   2010Q4       5600       0
300   2011Q1       2500       0
300   2011Q2       3200       0
300   2011Q3       3100       0
300   2011Q4       3300     12876
300   2012Q1       3200       0
300   2012Q1       3100       0
300   2012Q2       3300       0
;
run;

data inter;
  set have;
  by id;
  retain grp;
  if first.id then grp=1;
  if val_b ne 0 then grp=grp+1;
  if val_b=0 or last.id then sel=1;
run;

proc sql;
  create table WANT as
  select  ID,
          GRP,
          MIN(TRNS_QTR) as FIRST_DATE format=yyq9.,
          MAX(TRNS_QTR) as LAST_DATE format=yyq9.,
          COUNT(1) as N
  from    (select * from WORK.INTER where SEL=1)
  group by ID,
           GRP;
quit;
 

The above should give you a good start on 2 and 3.

Nikos
Fluorite | Level 6

Hi RW9,

You are right.

It should read  2009Q4 (instead of 2019Q4)

Thank you

Kurt_Bremser
Super User

data have;

infile cards;

length id 4 trns_qtr $6 val_a 8 val_b 8;

input id trns_qtr val_a val_b;

cards;

100 2010Q1 1456 345

100 2010Q2 2500 0

100 2010Q3 2500 0

100 2010Q4 2500 0

200 2010Q4 0 1200

200 2011Q1 3100 0

200 2011Q2 3200 0

200 2011Q3 3100 0

200 2011Q4 3300 0

200 2012Q1 2500 12876

300 2009Q4 2390 0

300 2010Q1 4300 0

300 2010Q2 2300 678

300 2010Q3 5600 0

300 2010Q4 5600 0

300 2011Q1 2500 0

300 2011Q2 3200 0

300 2011Q3 3100 0

300 2011Q4 3300 12876

300 2012Q1 3200 0

300 2012Q1 3100 0

300 2012Q2 3300 0

;

run;

%let compare_qtr=2012Q2;

/* solution of 1 & 2 */

data want1 (keep=id number startq endq flag);

set have;

by id;

retain startq endq number;

if first.id then number = .;

if val_b = 0

then do;

  endq = trns_qtr;

  if number = .

  then do;

    number = 1;

    startq = trns_qtr;

  end;

  else number + 1;

end;

else do;

  if number ne .

  then do;

    if endq = "&compare_qtr"

    then flag = 1;

    else flag = 0;

    output;

    number = .;

  end;

end;

if last.id and number ne .

then do;

  endq = trns_qtr;

  if endq = "&compare_qtr"

  then flag = 1;

  else flag = 0;

  output;

end;

run;

/* solution of 3 */

proc sql;

create table want2 as

select

  count(distinct id) as id

from want1

where flag = 1 and number ge 3

;

quit;

Ksharp
Super User
Data A;
  Input ID Trns_QTR:YYQ. Val_A Val_B;
  format Trns_Qtr YYQs6.;
  Datalines;
100 2010Q1 1456 345
100 2010Q2 2500 0
100 2010Q3 2500 0
100 2010Q4 2500 0
200 2010Q4 0 1200
200 2011Q1 3100 0
200 2011Q2 3200 0
200 2011Q3 3100 0
200 2011Q4 3300 0
200 2012Q1 2500 12876
300 2009Q4 2390 0
300 2010Q1 4300 0
300 2010Q2 2300 678
300 2010Q3 5600 0
300 2010Q4 5600 0
300 2011Q1 2500 0 
300 2011Q2 3200 0
300 2011Q3 3100 0
300 2011Q4 3300 12876
300 2012Q1 3200 0
300 2012Q1 3100 0
300 2012Q2 3300 0
;
Run;
data one;
 set a;
 by id Val_B notsorted;
 length range $ 20;
 retain first;
 if first.Val_B then do;first=Trns_QTR;call missing(Trns_QTR);count=0; end;
 count+1;
 if last.Val_B and Val_B=0 then do;range=catx('-',put(first,yyq.),put(Trns_QTR,yyq.));output;end;
 keep id range count;
run;
data two;
 set one;
 by id;
 if last.id and scan(range,-1,'-')='2012Q2' then flag=1;
  else flag=0;
run;
proc sql;
 create table three as
  select count(distinct id) as n
   from two
    where count gt 3 and  scan(range,-1,'-') eq '2012Q2';
quit;

Xia Keshan

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 849 views
  • 7 likes
  • 5 in conversation