Solved: logic on successive id selection

Ramin1 · Posted 10-20-2021 11:21 AM

I have data looks following:

id score year

01 0 2000

01 2 2001

01 2 2002

01 1 2006

02 2 2000

02 0 2001

02 0 2003

02 1 2005

02 2 2006

03 0 2000

03 1 2008

03 0 2014

03 0 2015

I want to select the ID's that have a score of 0 and successive id have a score of 2. The selected row (may be one or multiple) will be one that has score 2. In the above data, id 01 (year 2001 and 2002) and 02 (year 2006) will be selected.

ballardw · Posted 10-20-2021 12:05 PM

@Ramin1 wrote:

I have data looks following:

id score year

01 0 2000

01 2 2001

01 2 2002

01 1 2006

02 2 2000

02 0 2001

02 0 2003

02 1 2005

02 2 2006

03 0 2000

03 1 2008

03 0 2014

03 0 2015

I want to select the ID's that have a score of 0 and successive id have a score of 2. The selected row (may be one or multiple) will be one that has score 2. In the above data, id 01 (year 2001 and 2002) and 02 (year 2006) will be selected.

One suspects there will be other bits about exceptions to the above but this does the requested with the example data:

data have;
  input id $ score year;
datalines;
01 0 2000
01 2 2001
01 2 2002
01 1 2006
02 2 2000
02 0 2001
02 0 2003
02 1 2005
02 2 2006
03 0 2000
03 1 2008
03 0 2014
03 0 2015
;

data want;
   set have;
   by id;
   retain startseq;
   if first.id then startseq=0;
   if score=0 then startseq=1;
   else if score=2 and startseq=1 then output;
run;

Please note the data step to provide example data. If you don't provide such we may make assumptions as to variable types and develop solutions that work for the assumed data that will not work with yours.

With a By statement SAS creates automatic First. and Last. variables you can use to test if a record is the first, last or neither of a by group so is useful for resetting values for a new Id.

The Retain keeps the value of a variable across iterations of the data step. So you can reset it as needed.

Once your code logic is working you would probably drop the Startseq variable as not needed.

View solution in original post

ballardw · Posted 10-20-2021 12:05 PM

@Ramin1 wrote:

I have data looks following:

id score year

01 0 2000

01 2 2001

01 2 2002

01 1 2006

02 2 2000

02 0 2001

02 0 2003

02 1 2005

02 2 2006

03 0 2000

03 1 2008

03 0 2014

03 0 2015

I want to select the ID's that have a score of 0 and successive id have a score of 2. The selected row (may be one or multiple) will be one that has score 2. In the above data, id 01 (year 2001 and 2002) and 02 (year 2006) will be selected.

One suspects there will be other bits about exceptions to the above but this does the requested with the example data:

data have;
  input id $ score year;
datalines;
01 0 2000
01 2 2001
01 2 2002
01 1 2006
02 2 2000
02 0 2001
02 0 2003
02 1 2005
02 2 2006
03 0 2000
03 1 2008
03 0 2014
03 0 2015
;

data want;
   set have;
   by id;
   retain startseq;
   if first.id then startseq=0;
   if score=0 then startseq=1;
   else if score=2 and startseq=1 then output;
run;

Please note the data step to provide example data. If you don't provide such we may make assumptions as to variable types and develop solutions that work for the assumed data that will not work with yours.

With a By statement SAS creates automatic First. and Last. variables you can use to test if a record is the first, last or neither of a by group so is useful for resetting values for a new Id.

The Retain keeps the value of a variable across iterations of the data step. So you can reset it as needed.

Once your code logic is working you would probably drop the Startseq variable as not needed.

Ramin1 · Posted 10-20-2021 10:32 PM

Thanks a lot, @ballardw. If I want to get observations who have scores of 0 and successive scores also 0 (output only for successive scores) then does this logic work? For instance, for the data in this question, I want to select
id score year
02 0 2003
03 0 2014
03 0 2015

Can you please explain, how your code selects the successive observations?

ballardw · Posted 10-21-2021 12:11 AM

@Ramin1 wrote:
Thanks a lot, @ballardw. If I want to get observations who have scores of 0 and successive scores also 0 (output only for successive scores) then does this logic work? For instance, for the data in this question, I want to select
id score year
02 0 2003
03 0 2014
03 0 2015

Can you please explain, how your code selects the successive observations?

The code for the original question examines each value of score and when the condition was met then a flag variable is set to indicate it was found. The Retain means that flag is kept until a new Id is encountered. The Output, writing to the data set, only occurs when the conditions of the flag variable Startseq indicates a prior score of 0 had been encountered for the Id and the current value is 2.

This new question is only somewhat related. The difference is that since you asking about "successive" values then you need to COUNT the number of 0 scores and only output when the number of 0 scores is greater than 1.

One way:

data want2;
   set have;
   by id;
   retain zerocount;
   if first.id then zerocount=0;
   if score=0 then zerocount+1;
   if score=0 and zerocount>1 then output;
run;

Similar us of the By Id and the First. to reset the counter. Note that Zerocount+1 adds 1 to previous value, which would be 0 until a score of 0 is encountered. Then output only when the count is more than 1 and is the score you want.

Next question on this line you should show what you have attempted before asking. If you get odd results be prepared to show your code from the LOG, meaning what was actually submitted along with all the notes, warnings or errors generated. If posting Log text open a text box on the forum using the </> and paste the text. It will look like the boxes with the code I posted. The text box is important for Log text because SAS often places diagnostic information in the log but the main message windows will reformat the text moving characters and possibly making that diagnostic information less useful.

logic on successive id selection

Re: logic on successive id selection

Re: logic on successive id selection

Re: logic on successive id selection

Re: logic on successive id selection

logic on successive id selection

Re: logic on successive id selection

Re: logic on successive id selection

Re: logic on successive id selection

Re: logic on successive id selection

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!