I have data looks following:
id score year
01 0 2000
01 2 2001
01 2 2002
01 1 2006
02 2 2000
02 0 2001
02 0 2003
02 1 2005
02 2 2006
03 0 2000
03 1 2008
03 0 2014
03 0 2015
I want to select the ID's that have a score of 0 and successive id have a score of 2. The selected row (may be one or multiple) will be one that has score 2. In the above data, id 01 (year 2001 and 2002) and 02 (year 2006) will be selected.
@Ramin1 wrote:
I have data looks following:
id score year
01 0 2000
01 2 2001
01 2 2002
01 1 2006
02 2 2000
02 0 2001
02 0 2003
02 1 2005
02 2 2006
03 0 2000
03 1 2008
03 0 2014
03 0 2015
I want to select the ID's that have a score of 0 and successive id have a score of 2. The selected row (may be one or multiple) will be one that has score 2. In the above data, id 01 (year 2001 and 2002) and 02 (year 2006) will be selected.
One suspects there will be other bits about exceptions to the above but this does the requested with the example data:
data have; input id $ score year; datalines; 01 0 2000 01 2 2001 01 2 2002 01 1 2006 02 2 2000 02 0 2001 02 0 2003 02 1 2005 02 2 2006 03 0 2000 03 1 2008 03 0 2014 03 0 2015 ; data want; set have; by id; retain startseq; if first.id then startseq=0; if score=0 then startseq=1; else if score=2 and startseq=1 then output; run;
Please note the data step to provide example data. If you don't provide such we may make assumptions as to variable types and develop solutions that work for the assumed data that will not work with yours.
With a By statement SAS creates automatic First. and Last. variables you can use to test if a record is the first, last or neither of a by group so is useful for resetting values for a new Id.
The Retain keeps the value of a variable across iterations of the data step. So you can reset it as needed.
Once your code logic is working you would probably drop the Startseq variable as not needed.
@Ramin1 wrote:
I have data looks following:
id score year
01 0 2000
01 2 2001
01 2 2002
01 1 2006
02 2 2000
02 0 2001
02 0 2003
02 1 2005
02 2 2006
03 0 2000
03 1 2008
03 0 2014
03 0 2015
I want to select the ID's that have a score of 0 and successive id have a score of 2. The selected row (may be one or multiple) will be one that has score 2. In the above data, id 01 (year 2001 and 2002) and 02 (year 2006) will be selected.
One suspects there will be other bits about exceptions to the above but this does the requested with the example data:
data have; input id $ score year; datalines; 01 0 2000 01 2 2001 01 2 2002 01 1 2006 02 2 2000 02 0 2001 02 0 2003 02 1 2005 02 2 2006 03 0 2000 03 1 2008 03 0 2014 03 0 2015 ; data want; set have; by id; retain startseq; if first.id then startseq=0; if score=0 then startseq=1; else if score=2 and startseq=1 then output; run;
Please note the data step to provide example data. If you don't provide such we may make assumptions as to variable types and develop solutions that work for the assumed data that will not work with yours.
With a By statement SAS creates automatic First. and Last. variables you can use to test if a record is the first, last or neither of a by group so is useful for resetting values for a new Id.
The Retain keeps the value of a variable across iterations of the data step. So you can reset it as needed.
Once your code logic is working you would probably drop the Startseq variable as not needed.
@Ramin1 wrote:
Thanks a lot, @ballardw. If I want to get observations who have scores of 0 and successive scores also 0 (output only for successive scores) then does this logic work? For instance, for the data in this question, I want to select
id score year
02 0 2003
03 0 2014
03 0 2015
Can you please explain, how your code selects the successive observations?
The code for the original question examines each value of score and when the condition was met then a flag variable is set to indicate it was found. The Retain means that flag is kept until a new Id is encountered. The Output, writing to the data set, only occurs when the conditions of the flag variable Startseq indicates a prior score of 0 had been encountered for the Id and the current value is 2.
This new question is only somewhat related. The difference is that since you asking about "successive" values then you need to COUNT the number of 0 scores and only output when the number of 0 scores is greater than 1.
One way:
data want2; set have; by id; retain zerocount; if first.id then zerocount=0; if score=0 then zerocount+1; if score=0 and zerocount>1 then output; run;
Similar us of the By Id and the First. to reset the counter. Note that Zerocount+1 adds 1 to previous value, which would be 0 until a score of 0 is encountered. Then output only when the count is more than 1 and is the score you want.
Next question on this line you should show what you have attempted before asking. If you get odd results be prepared to show your code from the LOG, meaning what was actually submitted along with all the notes, warnings or errors generated. If posting Log text open a text box on the forum using the </> and paste the text. It will look like the boxes with the code I posted. The text box is important for Log text because SAS often places diagnostic information in the log but the main message windows will reformat the text moving characters and possibly making that diagnostic information less useful.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.