Hi all,
I am stuck by counting consecutive values in the program and wish you could help me.
My data is like this:
ID count keep
1 1 Yes
1 2 Yes
1 3 Yes
1 4 No
1 5 No
1 6 Yes
1 7 Yes
2 1 No
2 2 No
2 3 Yes
2 4 Yes
2 5 Yes
2 6 No
I want to find the "count number" range of the first consecutive "Yes"
For ID=1 count number should be "1-3"
For ID=2 count number should be "3-5"
Thanks a lot.
Well, this may not be my most polished program, but it is producing the desired results. I wrote it in a hurry. I would want to test it with a lot more test data before using it for something important.
Results:
Program:
Data Have;
INPUT
ID $
count $
keep $
;
DATALINES;
1 1 Yes
1 2 Yes
1 3 Yes
1 4 No
1 5 No
1 6 Yes
1 7 Yes
2 1 No
2 2 No
2 3 Yes
2 4 Yes
2 5 Yes
2 6 No
;
RUN;
PROC SORT DATA=Have;
BY ID Count;
RUN;
DATA Want;
DROP _: Count Keep;
RETAIN _Prior_Was_Yes 0;
RETAIN _ID_Processed 0;
RETAIN _New_For_ID 1;
RETAIN _First_Yes 0;
RETAIN _Last_Yes 0;
SET Have END = _End_of_File;
BY ID;
LENGTH Count_Number $8;
IF First.ID THEN
DO;
_New_For_ID = 1;
_ID_Count = 0;
_Counter = 1;
END;
_ID_Count + 1;
IF _ID_Processed THEN
DO;
IF Last.ID THEN
DO;
_Prior_Was_Yes = 0;
_ID_Processed = 0;
_New_For_ID = 1;
END;
DELETE;
END;
IF UPCASE(Keep) = 'YES' THEN
_Current_Is_Yes = 1;
IF _N_ = 1 THEN
IF _Current_Is_Yes THEN
DO;
_First_Yes = 1;
_Last_Yes = 1;
IF Last.ID THEN
DO;
LINK Write_Record;
_Prior_Was_Yes = 0;
_ID_Processed = 0;
_New_For_ID = 1;
_First_Yes = 0;
_Last_Yes = 0;
END;
ELSE
DO;
_Prior_Was_Yes = 1;
_First_Yes = 1;
END;
DELETE;
END;
ELSE
DO;
_Prior_Was_Yes = 0;
DELETE;
END;
ELSE
IF _Current_Is_Yes THEN
IF _Prior_Was_Yes THEN
DO;
_Counter + 1;
IF _New_for_ID THEN
DO;
_New_for_ID = 0;
_First_Yes = _ID_Count - 1;
END;
IF Last.ID THEN
DO;
_Last_Yes = _ID_Count;
LINK Write_Record;
_Prior_Was_Yes = 0;
_ID_Processed = 0;
_New_For_ID = 1;
END;
ELSE
DO;
_Prior_Was_Yes = 1;
END;
DELETE;
END;
ELSE
DO;
IF Last.ID THEN
DO;
_First_Yes = 0;
_Last_Yes = 0;
LINK Write_Record;
_Prior_Was_Yes = 0;
_ID_Processed = 0;
_New_For_ID = 1;
END;
ELSE
DO;
_Prior_Was_Yes = 1;
END;
DELETE;
END;
ELSE
IF _Prior_Was_Yes THEN
DO;
IF Last.ID THEN
DO;
IF _Counter > 1 THEN
DO;
_Last_Yes = _ID_Count - 1;
END;
LINK Write_Record;
_Prior_Was_Yes = 0;
_ID_Processed = 0;
_New_For_ID = 1;
END;
ELSE
DO;
IF _Counter > 1 THEN
DO;
_Last_Yes = _ID_Count - 1;
END;
LINK Write_Record;
_Prior_Was_Yes = 0;
_ID_Processed = 1;
_New_For_ID = 1;
END;
DELETE;
END;
ELSE
DO;
_Prior_Was_Yes = 0;
DELETE;
END;
******;
RETURN;
******;
************;
Write_Record:
************;
Count_Number = CATS(PUT(_First_Yes, 8.), '-', PUT(_Last_Yes, 8.));
OUTPUT;
******;
RETURN;
******;
RUN;
Jim
Is this supposed to be a data set as a result? If so, show what the entire data set result should look like.
A bit of hint as to what we are actually counting and the rules involved would be nice, such as which specific VARIABLE .
"1-3" is not a number (unless resolved to -2 or similar). 3 is a number.
Well, this may not be my most polished program, but it is producing the desired results. I wrote it in a hurry. I would want to test it with a lot more test data before using it for something important.
Results:
Program:
Data Have;
INPUT
ID $
count $
keep $
;
DATALINES;
1 1 Yes
1 2 Yes
1 3 Yes
1 4 No
1 5 No
1 6 Yes
1 7 Yes
2 1 No
2 2 No
2 3 Yes
2 4 Yes
2 5 Yes
2 6 No
;
RUN;
PROC SORT DATA=Have;
BY ID Count;
RUN;
DATA Want;
DROP _: Count Keep;
RETAIN _Prior_Was_Yes 0;
RETAIN _ID_Processed 0;
RETAIN _New_For_ID 1;
RETAIN _First_Yes 0;
RETAIN _Last_Yes 0;
SET Have END = _End_of_File;
BY ID;
LENGTH Count_Number $8;
IF First.ID THEN
DO;
_New_For_ID = 1;
_ID_Count = 0;
_Counter = 1;
END;
_ID_Count + 1;
IF _ID_Processed THEN
DO;
IF Last.ID THEN
DO;
_Prior_Was_Yes = 0;
_ID_Processed = 0;
_New_For_ID = 1;
END;
DELETE;
END;
IF UPCASE(Keep) = 'YES' THEN
_Current_Is_Yes = 1;
IF _N_ = 1 THEN
IF _Current_Is_Yes THEN
DO;
_First_Yes = 1;
_Last_Yes = 1;
IF Last.ID THEN
DO;
LINK Write_Record;
_Prior_Was_Yes = 0;
_ID_Processed = 0;
_New_For_ID = 1;
_First_Yes = 0;
_Last_Yes = 0;
END;
ELSE
DO;
_Prior_Was_Yes = 1;
_First_Yes = 1;
END;
DELETE;
END;
ELSE
DO;
_Prior_Was_Yes = 0;
DELETE;
END;
ELSE
IF _Current_Is_Yes THEN
IF _Prior_Was_Yes THEN
DO;
_Counter + 1;
IF _New_for_ID THEN
DO;
_New_for_ID = 0;
_First_Yes = _ID_Count - 1;
END;
IF Last.ID THEN
DO;
_Last_Yes = _ID_Count;
LINK Write_Record;
_Prior_Was_Yes = 0;
_ID_Processed = 0;
_New_For_ID = 1;
END;
ELSE
DO;
_Prior_Was_Yes = 1;
END;
DELETE;
END;
ELSE
DO;
IF Last.ID THEN
DO;
_First_Yes = 0;
_Last_Yes = 0;
LINK Write_Record;
_Prior_Was_Yes = 0;
_ID_Processed = 0;
_New_For_ID = 1;
END;
ELSE
DO;
_Prior_Was_Yes = 1;
END;
DELETE;
END;
ELSE
IF _Prior_Was_Yes THEN
DO;
IF Last.ID THEN
DO;
IF _Counter > 1 THEN
DO;
_Last_Yes = _ID_Count - 1;
END;
LINK Write_Record;
_Prior_Was_Yes = 0;
_ID_Processed = 0;
_New_For_ID = 1;
END;
ELSE
DO;
IF _Counter > 1 THEN
DO;
_Last_Yes = _ID_Count - 1;
END;
LINK Write_Record;
_Prior_Was_Yes = 0;
_ID_Processed = 1;
_New_For_ID = 1;
END;
DELETE;
END;
ELSE
DO;
_Prior_Was_Yes = 0;
DELETE;
END;
******;
RETURN;
******;
************;
Write_Record:
************;
Count_Number = CATS(PUT(_First_Yes, 8.), '-', PUT(_Last_Yes, 8.));
OUTPUT;
******;
RETURN;
******;
RUN;
Jim
I just found if some cases have all the "Yes" from the first count till last, it would be deleted in the final output.
Can you post the cases that don't seem to be working right as Datalines? Something like the below. If you give me the cases, I can make a correction.
Data Have;
INPUT
ID $
count $
keep $
;
DATALINES;
1 1 Yes
1 2 Yes
1 3 Yes
1 4 No
1 5 No
1 6 Yes
1 7 Yes
2 1 No
2 2 No
2 3 Yes
2 4 Yes
2 5 Yes
2 6 No
;
RUN;
Jim
Linda:
Here's an improved version of my program, below. Try running your data through it and see if it gives you what you want.
%MACRO Get_Var_By_Pointer(d, x1, y1, x2, op2, y2, x3, op3, y3, j);
_p_ = _n_ + &j;
IF (1 <= _p_ <= _o_) THEN
DO;
_Found = 1;
SET &d(KEEP=&x1 &x2 &x3 RENAME=(&x1=&y1 &x2&op2&y2 &x3&op3&y3)) POINT=_p_ NOBS=_o_;
END;
ELSE
DO;
_Found = 0;
END;
%MEND Get_Var_By_Pointer;
DATA Want;
Drop _: Count Keep Sort_ID;
RETAIN _ID_Processed 0;
RETAIN _ID_Count 0;
_First_Yes = 0;
_Last_Yes = 0;
LENGTH _N_Keep $8.;
SET Have END = _End_of_File;
BY Sort_ID;
LENGTH Count_Number $8.;
IF LAST.Sort_ID AND
NOT _ID_Processed THEN
DO;
Count_Number = CATX('-', '0', '0');
OUTPUT;
END;
IF First.Sort_ID THEN
DO;
_ID_Count = 0;
_ID_Processed = 0;
END;
IF _ID_Processed THEN
DO;
DELETE;
END;
ELSE
_ID_Count + 1;
IF UPCASE(Keep) = 'YES' THEN
DO;
_Counter = 1;
_First_Yes = _ID_Count;
_Last_Yes = 0;
_j = 1;
_Found = 0;
_N_Keep = '';
_ID_Processed = 0;
DO UNTIL (NOT _Found OR
ID ^= _N_ID OR
UPCASE(_N_Keep) ^= 'YES');
IF UPCASE(_N_Keep) = 'YES' THEN
DO;
_Counter + 1;
END;
%Get_Var_By_Pointer(Have, ID, _N_ID, Count, %STR(=), _N_Count, Keep, %STR(=), _N_Keep, _j);
_j + 1;
END;
IF _Counter > 1 THEN
DO;
_Last_Yes = _ID_Count + _Counter - 1;
Count_Number = CATS(PUT(_First_Yes, 8.), '-', PUT(_Last_Yes, 8.));
_ID_Processed = 1;
OUTPUT;
END;
ELSE
DO;
_Counter = 0;
_j = 1;
_First_Yes = 0;
_Last_Yes = 0;
END;
END;
RUN;
Jim
Seems simple enough. Use NOTSORTED keyword in the BY statement so SAS will calculate FIRST.KEEP and LAST.KEEP flags for you. Note the data needs to actually be sorted by ID and COUNT.
Use RETAIN to keep track of when the first and last count for the first YES group was.
data have;
input id $ count keep $ ;
cards;
1 1 Yes
1 2 Yes
1 3 Yes
1 4 No
1 5 No
1 6 Yes
1 7 Yes
2 1 No
2 2 No
2 3 Yes
2 4 Yes
2 5 Yes
2 6 No
3 1 No
4 1 Yes
;
data want;
set have ;
by id keep notsorted;
retain first last ;
length range $8 ;
if first.id then call missing(first,last);
if keep='Yes' then do;
if not first then first=count;
if last.keep and not last then last=count;
end;
if last.id then do;
if first then range=catx('-',first,last);
output;
end;
keep id range ;
run;
proc print;
run;
Results:
Obs id range 1 1 1-3 2 2 3-5 3 3 4 4 1-1
data have;
input id $ count keep $ ;
cards;
1 1 Yes
1 2 Yes
1 3 Yes
1 4 No
1 5 No
1 6 Yes
1 7 Yes
2 1 No
2 2 No
2 3 Yes
2 4 Yes
2 5 Yes
2 6 No
3 1 No
4 1 Yes
;
data temp;
set have;
by id keep notsorted;
group+first.keep;
run;
proc sql;
create table want as
select id,catx('-',min(count),max(count)) as want length=10 from (
select *
from temp
where keep='Yes'
group by id
having group=min(group) )
group by id;
quit;
I like both of your solutions better than my "sit down at the computer and bang out the first idea that comes to mind" program which is rather kludgey.
However, with a larger set of data (see below), I didn't get the results that I wanted using your code. Note that I switched Count to a numeric variable and that I add a numeric Sort_ID, both for sorting purposes. I haven't played with it yet, but the change in ID, which is not in character sort order is likely futzing up the BY processing.
Data Have;
INPUT
ID $
count
keep $
;
Sort_ID = INPUT(ID, 8.);
DATALINES;
1 1 Yes
1 2 Yes
1 3 Yes
1 4 No
1 5 No
1 6 Yes
1 7 Yes
2 1 No
2 2 No
2 3 Yes
2 4 Yes
2 5 Yes
2 6 No
3 1 YES
3 2 YES
3 3 YES
3 4 YES
4 1 YES
4 2 NO
4 3 YES
4 4 YES
4 5 NO
5 1 NO
5 2 YES
5 3 NO
5 4 YES
5 5 NO
5 6 NO
5 7 YES
5 8 YES
5 9 NO
5 10 YES
5 11 YES
5 12 YES
5 13 YES
6 1 NO
6 2 YES
6 3 yes
7 1 YES
7 2 NO
7 3 YES
8 1 NO
8 2 NO
8 3 YES
9 1 NO
10 1 yes
11 1 NO
11 2 YES
;
RUN;
PROC SORT DATA=Have;
BY Sort_ID Count;
RUN;
Jim
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.