BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
linda0910
Obsidian | Level 7

Hi all, 

 

I am stuck by counting consecutive values in the program and wish you could help me. 

 

My data is like this:

 

ID count keep

1  1  Yes

1  2  Yes

1  3  Yes

1  4  No

1  5  No

1  6  Yes

1  7  Yes

2  1  No

2  2  No

2  3  Yes

2  4  Yes

2  5  Yes

2  6  No

 

I want to find the "count number" range of the first consecutive "Yes" 

 

For ID=1  count number should be "1-3"

For ID=2  count number should be "3-5"

 

Thanks a lot. 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
jimbarbour
Meteorite | Level 14

Well, this may not be my most polished program, but it is producing the desired results.  I wrote it in a hurry.  I would want to test it with a lot more test data before using it for something important.

Results:

jimbarbour_0-1623460461017.png

Program:

Data	Have;
	INPUT
		ID		$
		count	$
		keep	$
		;

DATALINES;
1  1  Yes
1  2  Yes
1  3  Yes
1  4  No
1  5  No
1  6  Yes
1  7  Yes
2  1  No
2  2  No
2  3  Yes
2  4  Yes
2  5  Yes
2  6  No
;
RUN;

PROC	SORT	DATA=Have;
	BY	ID	Count;
RUN;

DATA	Want;
	DROP	_:	Count	Keep;
	RETAIN	_Prior_Was_Yes	0;
	RETAIN	_ID_Processed	0;
	RETAIN	_New_For_ID		1;
	RETAIN	_First_Yes		0;
	RETAIN	_Last_Yes		0;

	SET	Have	END						=	_End_of_File;
		BY	ID;

	LENGTH	Count_Number	$8;

	IF	First.ID									THEN
		DO;		
			_New_For_ID					=	1;
			_ID_Count					=	0;
			_Counter					=	1;
		END;

	_ID_Count							+	1;

	IF	_ID_Processed								THEN
		DO;
			IF	Last.ID								THEN
				DO;
					_Prior_Was_Yes		=	0;
					_ID_Processed		=	0;
					_New_For_ID			=	1;
				END;
			DELETE;
		END;

	IF	UPCASE(Keep)					=	'YES'	THEN
		_Current_Is_Yes					=	1;

	IF	_N_								=	1		THEN
		IF	_Current_Is_Yes							THEN
			DO;
				_First_Yes				=	1;
				_Last_Yes				=	1;
				IF	Last.ID							THEN
					DO;
						LINK	Write_Record;
						_Prior_Was_Yes	=	0;
						_ID_Processed	=	0;
						_New_For_ID		=	1;
						_First_Yes		=	0;
						_Last_Yes		=	0;
					END;
				ELSE
					DO;
						_Prior_Was_Yes	=	1;
						_First_Yes		=	1;
					END;
				DELETE;
			END;
		ELSE
			DO;
				_Prior_Was_Yes			=	0;
				DELETE;
			END;
	ELSE
	IF	_Current_Is_Yes								THEN
		IF	_Prior_Was_Yes							THEN
			DO;
				_Counter				+	1;
				IF	_New_for_ID						THEN
					DO;
						_New_for_ID		=	0;
						_First_Yes		=	_ID_Count	-	1;
					END;
				IF	Last.ID							THEN
					DO;
						_Last_Yes		=	_ID_Count;
						LINK	Write_Record;
						_Prior_Was_Yes	=	0;
						_ID_Processed	=	0;
						_New_For_ID		=	1;
					END;
				ELSE
					DO;
						_Prior_Was_Yes	=	1;
					END;
				DELETE;
			END;
		ELSE
			DO;
				IF	Last.ID							THEN
					DO;
						_First_Yes		=	0;
						_Last_Yes		=	0;
						LINK	Write_Record;
						_Prior_Was_Yes	=	0;
						_ID_Processed	=	0;
						_New_For_ID		=	1;
					END;
				ELSE
					DO;
						_Prior_Was_Yes	=	1;
					END;
				DELETE;
			END;
	ELSE
		IF	_Prior_Was_Yes							THEN
			DO;
				IF	Last.ID							THEN
					DO;
						IF	_Counter	>	1		THEN
							DO;
								_Last_Yes	=	_ID_Count	-	1;
							END;
						LINK	Write_Record;
						_Prior_Was_Yes	=	0;
						_ID_Processed	=	0;
						_New_For_ID		=	1;
					END;
				ELSE
					DO;
						IF	_Counter	>	1		THEN
							DO;
								_Last_Yes	=	_ID_Count	-	1;
							END;
						LINK	Write_Record;
						_Prior_Was_Yes	=	0;
						_ID_Processed	=	1;
						_New_For_ID		=	1;
					END;
				DELETE;
			END;
		ELSE
			DO;
				_Prior_Was_Yes			=	0;
				DELETE;
			END;
	******;
	RETURN;
	******;

	************;
	Write_Record:
	************;
		Count_Number					=	CATS(PUT(_First_Yes, 8.), '-', PUT(_Last_Yes, 8.));
		OUTPUT;
	******;
	RETURN;
	******;
RUN;

Jim

View solution in original post

11 REPLIES 11
ballardw
Super User

Is this supposed to be a data set as a result? If so, show what the entire data set result should look like.

 

A bit of hint as to what we are actually counting and the rules involved would be nice, such as which specific VARIABLE .

"1-3" is not a number (unless resolved to -2 or similar). 3 is a number.

linda0910
Obsidian | Level 7
Sorry, if I confused you.

This is the dataset.

But for ID=1 the final outcome would be first consecutive “yes “ starting count at “1”, and ending “3”.

For ID=2, the first consecutive “yes” staring at count=3 and ending at count=5
jimbarbour
Meteorite | Level 14

Well, this may not be my most polished program, but it is producing the desired results.  I wrote it in a hurry.  I would want to test it with a lot more test data before using it for something important.

Results:

jimbarbour_0-1623460461017.png

Program:

Data	Have;
	INPUT
		ID		$
		count	$
		keep	$
		;

DATALINES;
1  1  Yes
1  2  Yes
1  3  Yes
1  4  No
1  5  No
1  6  Yes
1  7  Yes
2  1  No
2  2  No
2  3  Yes
2  4  Yes
2  5  Yes
2  6  No
;
RUN;

PROC	SORT	DATA=Have;
	BY	ID	Count;
RUN;

DATA	Want;
	DROP	_:	Count	Keep;
	RETAIN	_Prior_Was_Yes	0;
	RETAIN	_ID_Processed	0;
	RETAIN	_New_For_ID		1;
	RETAIN	_First_Yes		0;
	RETAIN	_Last_Yes		0;

	SET	Have	END						=	_End_of_File;
		BY	ID;

	LENGTH	Count_Number	$8;

	IF	First.ID									THEN
		DO;		
			_New_For_ID					=	1;
			_ID_Count					=	0;
			_Counter					=	1;
		END;

	_ID_Count							+	1;

	IF	_ID_Processed								THEN
		DO;
			IF	Last.ID								THEN
				DO;
					_Prior_Was_Yes		=	0;
					_ID_Processed		=	0;
					_New_For_ID			=	1;
				END;
			DELETE;
		END;

	IF	UPCASE(Keep)					=	'YES'	THEN
		_Current_Is_Yes					=	1;

	IF	_N_								=	1		THEN
		IF	_Current_Is_Yes							THEN
			DO;
				_First_Yes				=	1;
				_Last_Yes				=	1;
				IF	Last.ID							THEN
					DO;
						LINK	Write_Record;
						_Prior_Was_Yes	=	0;
						_ID_Processed	=	0;
						_New_For_ID		=	1;
						_First_Yes		=	0;
						_Last_Yes		=	0;
					END;
				ELSE
					DO;
						_Prior_Was_Yes	=	1;
						_First_Yes		=	1;
					END;
				DELETE;
			END;
		ELSE
			DO;
				_Prior_Was_Yes			=	0;
				DELETE;
			END;
	ELSE
	IF	_Current_Is_Yes								THEN
		IF	_Prior_Was_Yes							THEN
			DO;
				_Counter				+	1;
				IF	_New_for_ID						THEN
					DO;
						_New_for_ID		=	0;
						_First_Yes		=	_ID_Count	-	1;
					END;
				IF	Last.ID							THEN
					DO;
						_Last_Yes		=	_ID_Count;
						LINK	Write_Record;
						_Prior_Was_Yes	=	0;
						_ID_Processed	=	0;
						_New_For_ID		=	1;
					END;
				ELSE
					DO;
						_Prior_Was_Yes	=	1;
					END;
				DELETE;
			END;
		ELSE
			DO;
				IF	Last.ID							THEN
					DO;
						_First_Yes		=	0;
						_Last_Yes		=	0;
						LINK	Write_Record;
						_Prior_Was_Yes	=	0;
						_ID_Processed	=	0;
						_New_For_ID		=	1;
					END;
				ELSE
					DO;
						_Prior_Was_Yes	=	1;
					END;
				DELETE;
			END;
	ELSE
		IF	_Prior_Was_Yes							THEN
			DO;
				IF	Last.ID							THEN
					DO;
						IF	_Counter	>	1		THEN
							DO;
								_Last_Yes	=	_ID_Count	-	1;
							END;
						LINK	Write_Record;
						_Prior_Was_Yes	=	0;
						_ID_Processed	=	0;
						_New_For_ID		=	1;
					END;
				ELSE
					DO;
						IF	_Counter	>	1		THEN
							DO;
								_Last_Yes	=	_ID_Count	-	1;
							END;
						LINK	Write_Record;
						_Prior_Was_Yes	=	0;
						_ID_Processed	=	1;
						_New_For_ID		=	1;
					END;
				DELETE;
			END;
		ELSE
			DO;
				_Prior_Was_Yes			=	0;
				DELETE;
			END;
	******;
	RETURN;
	******;

	************;
	Write_Record:
	************;
		Count_Number					=	CATS(PUT(_First_Yes, 8.), '-', PUT(_Last_Yes, 8.));
		OUTPUT;
	******;
	RETURN;
	******;
RUN;

Jim

linda0910
Obsidian | Level 7
Thanks so much. It worked on my data, but for some of the observations, it count_number from 0-1. Since there is no "Zero" in the count, I guess there is just one "Yes".

I do appreciate your help.
linda0910
Obsidian | Level 7

I just found if some cases have all the "Yes" from the first count till last, it would be deleted in the final output. 

jimbarbour
Meteorite | Level 14

@linda0910 ,

 

Can you post the cases that don't seem to be working right as Datalines?  Something like the below.  If you give me the cases, I can make a correction.

Data	Have;
	INPUT
		ID		$
		count	$
		keep	$
		;

DATALINES;
1  1  Yes
1  2  Yes
1  3  Yes
1  4  No
1  5  No
1  6  Yes
1  7  Yes
2  1  No
2  2  No
2  3  Yes
2  4  Yes
2  5  Yes
2  6  No
;
RUN;

Jim

 

jimbarbour
Meteorite | Level 14

Linda:

 

Here's an improved version of my program, below.  Try running your data through it and see if it gives you what you want.

%MACRO	Get_Var_By_Pointer(d, x1, y1, x2, op2, y2, x3, op3, y3, j);
	_p_					=	_n_	+	&j;

	IF	(1 		<= _p_ 	<=	_o_)	THEN
		DO;
			_Found		=	1;
			SET	&d(KEEP=&x1 &x2 &x3 	RENAME=(&x1=&y1 &x2&op2&y2 &x3&op3&y3))	POINT=_p_	NOBS=_o_;
		END;
	ELSE
		DO;
			_Found		=	0;
		END;
%MEND	Get_Var_By_Pointer;

DATA	Want;
	Drop	_:	Count	Keep	Sort_ID;
	RETAIN	_ID_Processed						0;
	RETAIN	_ID_Count							0;
	_First_Yes								=	0;
	_Last_Yes								=	0;
	LENGTH	_N_Keep								$8.;

	SET	Have	END							=	_End_of_File;
		BY	Sort_ID;

	LENGTH	Count_Number						$8.;

	IF	LAST.Sort_ID									AND
		NOT	_ID_Processed								THEN
		DO;
			Count_Number					=	CATX('-', '0', '0');
			OUTPUT;
		END;

	IF	First.Sort_ID									THEN
		DO;
			_ID_Count						=	0;
			_ID_Processed					=	0;
		END;

	IF	_ID_Processed									THEN
		DO;
			DELETE;
		END;
	ELSE
		_ID_Count							+	1;

	IF	UPCASE(Keep)						=	'YES'	THEN
		DO;
			_Counter						=	1;
			_First_Yes						=	_ID_Count;
			_Last_Yes						=	0;
			_j								=	1;
			_Found							=	0;
			_N_Keep							=	'';
			_ID_Processed					=	0;
			DO	UNTIL	(NOT	_Found					OR
						ID					^=	_N_ID	OR
						UPCASE(_N_Keep)		^=	'YES');
				IF	UPCASE(_N_Keep)			=	'YES'	THEN
					DO;
						_Counter			+	1;
					END;
				%Get_Var_By_Pointer(Have, ID, _N_ID, Count, %STR(=), _N_Count, Keep, %STR(=), _N_Keep, _j);
				_j							+	1;
			END;
			IF	_Counter					>	1		THEN
				DO;
					_Last_Yes				=	_ID_Count	+	_Counter	-	1;
					Count_Number			=	CATS(PUT(_First_Yes, 8.), '-', PUT(_Last_Yes, 8.));
					_ID_Processed			=	1;
					OUTPUT;
				END;
			ELSE
				DO;
					_Counter				=	0;
					_j						=	1;
					_First_Yes				=	0;
					_Last_Yes				=	0;
				END;
		END;
RUN;

Jim

Tom
Super User Tom
Super User

Seems simple enough.  Use NOTSORTED keyword in the BY statement so SAS will calculate FIRST.KEEP and LAST.KEEP flags for you.  Note the data needs to actually be sorted by ID and COUNT.

 

Use RETAIN to keep track of when the first and last count for the first YES group was.

 

data have;
  input id $ count keep $ ;
cards;
1  1  Yes
1  2  Yes
1  3  Yes
1  4  No
1  5  No
1  6  Yes
1  7  Yes
2  1  No
2  2  No
2  3  Yes
2  4  Yes
2  5  Yes
2  6  No
3  1  No
4  1  Yes
;

data want;
  set have ;
  by id keep notsorted;
  retain first last ;
  length range $8 ;
  if first.id then call missing(first,last);
  if keep='Yes' then do;
    if not first then first=count;
    if last.keep and not last then last=count;
  end;
  if last.id then do;
    if first then range=catx('-',first,last);
    output;
  end;
  keep id range ;
run;

proc print;
run;

Results:

Obs    id    range

 1     1      1-3
 2     2      3-5
 3     3
 4     4      1-1

Ksharp
Super User
data have;
  input id $ count keep $ ;
cards;
1  1  Yes
1  2  Yes
1  3  Yes
1  4  No
1  5  No
1  6  Yes
1  7  Yes
2  1  No
2  2  No
2  3  Yes
2  4  Yes
2  5  Yes
2  6  No
3  1  No
4  1  Yes
;
data temp;
 set have;
 by id keep notsorted;
 group+first.keep;
run;
proc sql;
create table want as
select id,catx('-',min(count),max(count)) as want length=10 from (
select *
 from temp 
  where keep='Yes'
    group by id
	  having group=min(group) )
group by id;
quit;
jimbarbour
Meteorite | Level 14

@Ksharp and @Tom,

 

I like both of your solutions better than my "sit down at the computer and bang out the first idea that comes to mind" program which is rather kludgey.

 

However, with a larger set of data (see below), I didn't get the results that I wanted using your code.  Note that I switched Count to a numeric variable and that I add a numeric Sort_ID, both for sorting purposes.  I haven't played with it yet, but the change in ID, which is not in character sort order is likely futzing up the BY processing.

Data	Have;
	INPUT
		ID		$
		count	
		keep	$
		;

	Sort_ID	=	INPUT(ID, 8.);
DATALINES;
1  1  Yes
1  2  Yes
1  3  Yes
1  4  No
1  5  No
1  6  Yes
1  7  Yes
2  1  No
2  2  No
2  3  Yes
2  4  Yes
2  5  Yes
2  6  No
3  1  YES
3  2  YES
3  3  YES
3  4  YES
4  1  YES
4  2  NO
4  3  YES
4  4  YES
4  5  NO
5  1  NO 
5  2  YES
5  3  NO
5  4  YES
5  5  NO
5  6  NO
5  7  YES
5  8  YES
5  9  NO
5  10  YES
5  11  YES
5  12  YES
5  13  YES
6  1  NO
6  2  YES
6  3  yes
7  1  YES
7  2  NO
7  3  YES
8  1  NO
8  2  NO
8  3  YES
9  1  NO
10  1  yes
11  1  NO
11  2  YES
;
RUN;

PROC	SORT	DATA=Have;
	BY	Sort_ID	Count;
RUN;

Jim

 

Ksharp
Super User
You have messy data , Yes YES yes ...
So change my code this could handle this :
where lowcase(keep)='yes'

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 3105 views
  • 4 likes
  • 5 in conversation