BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
vanja
Obsidian | Level 7

Hi!

I want to count the maximum number of consecutive 1:s in my dataset. See example of have and want dataset. My dataset(s) have millions of rows so I want to avoid transposing the data. I have seen similar solutions of this problem here but I don't quite understand them. Is there someone who can help me with the code needed to calculate the variable maxYears?

 

data have; 
	input id yr2010-yr2019; 
	datalines; 
		 1 . . . . . . . . 1 1
		 2 1 1 1 . . . 1 1 1 1
		 3 1 1 1 1 1 . . . 1 1
		 4 . . . . . . 1 . . .
		; 
run; 

data want; 
	input id yr2010-yr2019 maxYears; 
	datalines; 
		 1 . . . . . . . . 1 1 2
		 2 1 1 1 . . . 1 1 1 1 4
		 3 1 1 1 1 1 . . . 1 1 5
		 4 . . . . . . 1 . . . 1
		; 
run; 
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have; 
	input id yr2010-yr2019; 
	datalines; 
		 1 . . . . . . . . 1 1
		 2 1 1 1 . . . 1 1 1 1
		 3 1 1 1 1 1 . . . 1 1
		 4 . . . . . . 1 . . .
		;

data want;
 set have;
 temp=cats(of yr:);
pid=prxparse('/1+/');
s=1;e=length(temp);
call prxnext(pid,s,e,temp,p,l);
do while(p>0);
 maxYears=max(maxYears,l);
 call prxnext(pid,s,e,temp,p,l);
end;
drop pid s e p l temp;
run;

View solution in original post

16 REPLIES 16
PaigeMiller
Diamond | Level 26

I have seen similar solutions of this problem here but I don't quite understand them.

 

Did you try them? Do they work? Don't make us re-invent a solution if there already is a solution (which we could then explain to you).

 

Here is a solution that could be explained:

 

https://communities.sas.com/t5/SAS-Programming/Counting-groups-of-consecutive-characters-in-string/m...

--
Paige Miller
vanja
Obsidian | Level 7

Sorry, but I can't translate the similar solutions I have found to my specific problem.

Ksharp
Super User
data have; 
	input id yr2010-yr2019; 
	datalines; 
		 1 . . . . . . . . 1 1
		 2 1 1 1 . . . 1 1 1 1
		 3 1 1 1 1 1 . . . 1 1
		 4 . . . . . . 1 . . .
		;

data want;
 set have;
 temp=cats(of yr:);
pid=prxparse('/1+/');
s=1;e=length(temp);
call prxnext(pid,s,e,temp,p,l);
do while(p>0);
 maxYears=max(maxYears,l);
 call prxnext(pid,s,e,temp,p,l);
end;
drop pid s e p l temp;
run;
vanja
Obsidian | Level 7
Thank you so much for helping me. I have finished working for the day but will try it out tommorow!
Ksharp
Super User

I think @Astounding  's code is better.

 

data have; 
	input id yr2010-yr2019; 
	datalines; 
1 . . . . . . . . 1 1
2 1 1 1 . . . 1 1 1 1
3 1 1 1 1 1 . . . 1 1
4 . . . . . . 1 . . .
;

data want;
 set have;
 array yrs {*} yr2010 - yr2019;
 maxyears = 0;
 count = 0;
 do _n_=1 to dim(yrs);
   if yrs{_n_} = 1 then count + 1;
   else do;      maxyears = max(maxyears, count);
     count = 0;
   end;
 end;
 maxyears = max(maxyears, count);
 drop count;
 run;
vanja
Obsidian | Level 7
You were the first to answer so I thought it was fair to give ju the role as solution programmer. Would you rather that I gave that to @Astounding ? For other people that might have the same problem.
Astounding
PROC Star

A straightforward way: 

data want;
 set have;
 array yrs {2010:2019} yr2010 - yr2019;
 maxyears = 0;
 count = 0;
 do _n_=2010 to 2019;
   if yrs{_n_} = 1 then count + 1;
   else do; 
maxyears = max(maxyears, count); count = 0; end; end; maxyears = max(maxyears, count); drop count; run;
mkeintz
PROC Star

I see your array statement, and raise it by a repeat function (😉😞

 

data have; 
  input id yr2010-yr2019; 
datalines; 
1 . . . . . . . . 1 1
2 1 1 1 . . . 1 1 1 1
3 1 1 1 1 1 . . . 1 1
4 . . . . . . 1 . . .
run; 

data want (drop=_:);
  set have;
  length _strng $10;
  _strng=cats(of yr:);
  do  LEN=length(_strng) to 1 by -1  while (find(_strng,repeat('1',LEN-1))=0);
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

If the absent years really are always represented with a standard missing value, then your could  (1) concatenate all the year values into a character string, call it _STRNG, (2) calculate the length of each "word" in _STRNG, where a word is a substring containing no separators (no "." in this case):

 

data want ;
  set have;

  length _strng $10;
  _strng=cats(of yr:);
  if findc(_strng,'1') then do _w=1 to countw(_strng,'.');
    maxyears=max(maxyears,length(scan(_strng,_w)));
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

Similar to what others already suggested with some logic added to derive the start and stop year for the longest period.

data have; 
	input id yr2010-yr2019; 
  length str $10 word $10;
  array yrs {*} yr2010-yr2019;

  /* find longest string */
  str=cats(of yr:);
  do i=1 by 1;
    word=scan(str,i,,'kd');
    if missing(word) then leave;
    max_len=max(max_len,length(word));
  end;

  /* find start and stop variables (years) for longest period */
  start=find(str,repeat('1',max_len-1));
  stop=start+max_len-1;
  from_year =input(compress(vname(yrs[start]),,'kd'),best32.);
  to_year   =input(compress(vname(yrs[stop]),,'kd'),best32.);

  drop str word i start stop;

	datalines; 
1 . . . . . . . . 1 1
2 1 1 1 . . . 1 1 1 1
3 1 1 1 1 1 . . . 1 1
4 . . . . . . 1 . . .
; 

 

vanja
Obsidian | Level 7

Thanks for this extra info with start and end years, could turn out to be really useful for this research project.

Patrick
Opal | Level 21

@vanja BTW: It wasn't in your sample data but what should happen if there is more than one max period? Which one would you want to select?

vanja
Obsidian | Level 7
True, my real data will for sure have some observations where more periods will have same number of max years. That is a good question. I will have to look into that. Is there a way to flag them?
Patrick
Opal | Level 21

@vanja Define how such cases should be treated.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 2856 views
  • 14 likes
  • 7 in conversation