BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
vanja
Obsidian | Level 7

Hi!

I want to count the maximum number of consecutive 1:s in my dataset. See example of have and want dataset. My dataset(s) have millions of rows so I want to avoid transposing the data. I have seen similar solutions of this problem here but I don't quite understand them. Is there someone who can help me with the code needed to calculate the variable maxYears?

 

data have; 
	input id yr2010-yr2019; 
	datalines; 
		 1 . . . . . . . . 1 1
		 2 1 1 1 . . . 1 1 1 1
		 3 1 1 1 1 1 . . . 1 1
		 4 . . . . . . 1 . . .
		; 
run; 

data want; 
	input id yr2010-yr2019 maxYears; 
	datalines; 
		 1 . . . . . . . . 1 1 2
		 2 1 1 1 . . . 1 1 1 1 4
		 3 1 1 1 1 1 . . . 1 1 5
		 4 . . . . . . 1 . . . 1
		; 
run; 
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have; 
	input id yr2010-yr2019; 
	datalines; 
		 1 . . . . . . . . 1 1
		 2 1 1 1 . . . 1 1 1 1
		 3 1 1 1 1 1 . . . 1 1
		 4 . . . . . . 1 . . .
		;

data want;
 set have;
 temp=cats(of yr:);
pid=prxparse('/1+/');
s=1;e=length(temp);
call prxnext(pid,s,e,temp,p,l);
do while(p>0);
 maxYears=max(maxYears,l);
 call prxnext(pid,s,e,temp,p,l);
end;
drop pid s e p l temp;
run;

View solution in original post

16 REPLIES 16
PaigeMiller
Diamond | Level 26

I have seen similar solutions of this problem here but I don't quite understand them.

 

Did you try them? Do they work? Don't make us re-invent a solution if there already is a solution (which we could then explain to you).

 

Here is a solution that could be explained:

 

https://communities.sas.com/t5/SAS-Programming/Counting-groups-of-consecutive-characters-in-string/m...

--
Paige Miller
vanja
Obsidian | Level 7

Sorry, but I can't translate the similar solutions I have found to my specific problem.

Ksharp
Super User
data have; 
	input id yr2010-yr2019; 
	datalines; 
		 1 . . . . . . . . 1 1
		 2 1 1 1 . . . 1 1 1 1
		 3 1 1 1 1 1 . . . 1 1
		 4 . . . . . . 1 . . .
		;

data want;
 set have;
 temp=cats(of yr:);
pid=prxparse('/1+/');
s=1;e=length(temp);
call prxnext(pid,s,e,temp,p,l);
do while(p>0);
 maxYears=max(maxYears,l);
 call prxnext(pid,s,e,temp,p,l);
end;
drop pid s e p l temp;
run;
vanja
Obsidian | Level 7
Thank you so much for helping me. I have finished working for the day but will try it out tommorow!
Ksharp
Super User

I think @Astounding  's code is better.

 

data have; 
	input id yr2010-yr2019; 
	datalines; 
1 . . . . . . . . 1 1
2 1 1 1 . . . 1 1 1 1
3 1 1 1 1 1 . . . 1 1
4 . . . . . . 1 . . .
;

data want;
 set have;
 array yrs {*} yr2010 - yr2019;
 maxyears = 0;
 count = 0;
 do _n_=1 to dim(yrs);
   if yrs{_n_} = 1 then count + 1;
   else do;      maxyears = max(maxyears, count);
     count = 0;
   end;
 end;
 maxyears = max(maxyears, count);
 drop count;
 run;
vanja
Obsidian | Level 7
You were the first to answer so I thought it was fair to give ju the role as solution programmer. Would you rather that I gave that to @Astounding ? For other people that might have the same problem.
Astounding
PROC Star

A straightforward way: 

data want;
 set have;
 array yrs {2010:2019} yr2010 - yr2019;
 maxyears = 0;
 count = 0;
 do _n_=2010 to 2019;
   if yrs{_n_} = 1 then count + 1;
   else do; 
maxyears = max(maxyears, count); count = 0; end; end; maxyears = max(maxyears, count); drop count; run;
mkeintz
PROC Star

I see your array statement, and raise it by a repeat function (😉😞

 

data have; 
  input id yr2010-yr2019; 
datalines; 
1 . . . . . . . . 1 1
2 1 1 1 . . . 1 1 1 1
3 1 1 1 1 1 . . . 1 1
4 . . . . . . 1 . . .
run; 

data want (drop=_:);
  set have;
  length _strng $10;
  _strng=cats(of yr:);
  do  LEN=length(_strng) to 1 by -1  while (find(_strng,repeat('1',LEN-1))=0);
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

If the absent years really are always represented with a standard missing value, then your could  (1) concatenate all the year values into a character string, call it _STRNG, (2) calculate the length of each "word" in _STRNG, where a word is a substring containing no separators (no "." in this case):

 

data want ;
  set have;

  length _strng $10;
  _strng=cats(of yr:);
  if findc(_strng,'1') then do _w=1 to countw(_strng,'.');
    maxyears=max(maxyears,length(scan(_strng,_w)));
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

Similar to what others already suggested with some logic added to derive the start and stop year for the longest period.

data have; 
	input id yr2010-yr2019; 
  length str $10 word $10;
  array yrs {*} yr2010-yr2019;

  /* find longest string */
  str=cats(of yr:);
  do i=1 by 1;
    word=scan(str,i,,'kd');
    if missing(word) then leave;
    max_len=max(max_len,length(word));
  end;

  /* find start and stop variables (years) for longest period */
  start=find(str,repeat('1',max_len-1));
  stop=start+max_len-1;
  from_year =input(compress(vname(yrs[start]),,'kd'),best32.);
  to_year   =input(compress(vname(yrs[stop]),,'kd'),best32.);

  drop str word i start stop;

	datalines; 
1 . . . . . . . . 1 1
2 1 1 1 . . . 1 1 1 1
3 1 1 1 1 1 . . . 1 1
4 . . . . . . 1 . . .
; 

 

vanja
Obsidian | Level 7

Thanks for this extra info with start and end years, could turn out to be really useful for this research project.

Patrick
Opal | Level 21

@vanja BTW: It wasn't in your sample data but what should happen if there is more than one max period? Which one would you want to select?

vanja
Obsidian | Level 7
True, my real data will for sure have some observations where more periods will have same number of max years. That is a good question. I will have to look into that. Is there a way to flag them?
Patrick
Opal | Level 21

@vanja Define how such cases should be treated.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 4654 views
  • 14 likes
  • 7 in conversation