Solved: Re: How do I count maximum consecutive 1:s?

vanja · Posted 10-11-2021 07:11 AM

Hi!

I want to count the maximum number of consecutive 1:s in my dataset. See example of have and want dataset. My dataset(s) have millions of rows so I want to avoid transposing the data. I have seen similar solutions of this problem here but I don't quite understand them. Is there someone who can help me with the code needed to calculate the variable maxYears?

data have; 
	input id yr2010-yr2019; 
	datalines; 
		 1 . . . . . . . . 1 1
		 2 1 1 1 . . . 1 1 1 1
		 3 1 1 1 1 1 . . . 1 1
		 4 . . . . . . 1 . . .
		; 
run; 

data want; 
	input id yr2010-yr2019 maxYears; 
	datalines; 
		 1 . . . . . . . . 1 1 2
		 2 1 1 1 . . . 1 1 1 1 4
		 3 1 1 1 1 1 . . . 1 1 5
		 4 . . . . . . 1 . . . 1
		; 
run;

Ksharp · Posted 10-11-2021 08:36 AM

data have; 
	input id yr2010-yr2019; 
	datalines; 
		 1 . . . . . . . . 1 1
		 2 1 1 1 . . . 1 1 1 1
		 3 1 1 1 1 1 . . . 1 1
		 4 . . . . . . 1 . . .
		;

data want;
 set have;
 temp=cats(of yr:);
pid=prxparse('/1+/');
s=1;e=length(temp);
call prxnext(pid,s,e,temp,p,l);
do while(p>0);
 maxYears=max(maxYears,l);
 call prxnext(pid,s,e,temp,p,l);
end;
drop pid s e p l temp;
run;

View solution in original post

PaigeMiller · Posted 10-11-2021 07:25 AM

I have seen similar solutions of this problem here but I don't quite understand them.

Did you try them? Do they work? Don't make us re-invent a solution if there already is a solution (which we could then explain to you).

Here is a solution that could be explained:

https://communities.sas.com/t5/SAS-Programming/Counting-groups-of-consecutive-characters-in-string/m...

--
Paige Miller

vanja · Posted 10-11-2021 07:41 AM

Sorry, but I can't translate the similar solutions I have found to my specific problem.

Ksharp · Posted 10-11-2021 08:36 AM

data have; 
	input id yr2010-yr2019; 
	datalines; 
		 1 . . . . . . . . 1 1
		 2 1 1 1 . . . 1 1 1 1
		 3 1 1 1 1 1 . . . 1 1
		 4 . . . . . . 1 . . .
		;

data want;
 set have;
 temp=cats(of yr:);
pid=prxparse('/1+/');
s=1;e=length(temp);
call prxnext(pid,s,e,temp,p,l);
do while(p>0);
 maxYears=max(maxYears,l);
 call prxnext(pid,s,e,temp,p,l);
end;
drop pid s e p l temp;
run;

vanja · Posted 10-11-2021 08:46 AM

Thank you so much for helping me. I have finished working for the day but will try it out tommorow!

Ksharp · Posted 10-12-2021 07:58 AM

I think @Astounding 's code is better.

data have; 
	input id yr2010-yr2019; 
	datalines; 
1 . . . . . . . . 1 1
2 1 1 1 . . . 1 1 1 1
3 1 1 1 1 1 . . . 1 1
4 . . . . . . 1 . . .
;

data want;
 set have;
 array yrs {*} yr2010 - yr2019;
 maxyears = 0;
 count = 0;
 do _n_=1 to dim(yrs);
   if yrs{_n_} = 1 then count + 1;
   else do;      maxyears = max(maxyears, count);
     count = 0;
   end;
 end;
 maxyears = max(maxyears, count);
 drop count;
 run;

vanja · Posted 10-12-2021 08:08 AM

You were the first to answer so I thought it was fair to give ju the role as solution programmer. Would you rather that I gave that to @Astounding ? For other people that might have the same problem.

Astounding · Posted 10-11-2021 10:18 AM

A straightforward way:

data want;
 set have;
 array yrs {2010:2019} yr2010 - yr2019;
 maxyears = 0;
 count = 0;
 do _n_=2010 to 2019;
   if yrs{_n_} = 1 then count + 1;
   else do; 
     maxyears = max(maxyears, count);
     count = 0;
   end;
 end;
 maxyears = max(maxyears, count);
 drop count;
 run;

mkeintz · Posted 10-22-2021 09:51 PM

I see your array statement, and raise it by a repeat function (😉😞

data have; 
  input id yr2010-yr2019; 
datalines; 
1 . . . . . . . . 1 1
2 1 1 1 . . . 1 1 1 1
3 1 1 1 1 1 . . . 1 1
4 . . . . . . 1 . . .
run; 

data want (drop=_:);
  set have;
  length _strng $10;
  _strng=cats(of yr:);
  do  LEN=length(_strng) to 1 by -1  while (find(_strng,repeat('1',LEN-1))=0);
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 10-11-2021 10:28 PM

If the absent years really are always represented with a standard missing value, then your could (1) concatenate all the year values into a character string, call it _STRNG, (2) calculate the length of each "word" in _STRNG, where a word is a substring containing no separators (no "." in this case):

data want ;
  set have;

  length _strng $10;
  _strng=cats(of yr:);
  if findc(_strng,'1') then do _w=1 to countw(_strng,'.');
    maxyears=max(maxyears,length(scan(_strng,_w)));
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Patrick · Posted 10-11-2021 11:16 PM

Similar to what others already suggested with some logic added to derive the start and stop year for the longest period.

data have; 
	input id yr2010-yr2019; 
  length str $10 word $10;
  array yrs {*} yr2010-yr2019;

  /* find longest string */
  str=cats(of yr:);
  do i=1 by 1;
    word=scan(str,i,,'kd');
    if missing(word) then leave;
    max_len=max(max_len,length(word));
  end;

  /* find start and stop variables (years) for longest period */
  start=find(str,repeat('1',max_len-1));
  stop=start+max_len-1;
  from_year =input(compress(vname(yrs[start]),,'kd'),best32.);
  to_year   =input(compress(vname(yrs[stop]),,'kd'),best32.);

  drop str word i start stop;

	datalines; 
1 . . . . . . . . 1 1
2 1 1 1 . . . 1 1 1 1
3 1 1 1 1 1 . . . 1 1
4 . . . . . . 1 . . .
;

vanja · Posted 10-12-2021 07:45 AM

Thanks for this extra info with start and end years, could turn out to be really useful for this research project.

Patrick · Posted 10-12-2021 07:49 AM

@vanja BTW: It wasn't in your sample data but what should happen if there is more than one max period? Which one would you want to select?

vanja · Posted 10-12-2021 08:13 AM

True, my real data will for sure have some observations where more periods will have same number of max years. That is a good question. I will have to look into that. Is there a way to flag them?

Patrick · Posted 10-12-2021 08:43 AM

@vanja Define how such cases should be treated.

Registration is open

SAS Training: Just a Click Away