Solved: Re: Do Loop to count number 1 and -1 and count have sign +/-

hhchenfx · Posted 11-08-2023 05:19 PM

Hi Everyone,

My data has id and value.

For each row[i], I want to check the window from [i] to [i+5]. I want to find the first non-missing value, say 1, and count how many 1 before -1 shows up.

If the first non-missing value is -1, I want to count how many -1 before 1 show up.

To distinguish between 1 series and -1 series, the count should have + sign for 1 and - sign for -1.

After a lot of work, I get the code run correctly but Separately for +1 and -1 and then combine the 2 counts.

The logic is simply counting and very straight forward as you can see.

My issue is how to combine the 2 parts in 1 do loop.

Can you please help?

Many thanks,

HHC

data have;
  input id value;
  datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 .
1 .
1 .
1 .
1 .
1 -1
1 .
1 -1
1 .
1 .
1 1
1 1
2 -1
2 .
2 1
2 .
2 1
;
/*Working on positive value*/
data want; 
set have nobs=nobs;
drop i j_: out_:;
out_i=0;
pos_count=0;
i+1;
DO j=i to i+5 until (j=nobs or out_j=1);
	set have (keep = id value rename=(id=j_id value=j_value)) point=j;
		if id^=j_id then leave;
		if j_value=1  	then pos_count=pos_count+1;
		else
		if j_value=-1	then leave;
END;
run;

/*Working on Negative value*/
data want; 
set want nobs=nobs;
drop i j_: out_:;
out_i=0;
neg_count=0;
i+1;
DO j=i to i+5 until (j=nobs or out_j=1);
	set want (keep = id value rename=(id=j_id value=j_value)) point=j;
		if id^=j_id then leave;
		if j_value=-1  	then neg_count=neg_count-1;
		else
		if j_value=1	then leave;
END;
run;

/*Combine*/
data want; set want;
final_count =  pos_count+neg_count;
run;

hhchenfx · Posted 11-09-2023 12:00 AM

Finally, I got it!!!

The 2 issues I fixed which is quite difficult to see is:

1- To get data for all rows, I need a quite "stupid" and "redundant" step: Do j=i;

2- Once value=1 is found, put an out statement so that the outer loop will not run again (to avoid the double/triple counts)

If you could, please help to make my code more efficient or suggest other solution.

Many many thanks.

HHC


data want; 
set have nobs=nobs;
drop i j k l k_: l_:  out_:;
i+1;

DO J=i;
	out_k=0;
	count=0;
	Do k=j to j+5 until (k=nobs or out_k=1);
	set have (keep = id value rename=(id=k_id value=k_value)) point=k;

		IF id^=k_id then leave;

		/*positive section----*/
		ELSE
		IF k_value=1  	then do; 
			out_k=1;/*critical step: once k_value=1, it is the only iteration for k (no more run on k)*/
			DO l=k to j+5 until (l=nobs);
			set have (keep = id value rename=(id=l_id value=l_value)) point=l;
				if id^=l_id then leave;
				else
				if l_value>0 then count=count+1;
				else
				if l_value=-1	then leave;
			End;
			End;

		/*negative section----*/
		ELSE
		IF k_value=-1  	then do; 
			out_k=1;/*critical step: once k_value=1, it is the only iteration for k (no more run on k)*/
			DO l=k to j+5 until (l=nobs);
			set have (keep = id value rename=(id=l_id value=l_value)) point=l;
				if id^=l_id then leave;
				else
				if l_value=-1 then count=count-1;
				else
				if l_value=1	then leave;
			End;
			End;
	end;
END;
run;

View solution in original post

hhchenfx · Posted 11-09-2023 12:00 AM

Finally, I got it!!!

The 2 issues I fixed which is quite difficult to see is:

1- To get data for all rows, I need a quite "stupid" and "redundant" step: Do j=i;

2- Once value=1 is found, put an out statement so that the outer loop will not run again (to avoid the double/triple counts)

If you could, please help to make my code more efficient or suggest other solution.

Many many thanks.

HHC


data want; 
set have nobs=nobs;
drop i j k l k_: l_:  out_:;
i+1;

DO J=i;
	out_k=0;
	count=0;
	Do k=j to j+5 until (k=nobs or out_k=1);
	set have (keep = id value rename=(id=k_id value=k_value)) point=k;

		IF id^=k_id then leave;

		/*positive section----*/
		ELSE
		IF k_value=1  	then do; 
			out_k=1;/*critical step: once k_value=1, it is the only iteration for k (no more run on k)*/
			DO l=k to j+5 until (l=nobs);
			set have (keep = id value rename=(id=l_id value=l_value)) point=l;
				if id^=l_id then leave;
				else
				if l_value>0 then count=count+1;
				else
				if l_value=-1	then leave;
			End;
			End;

		/*negative section----*/
		ELSE
		IF k_value=-1  	then do; 
			out_k=1;/*critical step: once k_value=1, it is the only iteration for k (no more run on k)*/
			DO l=k to j+5 until (l=nobs);
			set have (keep = id value rename=(id=l_id value=l_value)) point=l;
				if id^=l_id then leave;
				else
				if l_value=-1 then count=count-1;
				else
				if l_value=1	then leave;
			End;
			End;
	end;
END;
run;

mkeintz · Posted 11-09-2023 12:14 AM

I'm posting the same response I gave to your virtually identical topic. It does not require separate sets of codes for -1 vs +1:

data have;
  input id value @@;
  datalines;
1 1    1 .    1 1    1 .     1 1    1 .     1 .    1 .    
1 .    1 .    1 .    1 -1    1 .    1 -1    1 .    1 .    
1 1    1 1    
2 -1    2 .    2 1    2 .    2 1    
;

data want (drop=_: i);
  merge have (firstobs=1)
        have (firstobs=2 keep=id value rename=(id=_id2 value=_val2))
        have (firstobs=3 keep=id value rename=(id=_id3 value=_val3))
        have (firstobs=4 keep=id value rename=(id=_id4 value=_val4))
        have (firstobs=5 keep=id value rename=(id=_id5 value=_val5))
        have (firstobs=6 keep=id value rename=(id=_id6 value=_val6)) ;

  array _x {*} id     _id:  ;
  array _v {*} value  _val: ;

  count=0;
  do i=1 to dim(_x) while(id=_x{i});
    if _nonmiss1=. then _nonmiss1=_v{i};
    else if _v{i}=. then continue;
    else if _v{i}=_nonmiss1 then count=count+sign(_nonmiss1);
    else if sign(_v{i})=-1*sign(_nonmiss1) then leave;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hhchenfx · Posted 11-09-2023 12:24 AM

Hi mkeitz,

I run your code and for row 4 and 5, the count is 0. They suppose to be 1.

Since I will have multiple windows to check, such as 5, 50 or 60. I am not sure if this code is easy to change.

Thank you so much for helping.

HHC

Tom · Posted 11-09-2023 03:58 PM

Use the POINT= option to "lookahead".

You will need to know how many observations there are for each ID to make sure you don't look TOO far.

So given this data:

data have;
  input id value @@;
datalines;
1 1    1 .    1 1    1 .     1 1    1 .     1 .    1 .    
1 .    1 .    1 .    1 -1    1 .    1 -1    1 .    1 .    
1 1    1 1    
2 -1    2 .    2 1    2 .    2 1    
;

With a window of 5 observations (counting the current non-missing value).

%let window=5;
data want;
  length id value nfound firstobs nobs 8;
  keep id -- nobs;
  do nobs=1 by 1 until(last.id);
    set have;
    by id;
  end;
  do row=1 by 1 until(last.id);
    set have;
    by id;
    obsnum+1;
    if row=1 then firstobs=obsnum;
    if not missing(value) then do;
       nfound=1;
       do p=obsnum+1 to min(obsnum+&window-1,firstobs+nobs-1) until(newvalue=-value);
         set have(keep=value rename=(value=newvalue)) point=p;
         nfound + (value=newvalue);
       end;
       output;
    end;
  end;
run;

You get these counts.

Obs    id    value    nfound    firstobs    nobs

  1     1       1        3          1        18
  2     1       1        2          1        18
  3     1       1        1          1        18
  4     1      -1        2          1        18
  5     1      -1        1          1        18
  6     1       1        2          1        18
  7     1       1        1          1        18
  8     2      -1        1         19         5
  9     2       1        2         19         5
 10     2       1        1         19         5

hhchenfx · Posted 11-09-2023 04:29 PM

Thank you as always, Tom.

HHC

ahuige · Posted 11-12-2023 12:13 AM

Have not reviewed/replied SAS programming threads for fun in recent years. This one is an interesting fun topic to practice all basic DATA step statement elements.

data have;
  input id value;
  datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 .
1 .
1 .
1 .
1 .
1 -1
1 .
1 -1
1 .
1 .
1 1
1 1
2 -1
2 .
2 1
2 .
2 1
;

data ahuige(keep=id value count); 
  do obsnum=1 to last;
     set have nobs=last;
     count=sum(0,sign(value));
     over=0;
     do point=obsnum+1 to obsnum+5;
     if point<=last and not over then 
       do;
       set have(rename=(id=nowId value=nowValue))  point=point; 
       over=((id ne nowId) or sign(NowValue)*sign(count)=-1 );
       if not over then count=sum(count,sign(NowValue)) ; 
       end;        
     end;
     output;
  end;stop;
run;
proc print;run;;

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away