Help with Vertical processing, Counting variable, find 2 days with lower index level

Accepted Solution Solved
Reply
Super Contributor
Posts: 371
Accepted Solution

Help with Vertical processing, Counting variable, find 2 days with lower index level

Good morning Everyone,

I have a problem with vertical processing and counting variable below. I appreciate it if you could help me out.
My data record an index level over time. So it simply has 2 variable: Time and Level.

First, for a given day (day n), I want to track the first following day (within the next 4 days: n+1 to n+4) that has level lower than this day (day n).
Or:
Day(n) level > Day(n+K)
K<=4;

So for the first observation (with level 9), the first day qualify is day 3 with level 8. For second observation (level 12), the qualify day is 3 with level 8.

For that task, my code below works (thanks for the knowledge I learn from you members)


Second, I want to track for a given day (day n) the first following 2 days that meet the following 3 conditions:

Day(n) level > Day(n+K) level > Day (n+K+Z) level
K<=4
Z<=6;

So for the first observation the 2 days qualified are: Time =3 and Time=5; for 2nd observation is: Time=3 and Time=4


I guess I need to do another round of DO but I dont know how to get the Count variable in the first DO to put it in POINT of the new DO.

Any help is very much appreciated.
Thank you,

HHC


data have;
  input time level;
  datalines;
1     9    
2     12    
3     8    
4     11  
5     7
6 9
7 16
8 7
9 20
10 1

;

data want;
set have nobs=totalobs;

found=0;

time1=0;
level1=0;

do n=_n_+1 to _n_+4 until (found=1);

set have (keep=time level rename=(time=time1 level=l1)) point=n;
if level>l1 then do;
found=1;
level1=l1;

end;
end;
run;


Accepted Solutions
Solution
‎09-13-2013 01:27 PM
Super Contributor
Posts: 339

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

  do j=mycount to mycount+4 until (found=1);

  set have (keep =time level rename=(time=time2 level=l2)) point = mycount;

needs to be replaced with point=j; since you are not icrementng mycount for that inner loop. With the way you have set yourself up, you could very well do

do j=_n_+i+1 to _n_+i+4 /* or +6? */

for your counter. I originally misread your counter issue.

If you want to clear out the error messages (that don't cause issues with the process though), you need to add handling for whenever you point= beyond the dataset size. I believe you would need to retrieve the number of rows in the dataset from sashelp.vtable or use a different end= variable for each embeded set statement and add another condition in each of your until () so that the loop stops when it reaches the corresponding end= variable

beyond that, it becomes mostly polishing output/ordering variables in the dataset and dropping the ones that are not needed anymore. Amongst other things, I would personally probably keep track of found1 and found2 instead of a single value and then after all loops, use a call missing on all variables tied to foundn=0 just so that the last set statement read is replaced by missing values when it was not found

View solution in original post


All Replies
Super Contributor
Posts: 339

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

do i=1 to 4 until (found=1);

mycounter=_N_+i;

set have (keep=time level rename=(time=time1 level=l1)) point=mycounter;

...

drop mycounter i;

should do the trick.

However, since your task is getting complicated and k+z LE 10, I would suggest you simply "horizontalize" your data a little bit and do your job in 2 steps using simple one-row processing of your conditions.

do i=1 to 10;

my counter = _N_+i;

/* I would use a macro to avoid hard coding this but...*/

select ( i );

     when (1) then set have (keep=time level rename=(time=time1 level=l1)) point=mycounter;

     when (2) then set have (keep=time level rename=(time=time2 level=l2)) point=mycounter;

     ...

     when (10) then set have (keep=time level rename=(time=time10 level=l10)) point=mycounter;

end;

end;

from there you would have today and the next 10 days over 22 columns for each row - you may need to do some handling for the last 10 records of your dataset, I don't remember if set/point returns an error or simply missing values when you attempt to point beyond the dataset end.

Super Contributor
Posts: 371

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

Thanks, Vince.

However I still cannot lock the counter to the first observation that satisfy the condition using the trick you gave.

Somehow I did something wrong with mycount.

Can you have a look (especially at obs 4)?

Many thanks,

HHC

data have;
  input time level;
  datalines;
1     9    
2     12    
3     8    
4     15  
5     7
6 9
7 16
8 7
9 20
10 1

;run;

data want;
set have nobs=totalobs;

found=0;
time2=0;
time1=0;
level1=0;
level2=0;

do i=_n_+1 to _n_+4 until (found=1);

set have (keep=time level rename=(time=time1 level=l1)) point=i;

mycount=_n_+i;

if level>l1 then do;

  do j=mycount to mycount+4 until (found=1);
  set have (keep =time level rename=(time=time2 level=l2)) point = mycount;
  if level>l2 then do;


  found=1;
  level1=l1;
  level2=l2;

end;
end;
end;
end;
run;

Solution
‎09-13-2013 01:27 PM
Super Contributor
Posts: 339

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

  do j=mycount to mycount+4 until (found=1);

  set have (keep =time level rename=(time=time2 level=l2)) point = mycount;

needs to be replaced with point=j; since you are not icrementng mycount for that inner loop. With the way you have set yourself up, you could very well do

do j=_n_+i+1 to _n_+i+4 /* or +6? */

for your counter. I originally misread your counter issue.

If you want to clear out the error messages (that don't cause issues with the process though), you need to add handling for whenever you point= beyond the dataset size. I believe you would need to retrieve the number of rows in the dataset from sashelp.vtable or use a different end= variable for each embeded set statement and add another condition in each of your until () so that the loop stops when it reaches the corresponding end= variable

beyond that, it becomes mostly polishing output/ordering variables in the dataset and dropping the ones that are not needed anymore. Amongst other things, I would personally probably keep track of found1 and found2 instead of a single value and then after all loops, use a call missing on all variables tied to foundn=0 just so that the last set statement read is replaced by missing values when it was not found

Super Contributor
Posts: 371

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

Thanks Vince.

I got it!.

HHC

data want;
set have nobs=totalobs;
found=0;
time2=0;
time1=0;
level1=0;
level2=0;

do i=_n_+1 to _n_+4 until (found=1);

set have (keep=time level rename=(time=time1 level=l1)) point=i;

mycount=i;

if level>l1 then do;

  do j=mycount+1 to mycount+4 until (found=1);
  set have (keep =time level rename=(time=time2 level=l2)) point = j;
  if level>l2 then do;

  found=1;
  level1=l1;
  level2=l2;

end;
end;
end;
end;
put _n_ time time1 i;
run;

Super User
Super User
Posts: 6,502

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

You are not testing for running past the end of the input file with your POINT= values.

You might want to modify the upper bound.

do j=mycount+1 to min(nobs,mycount+4) until (found=1);

  set have (keep =time level rename=(time=time2 level=l2)) point = j nobs=nobs;

Super Contributor
Posts: 371

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

Thanks Tom for your suggestions.

By the way, if I don't want to use the _n_ but create a counting variable so I can control the process better, how can I make it?

I try few ways but fails.

It look like the _n_ start even before any of the DO here.

Thank you,

HHC

data want;
set have nobs=totalobs;
found=0;
time2=0;
time1=0;
level1=0;
level2=0;

COUNT=1;
do i=COUNT+1 to COUNT+4 until (found=1);

set have (keep=time level rename=(time=time1 level=l1)) point=i;

mycount=i;

if level>l1 then do;

  do j=mycount+1 to min(nobs,mycount+4) until (found=1);
  set have (keep =time level rename=(time=time2 level=l2)) point = j nobs=nobs;
  if level>l2 then do;

   found=1;
   level1=l1;
   level2=l2;end;
end;
end;
COUNT=COUNT+1;
end;

run;

Super User
Super User
Posts: 6,502

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

Put it outside the do loop and it will operate like the automatic _N_ variable.  You can use the sum statement and it will automatically be retained and initialized to zero.

data want;

  set have nobs=totalobs;

  COUNT+1;

  ...

Super Contributor
Posts: 339

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

Yet again, I learn the simplest extremely useful things from you, Tom. I had always considered

x+1;

to be a simple syntax reduction of

x=x+1;

without the retain statement effect and used to stay away from it because it doesn't speak to me as do += and =+ C language operators so I would keep the long syntax for readability.

Glad to pickup an additionnal trick

Thanks

Super Contributor
Posts: 371

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

Thanks guys for your help.
I recently have to use SAS for my work and still finding it hard to use this new language, especially controlling the position for the iteration.

One of the example of my problem is that:
For the data day and level I have below (the same as the above discussion), if the first day having level of more than 3   (day(i)), I give this day code special. Then, jump the analysis to the new day = day(i)+ level(i) to do the analysis again.
I dont know how to make the JUMP in SAS.
Can you help me please?
Thank you very much,
HHC


The output should look like that:

day level__________  special
1      2    __________(level less than 3, move to the next)    
2      4  __________   1  (new day to restart analysis is 2+4=6)    
3      10    
4      9  
5      2
6 5       __________1  (new day to restart analysis is 6+5=11)
7 160
8 10
9 5
10 15
11 2   __________    (level less than 3, move to the next)
12 4
13 5 __________    1   
;run;


data have;
  input day level;
  datalines;

1      2      

2      4

3      10    

4      9  

5      2

6 5

7 160

8 10

9 5

10 15

11 2

12 4

13 5

;run;

data want;
set have nobs=totalobs;
special=0;

COUNT+1;

do i=COUNT to totalobs until (found=1);
if level>3 then do;
found=1;
special=1;
i=COUNT+level;

end;
end;
run;

Super Contributor
Posts: 339

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

The reason why it is so difficult to control the loop position is because SAS runs almost everything on disc so as to be able to manage extremely large sets of data without memory failures. Thus, even if you controlled the loop yourself (which can be done with a do statement), you still find yourself in a situation where your full data vector gets replaced at the set statement for each subsequent iteration.

However, your 2nd problem posted here can be trivialized with a retain statement and without any do iteration (although some do blocks). You simply add a condition on your carried-over "next" counter.

data want;

     set have;

     if _N_=1 then next=1;

     if next<=day then do;

          if level<4 then next=next+1;

          else do;

               next=next+level;

               special=1;

          end;

     end;

     retain next;

     /* drop next; */

run;

It doesn't do exactly the output you've examplified but I've assumed you had an error and day 12 should've been special, not day 13.

So in essence, since you are not really doing multi-row processing but only accounting for what next row to process based on some conditionning of exactly one current row, you can create a new variable that tracks what record to process and just skip over others yet output them. Drop the pointer variable once you are satisfied with the results - I left it in comments so that you can see the underlying logic in the output dataset

Just for the record, if you want full control over your do loop (_N_ will not be useable but you will have another variable to use), here's a generic self-control over loop syntax.

data want;

     do i=1 to totalobs;

          /*some statements*/

          set have nobs=totalobs;

          /* other statements */

          output; /* data step automatic iteration also has an implicit output; statement at the end *unless* there are output statements elsewhere in the code */

     end;

     drop i; /* i acts as your new _N_ inside the do loop */

run;

It's not recommended unless there are no other ways around though but it is an approach that is useable.

Vince

Super Contributor
Posts: 371

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

Thanks Vince.

Your help clears a lot of thing in my mind.

One question about your code:

when we write :

if _N_=1 then next=1;

What is the value of Next in other observations at the beginning (before retained kicked in)?

Why it does not work if I simply use: Next=1. In this case all observation has Next=1. But after the 2 observation, Next is retained at 6. However, it looks like SAS does not take this value 6 to update Next.

Thanks as always.

HHC

Super Contributor
Posts: 339

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

Hi HHC,

retain statement doesn't "kick in" each time it's read sequentially. I don't know exactly the appropriate terminology but there are statements that are read once at compilation and are active throughout the entire data step processing. Similarly as an example, the drop statement doesn't drop variables at each iteration but only at the very end of the entire data step. SAS 9.4 data step 2.0 syntax gives a better insight on this by spliting initiation/loop ("run")/end

So before going in, your data vector has all values set as missing. then next gets initiated to 1, then the data vector is filled with row information when processing the set statement. Some more processing occurs (the conditioning and whatnot). When you reach the end of this data step loop (and all subsequent others), when the data vector would normally be reset to all values missing, instead, the value stored in next gets retained. That is, your data vector is reset to all missing values excepted next which is carried through.

It won't work if you only use statement next=1 because then at each iteration you would be reseting your counter. So your next variable would go change like this

_N_     Next(start of loop)     Next(after next=1 statement) Next(end of loop)

1     .     1     2

2     2     1     5

3     5     1     11

4     11     1     10

etc.

In essence, it comes back to the explanation above. The value of the previous row does not get substituted in at the retain statement but just does not get reseted to missing at each data step loop iteration so a next=1 statement with no condition on _N_ will reset your counter at each iteration.

Vince

Super Contributor
Posts: 371

Re: Help with Vertical processing, Counting variable, find 2 days with lower index level

Thanks Vince for you lecture!

HHC

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 755 views
  • 7 likes
  • 3 in conversation