BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Diemo
Calcite | Level 5

So, I have a dataset

DATA CLASS;
INPUT ONE 1-2 TWO 3-4 COUNT 5-20;
datalines;
0 0 30
1 0 541
2 0 1525
3 0 5
0 1 3
1 1 229
2 1 642
3 1 3
;
run;

And I want to convert count to a Lower bound and an upper bound, using some pretty simple rules:

if (ONE=0) then LB=0

else LB=LAG(UB)+1

UB=LB+count -1

 

so, in SAS I programmed this:

data class2;

set class;

LB=LAG(UB)+1;
if (ONE=0) then LB=0;

UB =LB+count;

run;

 

But this just gives me

 

ONE TWO COUNT LB UB LB2
0030030.
10541...
201525...
305...
01303.
11229...
21642...
313..

.

 

what is going on?

 

 

 
 
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@Diemo wrote:

What I expect to end up with is a data set as follows:

ONE TWO COUNT LB UB

0 0 30 0 30  
1 0 541 31 572  
2 0 1525 573 2098  
3 0 5 2099 2104  
0 1 3 0 3  
1 1 229 4 331  
2 1 642 332 974  
3 1 3 975 978

 

 

I was under the impression that the lines were sent in sequentially - i.e. SAS would do line 1, then Line2 , then Line3, etc.  So, on my first iteration, LAG(UB) returns missing (as UB is all missing). But ONE==0 so therefore LB=0, then UB=LB(0) +count = 30 (which is what I get. On the second iteration, LB=LAG(UB)=30, and UB is then 31+541 = 572. And so on.

 

 And on all the other interations it will return the missing value pushed down from the LAG() call on the previous iteration.  So LB will be set to missing.

I guess that this is where the issue is.   Why is it that LAG is not updating to the last value but keeping the missing values, even though the value of UB is changed?


The statements execute in order and do what they are designed to do.  The LAG() function call on the first assignment statement cannot look into the future and know what value is going to be assigning to the UB variable two statements later!!  LAG() just returns the value that the variable had the last time that specific LAG() function call ran.

 

You don't want LAG() you just want to RETAIN the value from the previous observation.

data class2;
  set class;
  lb=ub+1;
  if (ONE=0) then do;
    LB=0; ub=0;
  end;
  UB+count;
run;
Obs    ONE    TWO    COUNT      lb      ub

 1      0      0        30       0      30
 2      1      0       541      31     571
 3      2      0      1525     572    2096
 4      3      0         5    2097    2101
 5      0      1         3       0       3
 6      1      1       229       4     232
 7      2      1       642     233     874
 8      3      1         3     875     877

View solution in original post

8 REPLIES 8
Reeza
Super User

Code doesn't match the output, for example there's no LB2 mentioned anywhere in your code or source data.

 

So what do you have, and what do you expect and the exact code you tried please. 

 


@Diemo wrote:

So, I have a dataset

DATA CLASS;
INPUT ONE 1-2 TWO 3-4 COUNT 5-20;
datalines;
0 0 30
1 0 541
2 0 1525
3 0 5
0 1 3
1 1 229
2 1 642
3 1 3
;
run;

And I want to convert count to a Lower bound and an upper bound, using some pretty simple rules:

if (ONE=0) then LB=0

else LB=LAG(UB)+1

UB=LB+count -1

 

so, in SAS I programmed this:

data class2;

set class;

LB=LAG(UB)+1;
if (ONE=0) then LB=0;

UB =LB+count;

run;

 

But this just gives me

 

ONE TWO COUNT LB UB LB2
0 0 30 0 30 .
1 0 541 . . .
2 0 1525 . . .
3 0 5 . . .
0 1 3 0 3 .
1 1 229 . . .
2 1 642 . . .
3 1 3 . .

.

 

what is going on?

 

 

 
 

 

Diemo
Calcite | Level 5

Yeah you can ignore LB2. It is just a check to see what LAG(UB) was actually giving me (all missing values).

 

data class2;

set class;

LB=LAG(UB)+1;

LB2=LAG(UB);

if (ONE=0) then LB=0;

UB =LB+count;

run;

Tom
Super User Tom
Super User

And what did you expect those statements to do?  Let's look at them.  First you make a new data step that reads from your original dataset.

data class2;
  set class;

Then in the next statement.

LB=LAG(UB)+1;

You reference two new variables,  LB and UB.  Since LB and UB were not defined before SAS will make them numeric and give them missing values.  The LAG() function will push the current value onto a stack and return the previous value from the stack. On the first iteration the returned value will be missing since nothing has yet been pushed down. And on all the other interations it will return the missing value pushed down from the LAG() call on the previous iteration.  So LB will be set to missing.

 

Now on the next statement 

if (ONE=0) then LB=0;

you conditionally set LB to zero.  So on any observations where ONE = 1 then LB will be 0.

Then

UB =LB+count;

you add the variable COUNT to the variable LB and store it into UB.  But since LB is missing for any observation where ONE is not equal to 1 this will force LB to be missing also.

 

The SAS log should show you where you are generating missing values.

28   data class2;
29     set class;
30     LB=LAG(UB)+1;
31     if (ONE=0) then LB=0;
32     UB =LB+count;
33   run;

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      8 at 30:13   6 at 32:9
NOTE: There were 8 observations read from the data set WORK.CLASS.
NOTE: The data set WORK.CLASS2 has 8 observations and 5 variables.

What are you trying to find? The min() and max() of count?  The SUM() of count?  

Please show what your desired output should be?

 

Diemo
Calcite | Level 5

What I expect to end up with is a data set as follows:

ONE TWO COUNT LB UB

0030030 
1054131572 
2015255732098 
30520992104 
01303 
112294331 
21642332974 
313975978

 

 

I was under the impression that the lines were sent in sequentially - i.e. SAS would do line 1, then Line2 , then Line3, etc.  So, on my first iteration, LAG(UB) returns missing (as UB is all missing). But ONE==0 so therefore LB=0, then UB=LB(0) +count = 30 (which is what I get. On the second iteration, LB=LAG(UB)=30, and UB is then 31+541 = 572. And so on.

 

 And on all the other interations it will return the missing value pushed down from the LAG() call on the previous iteration.  So LB will be set to missing.

I guess that this is where the issue is.   Why is it that LAG is not updating to the last value but keeping the missing values, even though the value of UB is changed?

Tom
Super User Tom
Super User

@Diemo wrote:

What I expect to end up with is a data set as follows:

ONE TWO COUNT LB UB

0 0 30 0 30  
1 0 541 31 572  
2 0 1525 573 2098  
3 0 5 2099 2104  
0 1 3 0 3  
1 1 229 4 331  
2 1 642 332 974  
3 1 3 975 978

 

 

I was under the impression that the lines were sent in sequentially - i.e. SAS would do line 1, then Line2 , then Line3, etc.  So, on my first iteration, LAG(UB) returns missing (as UB is all missing). But ONE==0 so therefore LB=0, then UB=LB(0) +count = 30 (which is what I get. On the second iteration, LB=LAG(UB)=30, and UB is then 31+541 = 572. And so on.

 

 And on all the other interations it will return the missing value pushed down from the LAG() call on the previous iteration.  So LB will be set to missing.

I guess that this is where the issue is.   Why is it that LAG is not updating to the last value but keeping the missing values, even though the value of UB is changed?


The statements execute in order and do what they are designed to do.  The LAG() function call on the first assignment statement cannot look into the future and know what value is going to be assigning to the UB variable two statements later!!  LAG() just returns the value that the variable had the last time that specific LAG() function call ran.

 

You don't want LAG() you just want to RETAIN the value from the previous observation.

data class2;
  set class;
  lb=ub+1;
  if (ONE=0) then do;
    LB=0; ub=0;
  end;
  UB+count;
run;
Obs    ONE    TWO    COUNT      lb      ub

 1      0      0        30       0      30
 2      1      0       541      31     571
 3      2      0      1525     572    2096
 4      3      0         5    2097    2101
 5      0      1         3       0       3
 6      1      1       229       4     232
 7      2      1       642     233     874
 8      3      1         3     875     877
Diemo
Calcite | Level 5
Hmm, ok. So what does LAG() actually do (I was under the impression that LAG() was how you retained a value for future processing, but that seems to be incorrect.
Tom
Super User Tom
Super User

Read the description I wrote before and read the manual.

 

It returns the previous value that it stored the last time it executed.  It has nothing to do with previous observations.  Just previous times that you executed the function. And if you have more than one LAG() function in the same dataset step they each keep their own stack of values, even if they are referencing (storing/retrieving) the same variable.

 

Consider it like a stack of trays in a cafeteria.  When you walk in you take the top tray off the stack and place the tray you came in with onto the stack.  Then the next time you do the same thing so you take out the tray you put in the previous time and put in the tray you currently have.

Reeza
Super User

@Diemo wrote:
Hmm, ok. So what does LAG() actually do (I was under the impression that LAG() was how you retained a value for future processing, but that seems to be incorrect.

That would be RETAIN

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2669 views
  • 0 likes
  • 3 in conversation