BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LEINAARE
Obsidian | Level 7

Hello,

I am trying to assign values of variables from previous observations to new variables in current observations within unique ID groupings using the LAG function.  To prevent assigning the last value of previous unique IDs to the first observation for subsequent IDs, I used a SET statement with BY group processing to enable use of FIRST.ID logic to leave values for the first observation of each unique ID missing.  However, I am getting unexpected results.

 

For the first unique ID, I am getting missing values returned by the LAG function for the first two observations rather than just the first.  For subsequent ID numbers, I am getting the lagged value of the last observation of the previous ID assigned to the second observation of the current ID.  Below is a sample code I created.

 

data test;
	input ID Var1 $ Var2;
	datalines;
	1 A 0
	1 A 1
	1 A 1
	1 A 0
	1 A 0
	2 B 0
	2 B 0
	2 B 0
	3 C .
	3 C 1
	3 C 1
	3 C 0
	;
run;

data test_mod;
	set test;
	by ID;
*Want values for first obs missing;
*Want values of 2nd obs to have value of 1st obs within each ID group; if not first.ID then do; Prev_Var1 = lag(Var1); Prev_Var2 = lag(Var2); end; run;

 

 

The dataset Test_Mod looks like this.

 

ID     Var1     Var2     Prev_Var1     Prev_Var2

1         A            0                                    .

1         A            1                                    .

1         A            1               A                   1

1         A            0               A                   1

1         A            0               A                   0

2         B            0                                    .

2         B            0               A                   0

2         B            0               B                   0

3         C            .                                     .

3         C            1               B                   0

3         C            1               C                   1

3         C            0               C                   1

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Very common issue when using the LAG() function.

Do not use it in a conditional branch; it only feeds values to the internal queue when it is called, so it won't pick up a new value in the first observation of a group.

Restructure your code:

data test_mod;
set test;
by ID;
Prev_Var1 = lag(Var1);
Prev_Var2 = lag(Var2);
if first.ID then do;
  Prev_Var1 = .;
  Prev_Var2 = .;
end;
run;

View solution in original post

7 REPLIES 7
Kurt_Bremser
Super User

Very common issue when using the LAG() function.

Do not use it in a conditional branch; it only feeds values to the internal queue when it is called, so it won't pick up a new value in the first observation of a group.

Restructure your code:

data test_mod;
set test;
by ID;
Prev_Var1 = lag(Var1);
Prev_Var2 = lag(Var2);
if first.ID then do;
  Prev_Var1 = .;
  Prev_Var2 = .;
end;
run;
LEINAARE
Obsidian | Level 7

Hi @Kurt_Bremser ,

 

Thanks for the help.  That makes sense.

PeterClemmensen
Tourmaline | Level 20

Alternatively, you can do this. This works because the Ifn/Ifc functions execute the Lag Function in the third argument regardless of whether the condition is true or not. This way, the queue/dequeue mechanism behind the scenes does not skip any values. 

 

data test;
	input ID Var1 $ Var2;
	datalines;
	1 A 0
	1 A 1
	1 A 1
	1 A 0
	1 A 0
	2 B 0
	2 B 0
	2 B 0
	3 C .
	3 C 1
	3 C 1
	3 C 0
	;
run;

data test_mod;
	set test;
	by ID;
    Prev_Var1 = ifc(first.id, ' ', lag(Var1));
    Prev_Var2 = ifn(first.id, .  , lag(Var2));
run;

 

LEINAARE
Obsidian | Level 7

Hi @PeterClemmensen ,

 

Thanks for the suggestion.  I was not familiar with those functions.  So it is good to learn about them!

Kurt_Bremser
Super User

These functions are a boon for situations like yours, as they make sure that both branches are executed, while the result of only one is kept. Being a very traditional programmer myself, I always forget to mention them 😉

ballardw
Super User

@Kurt_Bremser wrote:

These functions are a boon for situations like yours, as they make sure that both branches are executed, while the result of only one is kept. Being a very traditional programmer myself, I always forget to mention them 😉


Partially the same about "traditional" but I find the IFC/IFN to look ugly, especially if nested, and hard to read quickly. Maybe just lack of practice but I used such that look like than in spreadsheets and constantly cringed while typing such formulae.

PeterClemmensen
Tourmaline | Level 20

@LEINAARE Anytime. I learned this myself recently by the great @mkeintz .

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2205 views
  • 2 likes
  • 4 in conversation