BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
zcoop
Fluorite | Level 6

Hi! I'm a first time poster, but 10 minutes of online searching didn't yield a similar result, so I apologize if I just failed at finding a similar, past post w/ a solution.

 

I have a dataset called 'untaggedData' that is comprised of two columns: 'id' and 'year'. I've sorted it by id first and then by year (descending). Most id's have multiple years.

 

My goal is to label the oldest year (bottom row) for any given id where the label tells me whether or not a >= 2 year difference occurs in any consecutive years for that given id.

 

While I have years of coding experience in other languages like Python and VBA, I'm new to SAS and its many unfamiliar mannerisms. The following code seemed like a line-efficient way to accomplish my goal, but I'm discovering that mixing the lag() function with IF-THEN statements is not intuitive.

data taggedData;
	set work.untaggedData;
	category = "A";
	if lag(id) = id and lag(year) - year >= 2 then category = "B";
	if lag(id) = id and lag(category) = "B" then category = "B";
run;

Here's a suitable mock-up of the result my inadequate code:

IDYEARCATEGORY
12022A
12021A
12006B
12006B
12005A
12003B
12003B
12003A
12002A
22022A
22021A
.........

So I don't understand why SAS is not labeling all subsequent rows (per ID) with "B" after it occurs the first time?

 

Though my primary question is how to achieve my goal? How would you go about labeling the oldest year (bottom row) for any given id so that it says whether or not a >= 2 year difference occurs in any consecutive years for that given id?

 

Thanks,

Zach

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

The LAG() function sets up a queue, where an element is taken from the "top" and a new one inserted at the "bottom" whenever the function is called. So, calling the function conditionally is usually a bad idea, with confusing results.

But you call it unconditionally, which is good.

But at the time you call it for lag(category), category will be "A" if the first IF was not true, and that "A" goes into the queue.

Use RETAIN and BY instead.

data taggedData;
set work.untaggedData;
by id;
retain category;
if first.id then category = "A";
if not first.id and lag(year) - year >= 2 then category = "B";
run;

 

View solution in original post

1 REPLY 1
Kurt_Bremser
Super User

The LAG() function sets up a queue, where an element is taken from the "top" and a new one inserted at the "bottom" whenever the function is called. So, calling the function conditionally is usually a bad idea, with confusing results.

But you call it unconditionally, which is good.

But at the time you call it for lag(category), category will be "A" if the first IF was not true, and that "A" goes into the queue.

Use RETAIN and BY instead.

data taggedData;
set work.untaggedData;
by id;
retain category;
if first.id then category = "A";
if not first.id and lag(year) - year >= 2 then category = "B";
run;

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 710 views
  • 1 like
  • 2 in conversation