BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
zcoop
Fluorite | Level 6

Hi! I'm a first time poster, but 10 minutes of online searching didn't yield a similar result, so I apologize if I just failed at finding a similar, past post w/ a solution.

 

I have a dataset called 'untaggedData' that is comprised of two columns: 'id' and 'year'. I've sorted it by id first and then by year (descending). Most id's have multiple years.

 

My goal is to label the oldest year (bottom row) for any given id where the label tells me whether or not a >= 2 year difference occurs in any consecutive years for that given id.

 

While I have years of coding experience in other languages like Python and VBA, I'm new to SAS and its many unfamiliar mannerisms. The following code seemed like a line-efficient way to accomplish my goal, but I'm discovering that mixing the lag() function with IF-THEN statements is not intuitive.

data taggedData;
	set work.untaggedData;
	category = "A";
	if lag(id) = id and lag(year) - year >= 2 then category = "B";
	if lag(id) = id and lag(category) = "B" then category = "B";
run;

Here's a suitable mock-up of the result my inadequate code:

IDYEARCATEGORY
12022A
12021A
12006B
12006B
12005A
12003B
12003B
12003A
12002A
22022A
22021A
.........

So I don't understand why SAS is not labeling all subsequent rows (per ID) with "B" after it occurs the first time?

 

Though my primary question is how to achieve my goal? How would you go about labeling the oldest year (bottom row) for any given id so that it says whether or not a >= 2 year difference occurs in any consecutive years for that given id?

 

Thanks,

Zach

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

The LAG() function sets up a queue, where an element is taken from the "top" and a new one inserted at the "bottom" whenever the function is called. So, calling the function conditionally is usually a bad idea, with confusing results.

But you call it unconditionally, which is good.

But at the time you call it for lag(category), category will be "A" if the first IF was not true, and that "A" goes into the queue.

Use RETAIN and BY instead.

data taggedData;
set work.untaggedData;
by id;
retain category;
if first.id then category = "A";
if not first.id and lag(year) - year >= 2 then category = "B";
run;

 

View solution in original post

1 REPLY 1
Kurt_Bremser
Super User

The LAG() function sets up a queue, where an element is taken from the "top" and a new one inserted at the "bottom" whenever the function is called. So, calling the function conditionally is usually a bad idea, with confusing results.

But you call it unconditionally, which is good.

But at the time you call it for lag(category), category will be "A" if the first IF was not true, and that "A" goes into the queue.

Use RETAIN and BY instead.

data taggedData;
set work.untaggedData;
by id;
retain category;
if first.id then category = "A";
if not first.id and lag(year) - year >= 2 then category = "B";
run;

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 651 views
  • 1 like
  • 2 in conversation