turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Lag function not working with by clause

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 07:07 AM

I've the data below.

Name | Address | Zip | Prod |

John | 110,KX Colony | 11198 | PersonalLoan |

Kiran | 7/189 Rajaji Road | 11094 | Mortagge |

Kiran | 7/189 Rajaji Road | 11094 | Creditcard |

Amar | 16,Mt.batten Street | 12409 | Mortagage |

Raju | 11,Kannan nagar | 12546 | Home Loan |

Raju | 11,Kannan nagar | 12546 | PersonalLoan |

Raju | 11,Kannan nagar | 12546 | Mortgage |

My desired output is as follows.

Name | Address | Zip | Prod | Prod_number |

John | 110,KX Colony | 11198 | PersonalLoan | 1 |

Kiran | 7/189 Rajaji Road | 11094 | Mortagge | 1 |

Kiran | 7/189 Rajaji Road | 11094 | Creditcard | 2 |

Amar | 16,Mt.batten Street | 12409 | Mortagage | 1 |

Raju | 11,Kannan nagar | 12546 | Home Loan | 1 |

Raju | 11,Kannan nagar | 12546 | PersonalLoan | 2 |

Raju | 11,Kannan nagar | 12546 | Mortgage | 3 |

When I ran the code below, I'm not getting the desired output. Shouldn't we use lag function with by clause?

data new_hhlevel_prod;

set hhlevel_prod;

by name ADDRESS zip notsorted;

if prod=lag(prod) then prod_number=1;

else prod_number+1;

run;

Please suggest me to get the desired output.

Accepted Solutions

Solution

05-20-2015
10:52 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 10:52 AM

If you are using BY then you do not need to use LAG() to check for changes. SAS will have already done that and set the FIRST. and LAST. variables to indicate when values change.

So let's assume that NAME,ADDRESS,ZIP define a customer and you want to count the distinct products for that customer.

There is some risk in using the NOTSORTED option because it might be possible that the records for one customer are separated by the records for another customer. In that case you could end up with two different values of PROD for that same customer getting PROD_NUMBER set to one.

**data** want ;

set have ;

by name address zip prod notsorted;

if first.zip then prod_number=**1**;

else prod_number+first.prod;

**run**;

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 07:15 AM

BY statement processing and LAG function are unrelated - see using the FIRST.<break_variable> instead.

Also, from your data example, the ADDRESS-related attribute is influencing PROD_NUM data-value counting, correct?

Scott Barry

SBBWorks, Inc.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 07:26 AM

First, I would not use notsorted, instead I would sort by all the by criteria already used and by prod(!), to avoid the same product being counted twice if it is not in consecutive observations.

Then do the following:

data new_hhlevel_prod;

set hhlevel_prod;

by name ADDRESS zip;

oldprod = lag(prod);

if first.zip then do;

prod_number = 0;

oldprod = '';

end;

if prod ne oldprod then prod_number+1;

run;

(Assuming you wanted to get a count of the number of distinct prods per customer)

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 07:32 AM

LAG should not be necessary, given that FIRST.<break_varname> will detect the condition explained as: "current value equal to prior observation's value" based on the BY variable list shown.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 07:38 AM

May I question why you will not use 'not sorted' option as you can skip one proc step if you use this option?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 07:43 AM

The presumption is your data does appear to be sorted and you need not use PROD in your BY list -- again, the use of BY and LAG are unrelated.

Suggestion would be to read up on the DATA step technique related to the DATA step elements being used.

Also, suggest you might add to your DATA step some PUTLOG "I am here" _ALL_; statements for desk-checking with your DATA step operational execution processing to learn more about the BY statement use and how it is exploited and for what purposes -- that being to interrogate FIRST. and LAST. "break variable" conditions (those variables on the BY statement).

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 07:45 AM

Imagine your file looks like that:

customer prod

a x

a y

b x

a y

the notsorted option would give you two counts for customer a, one where (a,y) had 1, and one where (a,y) has 2. According to my view of your task, you would want both (a,y) record to have the count of 2.

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 07:55 AM

Apologies, I missed this point. There are duplicates by name, address, zip and product. In that case I no need to increase the count of the new variable prod_number. It should be displaying as 1.

data have:

Name | Address | Zip | Prod |

John | 110,KX Colony | 11198 | PersonalLoan |

John | 110,KX Colony | 11198 | PersonalLoan |

Kiran | 7/189 Rajaji Road | 11094 | Mortagge |

Kiran | 7/189 Rajaji Road | 11094 | Creditcard |

Amar | 16,Mt.batten Street | 12409 | Mortagage |

Raju | 11,Kannan nagar | 12546 | Home Loan |

Raju | 11,Kannan nagar | 12546 | PersonalLoan |

Raju | 11,Kannan nagar | 12546 | Mortgage |

Raju | 11,Kannan nagar | 12546 | Mortgage |

data want:

Name | Address | Zip | Prod | Prod_number |

John | 110,KX Colony | 11198 | PersonalLoan | 1 |

John | 110,KX Colony | 11198 | PersonalLoan | 1 |

Kiran | 7/189 Rajaji Road | 11094 | Mortagge | 1 |

Kiran | 7/189 Rajaji Road | 11094 | Creditcard | 2 |

Amar | 16,Mt.batten Street | 12409 | Mortagage | 1 |

Raju | 11,Kannan nagar | 12546 | Home Loan | 1 |

Raju | 11,Kannan nagar | 12546 | PersonalLoan | 2 |

Raju | 11,Kannan nagar | 12546 | Mortgage | 3 |

Raju | 11,Kannan nagar | 12546 | Mortgage | 3 |

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 08:00 AM

Your most effective/efficient approach is not to use LAG but focus on using / exploiting your BY statement variable list and the accompanying FIRST.<break_varname> DATA step IF/THEN/ELSE coding technique. Have a look at the SAS Language Elements for relevant information and examples. And yes LAG is another example, but it has no relevance to also using BY statement coding technique.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 08:22 AM

I got my desired output with this code. Can we still make it simple with one if clause?

data have;

infile cards dsd dlm=',';

input Name $ Address $ Zip Prod $;

length address prod $25.;

cards;

John,110KX Colony,11198,PersonalLoan

Kiran,7/189 Rajaji Road,11094,Mortagge

Kiran,7/189 Rajaji Road,11094,Creditcard

Amar,16Mt.batten Street,12409,Mortagage

Amar,16Mt.batten Street,12409,Mortagage

Raju,11Kannan nagar,12546,Home Loan

Raju,11Kannan nagar,12546,PersonalLoan

Raju,11Kannan nagar,12546,Mortgage

Raju,11Kannan nagar,12546,Mortgage

;

run;

data want;

set have;

LAG_PROD= lag(prod);

by name ADDRESS zip prod notsorted;

if first.name and first.address and first.zip and first.prod then prod_number = 1;

else prod_number+1;

if name=lag(name) and address=lag(address) and zip=lag(zip) and prod=lag(prod) then new_prod_number=1 ;

else new_prod_number=prod_number;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 08:28 AM

I encourage you to learn the SAS programming language, as much through help from this list, but also with your own desk-checking and DOC-review.

You don't need LAG and if you were to add a PUTLOG _ALL_ ; statement at various points in your DATA step coding logic, you would learn/understand why not.

Your SAS code that assigns PROD_NUMBER based on FIRST.<break_varname> tests is sufficient -- meaning no LAG required. And you do not need to test each BY variable -- only the "significant" variable in the list that would influence your DO/END or otherwise some SAS assignment logic -- in your case that is the PROD_NUMBER assignment, either as a first-occurrence value of 1 or to increment.

Please take the opportunity and review the available SAS DOC and examples.

No doubt you will be given the answer at some point, if you continue to throw out code-pieces not thoroughly explored / tested on our own accord.

Also, the location of the assignment statement that uses LAG function has no bearing on the location of the BY statement at all.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 09:44 AM

While I agree with sbb ... it is important that you learn about FIRST. and LAST. ... here is a program that should fit your example. It assumes your data are properly sorted:

data want;

set have;

by name address zip prod notsorted;

if first.zip then prod_number=1;

else if first.prod then prod_number + 1;

run;

Good luck.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 07:45 AM

You should define your lag variable before doing anything else with it.

Here's a solution for you but it doesn't agree with your output:

data have;

infile cards dsd;

length address prod $25.;

input Name $ Address $ Zip Prod $;

cards;

John,110,KX Colony,11198,PersonalLoan

Kiran,7/189 Rajaji Road,11094,Mortagge

Kiran,7/189 Rajaji Road,11094,Creditcard

Amar,16,Mt.batten Street,12409,Mortagage

Raju,11,Kannan nagar,12546,Home Loan

Raju,11,Kannan nagar,12546,PersonalLoan

Raju,11,Kannan nagar,12546,Mortgage

;

run;

data want;

set have;

LAG_PROD= lag(prod);

by name ADDRESS zip prod notsorted;

if first.prod then prod_number = 1;

if prod=LAG_PROD then prod_number+1;

run;

Solution

05-20-2015
10:52 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-20-2015 10:52 AM

If you are using BY then you do not need to use LAG() to check for changes. SAS will have already done that and set the FIRST. and LAST. variables to indicate when values change.

So let's assume that NAME,ADDRESS,ZIP define a customer and you want to count the distinct products for that customer.

There is some risk in using the NOTSORTED option because it might be possible that the records for one customer are separated by the records for another customer. In that case you could end up with two different values of PROD for that same customer getting PROD_NUMBER set to one.

**data** want ;

set have ;

by name address zip prod notsorted;

if first.zip then prod_number=**1**;

else prod_number+first.prod;

**run**;