Hello!
This is my first posting, so apologies if I fail in to provide sufficient info. The parcel IDs and property types have been changed for presentation here for the sake of data privacy.
Here is the background. There are parcels with distinctive parcel IDs. A parcel can have more than one property type, such as agricultural homestead, agricultural production fields, or other types. Some parcels can have as many as 6 property types.
I have data that has each property type, so the parcel IDs are replicated. I've enclosed an Excel version (so the data be changed to maintain privacy) that shows how the data is arranged in the case of one parcel with 5 property types:
Parcel_ID | Property_Type_Code | Property_Code_Quantity |
091-080-018100 | X21 | 5 |
091-080-018100 | X22 | 5 |
091-080-018100 | C45 | 5 |
091-080-018100 | Y10 | 5 |
091-080-018100 | Y20 | 5 |
I would like to create a SAS program that does this in creating a string with all the property types:
Parcel_ID | Property_Type_Code | Property_Code_Quantity | Property_Types |
091-080-018100 | X21 | 5 | X21 |
091-080-018100 | X22 | 5 | X21 X22 |
091-080-018100 | C45 | 5 | X21 X22 C45 |
091-080-018100 | Y10 | 5 | X21 X22 C45 Y10 |
091-080-018100 | Y20 | 5 | X21 X22 C45 Y10 Y20 |
Then I can select the last.Parcel_ID and there will be all the property types in the data for that parcel ID.
Here is a complication: sometimes the number of property types present in the data does not always agree with the types presented. There might be a total of 6 property types, but 5 or less are showing, since some are not selected for in creating the data. For example, property types that begin with "M" might have been excluded from the data. There may 6 property types, but only 5 are listed in the data.
I tried with the lag() function, and believe it is the key, but my solution so far is not elegant and gets more complicated with each circumstance and is getting ugly. I can do this OK if there are 1 or 2 or 3 property types on a parcel. When there are 3 property types but only two records present, this breaks down. After the code, I will show how it breaks down when run:
data work.Joined_Data_6;
set Run_Data.Joined_Data_2;
by Parcel_ID;
length Property_Codes $20.;
Previous_Type_1 = lag1(Property_Type_Code);
Previous_Type_2 = lag2(Property_Type_Code);
Previous_Type_3 = lag3(Property_Type_Code);
if (first.Parcel_ID) then Property_Codes = Property_Type_Code;
else if ((NOT first.Parcel_ID) and Property_Type_Code_Quantity = 2) then Property_Codes = catx(' ', Previous_Type_1, Property_Type_Code);
else if ((NOT first.Parcel_ID) and (NOT last.Parcel_ID) and Property_Type_Code_Quantity = 3) then Property_Codes = catx(' ', Previous_Type_1, Property_Type_Code);
else if ((NOT first.Parcel_ID) and (last.Parcel_ID) and Property_Type_Code_Quantity = 3) then Property_Codes = catx(' ', Previous_Type_2, Previous_Type_1, Property_Type_Code);
run;
Parcel_ID | Property_Type_Code | Property_Type_Code_Quantity | Property_Codes | Previous_Type_1 | Previous_Type_2 | Previous_Type_3 |
01-03-912200 | P45 | 3 | P45 | F56 | R56 | R54 |
01-03-912200 | Q50 | 3 | F56 P45 Q50 | P45 | F56 | R56 |
I've spent a lot of time on this and am frustrated--there has to be a robust solution out there. I am using the latest version of SAS and EG. Again, my sincere apologies if I have given an incomplete description.
Thank you for your consideration.
@The_Analyst wrote:
Your solution worked! I had tried to use retain at first, but had obviously applied it incorrectly. I believe I was likely retaining the wrong variable. That was when I went to lag() which is a dead end for this problem. Your solution is so beautiful and simple as to warm the heart of a SAS programmer. My sincere thanks for your expertise!
Without seeing the Retain code that didn't work it is hard to tell. However, if you use Retain with a variable that exists in a data set on the Set statement it gets reset to the value of the variable that already exists the next time a record is brought in from the data set. So it appears that retain doesn't do anything.
This is actually a better case for RETAIN.
data work.Joined_Data_6; set Run_Data.Joined_Data_2; by Parcel_ID; length Property_Codes $20.; retain Property_codes; if first.Parcel_id then Property_codes= Property_type_code; else Property_codes = catx(' ',Property_codes,Property_type_code); run;
Your solution worked! I had tried to use retain at first, but had obviously applied it incorrectly. I believe I was likely retaining the wrong variable. That was when I went to lag() which is a dead end for this problem. Your solution is so beautiful and simple as to warm the heart of a SAS programmer. My sincere thanks for your expertise!
@The_Analyst wrote:
Your solution worked! I had tried to use retain at first, but had obviously applied it incorrectly. I believe I was likely retaining the wrong variable. That was when I went to lag() which is a dead end for this problem. Your solution is so beautiful and simple as to warm the heart of a SAS programmer. My sincere thanks for your expertise!
Without seeing the Retain code that didn't work it is hard to tell. However, if you use Retain with a variable that exists in a data set on the Set statement it gets reset to the value of the variable that already exists the next time a record is brought in from the data set. So it appears that retain doesn't do anything.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.