Dear Members,
I have a large dataset with healthcare products (over 4000 products). Each product is made up of activities. The required activities change a little bit each year, and some of the products are repackaged (or renamed a different product name) without the underlying activities changing (see product 2 in 2014). I would like to link these repackaged products together by giving them a new id.
What is the most efficient way of doing this for a large dataset?
data Have;
input product_id year activity;
cards;
1 2012 25
1 2012 32
1 2013 25
1 2013 32
1 2013 11
2 2014 25
2 2014 32
;
run;
Desired output:
Product_id | year | activity | new_id |
1 | 2012 | 25 | 1 |
1 | 2012 | 32 | 1 |
1 | 2013 | 25 | 2 |
1 | 2013 | 32 | 2 |
1 | 2013 | 11 | 2 |
2 | 2014 | 25 | 1 |
2 | 2014 | 32 | 1 |
Thanks ballardw. I wasn't clear enough in my explanation. One product is made up of several lines in the example, as each product is made up of different activities. in the example, product 1 in 2012 has 2 activities (25 and 32). The same product 1 in 2013 has 3 activites (25, 32, and 11).
The problem is is that the number of activities required to claim a product keeps changing and sometimes the product id changes for the same activities. What I want to do is link the products together where only the product id changed without the underlying activities changing.
But your comment might have solved my problem. I can change the dataset from long to wide (this will create one line by product with several variables called activity1, activity2, etc.) and create a unique id for products with the same activity content.
Thanks everyone for your help.
is this dataset in SAS or in any database
Not sure, how efficiently this will be as it is doing multiple queries in same query but this will give the solution you want
proc sql;
select *, case when year in
(select year from have where activity not in
(select activity from have
where year in (select min(year) from have))) then 2 else 1 end as new_id
from have;
@GKati wrote:
Dear Members,
I have a large dataset with healthcare products (over 4000 products). Each product is made up of activities. The required activities change a little bit each year, and some of the products are repackaged (or renamed a different product name) without the underlying activities changing (see product 2 in 2014). I would like to link these repackaged products together by giving them a new id.
What is the most efficient way of doing this for a large dataset?
data Have;
input product_id year activity;
cards;
1 2012 25
1 2012 32
1 2013 25
1 2013 32
1 2013 11
2 2014 25
2 2014 32
;
run;
Desired output:
Product_id
year
activity
new_id
1
2012
25
1
1
2012
32
1
1
2013
25
2
1
2013
32
2
1
2013
11
2
2
2014
25
1
2
2014
32
1
I have to say that for the example data I am not seeing any difference in the new_id and a year sequence number (1 - first year of product, 2=second year of product, 3= third year of product etc). What would make the new_id have more than value for any given year?
Thanks ballardw. I wasn't clear enough in my explanation. One product is made up of several lines in the example, as each product is made up of different activities. in the example, product 1 in 2012 has 2 activities (25 and 32). The same product 1 in 2013 has 3 activites (25, 32, and 11).
The problem is is that the number of activities required to claim a product keeps changing and sometimes the product id changes for the same activities. What I want to do is link the products together where only the product id changed without the underlying activities changing.
But your comment might have solved my problem. I can change the dataset from long to wide (this will create one line by product with several variables called activity1, activity2, etc.) and create a unique id for products with the same activity content.
Thanks everyone for your help.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.