DATA Step, Macro, Functions and more

Linking products

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 107
Accepted Solution

Linking products

 

Dear Members,

 

I have a large dataset with healthcare products (over 4000 products). Each product is made up of activities. The required activities change a little bit each year, and some of the products are repackaged (or renamed a different product name) without the underlying activities changing (see product 2 in 2014). I would like to link these repackaged products together by giving them a new id. 

 

What is the most efficient way of doing this for a large dataset? 

 

data Have;   

input product_id      year activity;

cards;

1     2012    25     

1     2012    32

1     2013  25

1     2013    32

1     2013    11

2     2014    25     

2     2014    32

;

run;

 

Desired output:

 

Product_id

year

activity

new_id

1

2012

25

1

1

2012

32

1

1

2013

25

2

1

2013

32

2

1

2013

11

2

2

2014

25

1

2

2014

32

1

 

 


Accepted Solutions
Solution
‎01-24-2018 06:16 AM
Frequent Contributor
Posts: 107

Re: Linking products

Thanks ballardw. I wasn't clear enough in my explanation. One product is made up of several lines in the example, as each product is made up of different activities. in the example, product 1 in 2012 has 2 activities (25 and 32). The same product 1 in 2013 has 3 activites (25, 32, and 11). 

 

The problem is is that the number of activities required to claim a product keeps changing and sometimes the product id changes for the same activities. What I want to do is link the products together where only the product id changed without the underlying activities changing. 

 

But your comment might have solved my problem. I can change the dataset from long to wide (this will create one line by product with several variables called activity1, activity2, etc.) and create a unique id for products with the same activity content. 

 

Thanks everyone for your help. 

View solution in original post


All Replies
PROC Star
Posts: 509

Re: Linking products

is this dataset in SAS or in any database

Frequent Contributor
Posts: 107

Re: Linking products

Yes it is...
PROC Star
Posts: 509

Re: Linking products

Not sure, how efficiently this will be as it is doing multiple queries in same query but this will give the solution you want 

proc sql;
select *, case when year in 
(select year from have where activity not in
(select  activity from have 
where year in (select min(year) from have))) then 2 else 1 end as new_id
from have;
Super User
Posts: 13,508

Re: Linking products


GKati wrote:

 

Dear Members,

 

I have a large dataset with healthcare products (over 4000 products). Each product is made up of activities. The required activities change a little bit each year, and some of the products are repackaged (or renamed a different product name) without the underlying activities changing (see product 2 in 2014). I would like to link these repackaged products together by giving them a new id. 

 

What is the most efficient way of doing this for a large dataset? 

 

data Have;   

input product_id      year activity;

cards;

1     2012    25     

1     2012    32

1     2013  25

1     2013    32

1     2013    11

2     2014    25     

2     2014    32

;

run;

 

Desired output:

 

Product_id

year

activity

new_id

1

2012

25

1

1

2012

32

1

1

2013

25

2

1

2013

32

2

1

2013

11

2

2

2014

25

1

2

2014

32

1

 

 


I have to say that for the example data I am not seeing any difference in the new_id and a year sequence number (1 - first year of product, 2=second year of product, 3= third year of product etc). What would make the new_id have more than value for any given year?

Solution
‎01-24-2018 06:16 AM
Frequent Contributor
Posts: 107

Re: Linking products

Thanks ballardw. I wasn't clear enough in my explanation. One product is made up of several lines in the example, as each product is made up of different activities. in the example, product 1 in 2012 has 2 activities (25 and 32). The same product 1 in 2013 has 3 activites (25, 32, and 11). 

 

The problem is is that the number of activities required to claim a product keeps changing and sometimes the product id changes for the same activities. What I want to do is link the products together where only the product id changed without the underlying activities changing. 

 

But your comment might have solved my problem. I can change the dataset from long to wide (this will create one line by product with several variables called activity1, activity2, etc.) and create a unique id for products with the same activity content. 

 

Thanks everyone for your help. 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 155 views
  • 0 likes
  • 3 in conversation