BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GKati
Pyrite | Level 9

 

Dear Members,

 

I have a large dataset with healthcare products (over 4000 products). Each product is made up of activities. The required activities change a little bit each year, and some of the products are repackaged (or renamed a different product name) without the underlying activities changing (see product 2 in 2014). I would like to link these repackaged products together by giving them a new id. 

 

What is the most efficient way of doing this for a large dataset? 

 

data Have;   

input product_id      year activity;

cards;

1     2012    25     

1     2012    32

1     2013  25

1     2013    32

1     2013    11

2     2014    25     

2     2014    32

;

run;

 

Desired output:

 

Product_id

year

activity

new_id

1

2012

25

1

1

2012

32

1

1

2013

25

2

1

2013

32

2

1

2013

11

2

2

2014

25

1

2

2014

32

1

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
GKati
Pyrite | Level 9

Thanks ballardw. I wasn't clear enough in my explanation. One product is made up of several lines in the example, as each product is made up of different activities. in the example, product 1 in 2012 has 2 activities (25 and 32). The same product 1 in 2013 has 3 activites (25, 32, and 11). 

 

The problem is is that the number of activities required to claim a product keeps changing and sometimes the product id changes for the same activities. What I want to do is link the products together where only the product id changed without the underlying activities changing. 

 

But your comment might have solved my problem. I can change the dataset from long to wide (this will create one line by product with several variables called activity1, activity2, etc.) and create a unique id for products with the same activity content. 

 

Thanks everyone for your help. 

View solution in original post

5 REPLIES 5
kiranv_
Rhodochrosite | Level 12

is this dataset in SAS or in any database

GKati
Pyrite | Level 9
Yes it is...
kiranv_
Rhodochrosite | Level 12

Not sure, how efficiently this will be as it is doing multiple queries in same query but this will give the solution you want 

proc sql;
select *, case when year in 
(select year from have where activity not in
(select  activity from have 
where year in (select min(year) from have))) then 2 else 1 end as new_id
from have;
ballardw
Super User

@GKati wrote:

 

Dear Members,

 

I have a large dataset with healthcare products (over 4000 products). Each product is made up of activities. The required activities change a little bit each year, and some of the products are repackaged (or renamed a different product name) without the underlying activities changing (see product 2 in 2014). I would like to link these repackaged products together by giving them a new id. 

 

What is the most efficient way of doing this for a large dataset? 

 

data Have;   

input product_id      year activity;

cards;

1     2012    25     

1     2012    32

1     2013  25

1     2013    32

1     2013    11

2     2014    25     

2     2014    32

;

run;

 

Desired output:

 

Product_id

year

activity

new_id

1

2012

25

1

1

2012

32

1

1

2013

25

2

1

2013

32

2

1

2013

11

2

2

2014

25

1

2

2014

32

1

 

 


I have to say that for the example data I am not seeing any difference in the new_id and a year sequence number (1 - first year of product, 2=second year of product, 3= third year of product etc). What would make the new_id have more than value for any given year?

GKati
Pyrite | Level 9

Thanks ballardw. I wasn't clear enough in my explanation. One product is made up of several lines in the example, as each product is made up of different activities. in the example, product 1 in 2012 has 2 activities (25 and 32). The same product 1 in 2013 has 3 activites (25, 32, and 11). 

 

The problem is is that the number of activities required to claim a product keeps changing and sometimes the product id changes for the same activities. What I want to do is link the products together where only the product id changed without the underlying activities changing. 

 

But your comment might have solved my problem. I can change the dataset from long to wide (this will create one line by product with several variables called activity1, activity2, etc.) and create a unique id for products with the same activity content. 

 

Thanks everyone for your help. 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1533 views
  • 0 likes
  • 3 in conversation