BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GKati
Pyrite | Level 9

 

Dear Members,

 

I have a large dataset with healthcare products (over 4000 products). Each product is made up of activities. The required activities change a little bit each year, and some of the products are repackaged (or renamed a different product name) without the underlying activities changing (see product 2 in 2014). I would like to link these repackaged products together by giving them a new id. 

 

What is the most efficient way of doing this for a large dataset? 

 

data Have;   

input product_id      year activity;

cards;

1     2012    25     

1     2012    32

1     2013  25

1     2013    32

1     2013    11

2     2014    25     

2     2014    32

;

run;

 

Desired output:

 

Product_id

year

activity

new_id

1

2012

25

1

1

2012

32

1

1

2013

25

2

1

2013

32

2

1

2013

11

2

2

2014

25

1

2

2014

32

1

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
GKati
Pyrite | Level 9

Thanks ballardw. I wasn't clear enough in my explanation. One product is made up of several lines in the example, as each product is made up of different activities. in the example, product 1 in 2012 has 2 activities (25 and 32). The same product 1 in 2013 has 3 activites (25, 32, and 11). 

 

The problem is is that the number of activities required to claim a product keeps changing and sometimes the product id changes for the same activities. What I want to do is link the products together where only the product id changed without the underlying activities changing. 

 

But your comment might have solved my problem. I can change the dataset from long to wide (this will create one line by product with several variables called activity1, activity2, etc.) and create a unique id for products with the same activity content. 

 

Thanks everyone for your help. 

View solution in original post

5 REPLIES 5
kiranv_
Rhodochrosite | Level 12

is this dataset in SAS or in any database

GKati
Pyrite | Level 9
Yes it is...
kiranv_
Rhodochrosite | Level 12

Not sure, how efficiently this will be as it is doing multiple queries in same query but this will give the solution you want 

proc sql;
select *, case when year in 
(select year from have where activity not in
(select  activity from have 
where year in (select min(year) from have))) then 2 else 1 end as new_id
from have;
ballardw
Super User

@GKati wrote:

 

Dear Members,

 

I have a large dataset with healthcare products (over 4000 products). Each product is made up of activities. The required activities change a little bit each year, and some of the products are repackaged (or renamed a different product name) without the underlying activities changing (see product 2 in 2014). I would like to link these repackaged products together by giving them a new id. 

 

What is the most efficient way of doing this for a large dataset? 

 

data Have;   

input product_id      year activity;

cards;

1     2012    25     

1     2012    32

1     2013  25

1     2013    32

1     2013    11

2     2014    25     

2     2014    32

;

run;

 

Desired output:

 

Product_id

year

activity

new_id

1

2012

25

1

1

2012

32

1

1

2013

25

2

1

2013

32

2

1

2013

11

2

2

2014

25

1

2

2014

32

1

 

 


I have to say that for the example data I am not seeing any difference in the new_id and a year sequence number (1 - first year of product, 2=second year of product, 3= third year of product etc). What would make the new_id have more than value for any given year?

GKati
Pyrite | Level 9

Thanks ballardw. I wasn't clear enough in my explanation. One product is made up of several lines in the example, as each product is made up of different activities. in the example, product 1 in 2012 has 2 activities (25 and 32). The same product 1 in 2013 has 3 activites (25, 32, and 11). 

 

The problem is is that the number of activities required to claim a product keeps changing and sometimes the product id changes for the same activities. What I want to do is link the products together where only the product id changed without the underlying activities changing. 

 

But your comment might have solved my problem. I can change the dataset from long to wide (this will create one line by product with several variables called activity1, activity2, etc.) and create a unique id for products with the same activity content. 

 

Thanks everyone for your help. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 757 views
  • 0 likes
  • 3 in conversation