BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cypresswang
Fluorite | Level 6

I have a dataset in the following structure:

Application_ID  Product_ID  Product1_Category   Product2_Category
11111           a           m1                  m3
11111           b           n2                  n4
11111                       k3                  k5
11111                       t3                  t7

As you can see, there are two product_id in this application. Each product correspond to their product_category. As the product id come in sequence, the first product id will correspond to Product1_Category, and so on. Due to the weird data structure, I can't use the proc transpose procedure. Here is what the result should be:

Application_ID  Product_ID  Product_Category
11111           a           m1
11111           a           n2
11111           a           k3
11111           a           t3
11112           b           m3
11113           b           n4
11114           b           k5
11115           b           t7

I have tried to use proc transpose but the result is not what I want.

1 ACCEPTED SOLUTION

Accepted Solutions
cypresswang
Fluorite | Level 6

I have come up with a solution. This may not be the most elegant way but it works. Please feel free to provide any improvement.

 

 

data have;
  infile datalines dsd;
  input Application_ID Product_ID $ ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,1,m1,m3
11111,b,2,n2,n4
11111,,3,k3,k5
11111,,4,t3,t7
;
run;

proc transpose data= Have out=Want(drop=_NAME_) Prefix=Prod_;
	by Application_id;
	var Product_ID;
	copy Product1_Category Product2_Category;
	id ID;
run;
	 

data want(drop = Prod_1 - Prod_4 prodtemp_1-prodtemp_4 product1_category product2_category i);
set want;
by Application_ID;
	retain prodtemp_1-prodtemp_4;
	array prodtemp {4} $ prodtemp_1-prodtemp_4;
	array prod {4} $ prod_1 - prod_4;
		do i = 1 to 4;
		if prod{i} ne '' then prodtemp{i} = prod{i};
		prod{i} = prodtemp{i};
		end;
	array category {2} $ Product1_category Product2_category; 
		do i = 1 to 2;
		if category {i} ='' then leave;
		product_category = category{i};
		product = prod{i};
		output;
		end;
run;

 

View solution in original post

9 REPLIES 9
Reeza
Super User

What's the logic?

cypresswang
Fluorite | Level 6
there are many application ids and each application can have up to 10 products. There are actually 10 Product Categories (Product1_Category to Product10_Category) and each category correspond to one product. For example, product_id "a" correspond to Product1_Category and product_id "b" correspond to Product10_Category. For simplicity, the example provided only has two product ids.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

This problem is created by the steps leading up to this.  It will be quite tricky to get the output you want from that data, it will be far easier to fix previous steps to give good output.  Can you post: Test data in the form of a datastep (we don't need to be typing this in for you), and the code steps which create the data as you present it here.

The alternative is you start writign very messy, hardcoded code like below which I really don't recommend just to fix bad processing before this:

data have;
  infile datalines dsd;
  input Application_ID Product_ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,m1,m3
11111,b,n2,n4
11111,,k3,k5
11111,,t3,t7
;
run;
data first;
  set have (keep=application_id product_id product1_category rename=(product1_category=cat));
  product_id="a";
run;
data second;
  set have (keep=application_id product_id product2_category rename=(product2_category=cat));
  product_id="b";
run;

data want;
  set first second;
run; 
cypresswang
Fluorite | Level 6

Hi @RW9 

Thank you for your response. Unfortunately, I can't fix the bad process as I was given the data as is. Fortunately, I have come up with a way to clean is up, which I will post against my original questions. 

Astounding
PROC Star

Here's a method that doesn't rely on knowing the PRODUCT_ID values.

 

data want;

set have (where=(product > ' '));

array prods {2} product1_category product2_category;

do _i_=1 to _nobs_;

   set have (drop=product_id) point=_i_ nobs=_nobs_;

   product_category = prods{_n_};

   if _n_ > 1 then application_id + 1;

   output;

end;

keep application_id product_id product;

run;

 

It's not clear if you meant to increment APPLICATION_ID in your example, so that part can always be adjusted.  Other adjustments (such as more products) can also be handled ... it's just not clear how much of your example data set is the real structure and how much might be a simplified example.

cypresswang
Fluorite | Level 6

@Astounding, Thanks for your solution. However, it is not what I'm after. I will post my code and the outcome shortly. 

 

cypresswang
Fluorite | Level 6

I have come up with a solution. This may not be the most elegant way but it works. Please feel free to provide any improvement.

 

 

data have;
  infile datalines dsd;
  input Application_ID Product_ID $ ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,1,m1,m3
11111,b,2,n2,n4
11111,,3,k3,k5
11111,,4,t3,t7
;
run;

proc transpose data= Have out=Want(drop=_NAME_) Prefix=Prod_;
	by Application_id;
	var Product_ID;
	copy Product1_Category Product2_Category;
	id ID;
run;
	 

data want(drop = Prod_1 - Prod_4 prodtemp_1-prodtemp_4 product1_category product2_category i);
set want;
by Application_ID;
	retain prodtemp_1-prodtemp_4;
	array prodtemp {4} $ prodtemp_1-prodtemp_4;
	array prod {4} $ prod_1 - prod_4;
		do i = 1 to 4;
		if prod{i} ne '' then prodtemp{i} = prod{i};
		prod{i} = prodtemp{i};
		end;
	array category {2} $ Product1_category Product2_category; 
		do i = 1 to 2;
		if category {i} ='' then leave;
		product_category = category{i};
		product = prod{i};
		output;
		end;
run;

 

Tom
Super User Tom
Super User

It looks like you have N products. There are N ProductX_Category variables and the first N rows have the Product_ID values.

 

So read the values and do a normal array based transform of the category values.  But add in a statement to reset the PRODUCT_ID value based on the index into the array.

 

data have;
  infile datalines dsd;
  input Application_ID Product_ID $ ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,1,m1,m3
11111,b,2,n2,n4
11111,,3,k3,k5
11111,,4,t3,t7
;

data want ;
  set have ;
  array p product1_Category product2_category ;
  do i=1 to dim(p);
    if i <= nobs then set have(keep=product_id) point=i nobs=nobs;
    Product_Category=p(i);
    output;
  end;
  drop product1_Category product2_category ;
run;

Capture.PNG

cypresswang
Fluorite | Level 6
Thank you @Tom. Your have the best answer

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1840 views
  • 1 like
  • 5 in conversation