DATA Step, Macro, Functions and more

Tricky transpose task

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Tricky transpose task

I have a dataset in the following structure:

Application_ID  Product_ID  Product1_Category   Product2_Category
11111           a           m1                  m3
11111           b           n2                  n4
11111                       k3                  k5
11111                       t3                  t7

As you can see, there are two product_id in this application. Each product correspond to their product_category. As the product id come in sequence, the first product id will correspond to Product1_Category, and so on. Due to the weird data structure, I can't use the proc transpose procedure. Here is what the result should be:

Application_ID  Product_ID  Product_Category
11111           a           m1
11111           a           n2
11111           a           k3
11111           a           t3
11112           b           m3
11113           b           n4
11114           b           k5
11115           b           t7

I have tried to use proc transpose but the result is not what I want.


Accepted Solutions
Solution
‎04-27-2017 08:35 AM
Occasional Contributor
Posts: 6

Re: Tricky transpose task

I have come up with a solution. This may not be the most elegant way but it works. Please feel free to provide any improvement.

 

 

data have;
  infile datalines dsd;
  input Application_ID Product_ID $ ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,1,m1,m3
11111,b,2,n2,n4
11111,,3,k3,k5
11111,,4,t3,t7
;
run;

proc transpose data= Have out=Want(drop=_NAME_) Prefix=Prod_;
	by Application_id;
	var Product_ID;
	copy Product1_Category Product2_Category;
	id ID;
run;
	 

data want(drop = Prod_1 - Prod_4 prodtemp_1-prodtemp_4 product1_category product2_category i);
set want;
by Application_ID;
	retain prodtemp_1-prodtemp_4;
	array prodtemp {4} $ prodtemp_1-prodtemp_4;
	array prod {4} $ prod_1 - prod_4;
		do i = 1 to 4;
		if prod{i} ne '' then prodtemp{i} = prod{i};
		prod{i} = prodtemp{i};
		end;
	array category {2} $ Product1_category Product2_category; 
		do i = 1 to 2;
		if category {i} ='' then leave;
		product_category = category{i};
		product = prod{i};
		output;
		end;
run;

 

View solution in original post


All Replies
Super User
Posts: 17,819

Re: Tricky transpose task

What's the logic?

Occasional Contributor
Posts: 6

Re: Tricky transpose task

there are many application ids and each application can have up to 10 products. There are actually 10 Product Categories (Product1_Category to Product10_Category) and each category correspond to one product. For example, product_id "a" correspond to Product1_Category and product_id "b" correspond to Product10_Category. For simplicity, the example provided only has two product ids.
Super User
Super User
Posts: 7,401

Re: Tricky transpose task

This problem is created by the steps leading up to this.  It will be quite tricky to get the output you want from that data, it will be far easier to fix previous steps to give good output.  Can you post: Test data in the form of a datastep (we don't need to be typing this in for you), and the code steps which create the data as you present it here.

The alternative is you start writign very messy, hardcoded code like below which I really don't recommend just to fix bad processing before this:

data have;
  infile datalines dsd;
  input Application_ID Product_ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,m1,m3
11111,b,n2,n4
11111,,k3,k5
11111,,t3,t7
;
run;
data first;
  set have (keep=application_id product_id product1_category rename=(product1_category=cat));
  product_id="a";
run;
data second;
  set have (keep=application_id product_id product2_category rename=(product2_category=cat));
  product_id="b";
run;

data want;
  set first second;
run; 
Occasional Contributor
Posts: 6

Re: Tricky transpose task

Hi @RW9 

Thank you for your response. Unfortunately, I can't fix the bad process as I was given the data as is. Fortunately, I have come up with a way to clean is up, which I will post against my original questions. 

Super User
Posts: 5,081

Re: Tricky transpose task

Here's a method that doesn't rely on knowing the PRODUCT_ID values.

 

data want;

set have (where=(product > ' '));

array prods {2} product1_category product2_category;

do _i_=1 to _nobs_;

   set have (drop=product_id) point=_i_ nobs=_nobs_;

   product_category = prods{_n_};

   if _n_ > 1 then application_id + 1;

   output;

end;

keep application_id product_id product;

run;

 

It's not clear if you meant to increment APPLICATION_ID in your example, so that part can always be adjusted.  Other adjustments (such as more products) can also be handled ... it's just not clear how much of your example data set is the real structure and how much might be a simplified example.

Occasional Contributor
Posts: 6

Re: Tricky transpose task

@Astounding, Thanks for your solution. However, it is not what I'm after. I will post my code and the outcome shortly. 

 

Solution
‎04-27-2017 08:35 AM
Occasional Contributor
Posts: 6

Re: Tricky transpose task

I have come up with a solution. This may not be the most elegant way but it works. Please feel free to provide any improvement.

 

 

data have;
  infile datalines dsd;
  input Application_ID Product_ID $ ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,1,m1,m3
11111,b,2,n2,n4
11111,,3,k3,k5
11111,,4,t3,t7
;
run;

proc transpose data= Have out=Want(drop=_NAME_) Prefix=Prod_;
	by Application_id;
	var Product_ID;
	copy Product1_Category Product2_Category;
	id ID;
run;
	 

data want(drop = Prod_1 - Prod_4 prodtemp_1-prodtemp_4 product1_category product2_category i);
set want;
by Application_ID;
	retain prodtemp_1-prodtemp_4;
	array prodtemp {4} $ prodtemp_1-prodtemp_4;
	array prod {4} $ prod_1 - prod_4;
		do i = 1 to 4;
		if prod{i} ne '' then prodtemp{i} = prod{i};
		prod{i} = prodtemp{i};
		end;
	array category {2} $ Product1_category Product2_category; 
		do i = 1 to 2;
		if category {i} ='' then leave;
		product_category = category{i};
		product = prod{i};
		output;
		end;
run;

 

Super User
Super User
Posts: 6,499

Re: Tricky transpose task

It looks like you have N products. There are N ProductX_Category variables and the first N rows have the Product_ID values.

 

So read the values and do a normal array based transform of the category values.  But add in a statement to reset the PRODUCT_ID value based on the index into the array.

 

data have;
  infile datalines dsd;
  input Application_ID Product_ID $ ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,1,m1,m3
11111,b,2,n2,n4
11111,,3,k3,k5
11111,,4,t3,t7
;

data want ;
  set have ;
  array p product1_Category product2_category ;
  do i=1 to dim(p);
    if i <= nobs then set have(keep=product_id) point=i nobs=nobs;
    Product_Category=p(i);
    output;
  end;
  drop product1_Category product2_category ;
run;

Capture.PNG

Occasional Contributor
Posts: 6

Re: Tricky transpose task

Thank you @Tom. Your have the best answer
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 358 views
  • 1 like
  • 5 in conversation