Solved: Re: Tricky transpose task

cypresswang · Posted 04-27-2017 12:00 AM

I have a dataset in the following structure:

Application_ID  Product_ID  Product1_Category   Product2_Category
11111           a           m1                  m3
11111           b           n2                  n4
11111                       k3                  k5
11111                       t3                  t7

As you can see, there are two product_id in this application. Each product correspond to their product_category. As the product id come in sequence, the first product id will correspond to Product1_Category, and so on. Due to the weird data structure, I can't use the proc transpose procedure. Here is what the result should be:

Application_ID  Product_ID  Product_Category
11111           a           m1
11111           a           n2
11111           a           k3
11111           a           t3
11112           b           m3
11113           b           n4
11114           b           k5
11115           b           t7

I have tried to use proc transpose but the result is not what I want.

cypresswang · Posted 04-27-2017 08:34 AM

I have come up with a solution. This may not be the most elegant way but it works. Please feel free to provide any improvement.

data have;
  infile datalines dsd;
  input Application_ID Product_ID $ ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,1,m1,m3
11111,b,2,n2,n4
11111,,3,k3,k5
11111,,4,t3,t7
;
run;

proc transpose data= Have out=Want(drop=_NAME_) Prefix=Prod_;
	by Application_id;
	var Product_ID;
	copy Product1_Category Product2_Category;
	id ID;
run;
	 

data want(drop = Prod_1 - Prod_4 prodtemp_1-prodtemp_4 product1_category product2_category i);
set want;
by Application_ID;
	retain prodtemp_1-prodtemp_4;
	array prodtemp {4} $ prodtemp_1-prodtemp_4;
	array prod {4} $ prod_1 - prod_4;
		do i = 1 to 4;
		if prod{i} ne '' then prodtemp{i} = prod{i};
		prod{i} = prodtemp{i};
		end;
	array category {2} $ Product1_category Product2_category; 
		do i = 1 to 2;
		if category {i} ='' then leave;
		product_category = category{i};
		product = prod{i};
		output;
		end;
run;

View solution in original post

Reeza · Posted 04-27-2017 12:41 AM

What's the logic?

cypresswang · Posted 04-27-2017 01:01 AM

there are many application ids and each application can have up to 10 products. There are actually 10 Product Categories (Product1_Category to Product10_Category) and each category correspond to one product. For example, product_id "a" correspond to Product1_Category and product_id "b" correspond to Product10_Category. For simplicity, the example provided only has two product ids.

RW9 · Posted 04-27-2017 04:32 AM

This problem is created by the steps leading up to this. It will be quite tricky to get the output you want from that data, it will be far easier to fix previous steps to give good output. Can you post: Test data in the form of a datastep (we don't need to be typing this in for you), and the code steps which create the data as you present it here.

The alternative is you start writign very messy, hardcoded code like below which I really don't recommend just to fix bad processing before this:

data have;
  infile datalines dsd;
  input Application_ID Product_ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,m1,m3
11111,b,n2,n4
11111,,k3,k5
11111,,t3,t7
;
run;
data first;
  set have (keep=application_id product_id product1_category rename=(product1_category=cat));
  product_id="a";
run;
data second;
  set have (keep=application_id product_id product2_category rename=(product2_category=cat));
  product_id="b";
run;

data want;
  set first second;
run;

cypresswang · Posted 04-27-2017 08:15 AM

Hi @RW9

Thank you for your response. Unfortunately, I can't fix the bad process as I was given the data as is. Fortunately, I have come up with a way to clean is up, which I will post against my original questions.

Astounding · Posted 04-27-2017 07:21 AM

Here's a method that doesn't rely on knowing the PRODUCT_ID values.

data want;

set have (where=(product > ' '));

array prods {2} product1_category product2_category;

do _i_=1 to _nobs_;

set have (drop=product_id) point=_i_ nobs=_nobs_;

product_category = prods{_n_};

if _n_ > 1 then application_id + 1;

output;

end;

keep application_id product_id product;

run;

It's not clear if you meant to increment APPLICATION_ID in your example, so that part can always be adjusted. Other adjustments (such as more products) can also be handled ... it's just not clear how much of your example data set is the real structure and how much might be a simplified example.

cypresswang · Posted 04-27-2017 08:16 AM

@Astounding, Thanks for your solution. However, it is not what I'm after. I will post my code and the outcome shortly.

cypresswang · Posted 04-27-2017 08:34 AM

I have come up with a solution. This may not be the most elegant way but it works. Please feel free to provide any improvement.

data have;
  infile datalines dsd;
  input Application_ID Product_ID $ ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,1,m1,m3
11111,b,2,n2,n4
11111,,3,k3,k5
11111,,4,t3,t7
;
run;

proc transpose data= Have out=Want(drop=_NAME_) Prefix=Prod_;
	by Application_id;
	var Product_ID;
	copy Product1_Category Product2_Category;
	id ID;
run;
	 

data want(drop = Prod_1 - Prod_4 prodtemp_1-prodtemp_4 product1_category product2_category i);
set want;
by Application_ID;
	retain prodtemp_1-prodtemp_4;
	array prodtemp {4} $ prodtemp_1-prodtemp_4;
	array prod {4} $ prod_1 - prod_4;
		do i = 1 to 4;
		if prod{i} ne '' then prodtemp{i} = prod{i};
		prod{i} = prodtemp{i};
		end;
	array category {2} $ Product1_category Product2_category; 
		do i = 1 to 2;
		if category {i} ='' then leave;
		product_category = category{i};
		product = prod{i};
		output;
		end;
run;

Tom · Posted 04-27-2017 09:09 AM

It looks like you have N products. There are N ProductX_Category variables and the first N rows have the Product_ID values.

So read the values and do a normal array based transform of the category values. But add in a statement to reset the PRODUCT_ID value based on the index into the array.

data have;
  infile datalines dsd;
  input Application_ID Product_ID $ ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,1,m1,m3
11111,b,2,n2,n4
11111,,3,k3,k5
11111,,4,t3,t7
;

data want ;
  set have ;
  array p product1_Category product2_category ;
  do i=1 to dim(p);
    if i <= nobs then set have(keep=product_id) point=i nobs=nobs;
    Product_Category=p(i);
    output;
  end;
  drop product1_Category product2_category ;
run;

cypresswang · Posted 04-28-2017 05:58 AM

Thank you @Tom. Your have the best answer

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away