I have a dataset in the following structure:
Application_ID Product_ID Product1_Category Product2_Category
11111 a m1 m3
11111 b n2 n4
11111 k3 k5
11111 t3 t7
As you can see, there are two product_id in this application. Each product correspond to their product_category. As the product id come in sequence, the first product id will correspond to Product1_Category, and so on. Due to the weird data structure, I can't use the proc transpose procedure. Here is what the result should be:
Application_ID Product_ID Product_Category
11111 a m1
11111 a n2
11111 a k3
11111 a t3
11112 b m3
11113 b n4
11114 b k5
11115 b t7
I have tried to use proc transpose but the result is not what I want.
I have come up with a solution. This may not be the most elegant way but it works. Please feel free to provide any improvement.
data have; infile datalines dsd; input Application_ID Product_ID $ ID $ Product1_Category $ Product2_Category $; datalines; 11111,a,1,m1,m3 11111,b,2,n2,n4 11111,,3,k3,k5 11111,,4,t3,t7 ; run; proc transpose data= Have out=Want(drop=_NAME_) Prefix=Prod_; by Application_id; var Product_ID; copy Product1_Category Product2_Category; id ID; run; data want(drop = Prod_1 - Prod_4 prodtemp_1-prodtemp_4 product1_category product2_category i); set want; by Application_ID; retain prodtemp_1-prodtemp_4; array prodtemp {4} $ prodtemp_1-prodtemp_4; array prod {4} $ prod_1 - prod_4; do i = 1 to 4; if prod{i} ne '' then prodtemp{i} = prod{i}; prod{i} = prodtemp{i}; end; array category {2} $ Product1_category Product2_category; do i = 1 to 2; if category {i} ='' then leave; product_category = category{i}; product = prod{i}; output; end; run;
What's the logic?
This problem is created by the steps leading up to this. It will be quite tricky to get the output you want from that data, it will be far easier to fix previous steps to give good output. Can you post: Test data in the form of a datastep (we don't need to be typing this in for you), and the code steps which create the data as you present it here.
The alternative is you start writign very messy, hardcoded code like below which I really don't recommend just to fix bad processing before this:
data have; infile datalines dsd; input Application_ID Product_ID $ Product1_Category $ Product2_Category $; datalines; 11111,a,m1,m3 11111,b,n2,n4 11111,,k3,k5 11111,,t3,t7 ; run; data first; set have (keep=application_id product_id product1_category rename=(product1_category=cat)); product_id="a"; run; data second; set have (keep=application_id product_id product2_category rename=(product2_category=cat)); product_id="b"; run; data want; set first second; run;
Hi @RW9
Thank you for your response. Unfortunately, I can't fix the bad process as I was given the data as is. Fortunately, I have come up with a way to clean is up, which I will post against my original questions.
Here's a method that doesn't rely on knowing the PRODUCT_ID values.
data want;
set have (where=(product > ' '));
array prods {2} product1_category product2_category;
do _i_=1 to _nobs_;
set have (drop=product_id) point=_i_ nobs=_nobs_;
product_category = prods{_n_};
if _n_ > 1 then application_id + 1;
output;
end;
keep application_id product_id product;
run;
It's not clear if you meant to increment APPLICATION_ID in your example, so that part can always be adjusted. Other adjustments (such as more products) can also be handled ... it's just not clear how much of your example data set is the real structure and how much might be a simplified example.
@Astounding, Thanks for your solution. However, it is not what I'm after. I will post my code and the outcome shortly.
I have come up with a solution. This may not be the most elegant way but it works. Please feel free to provide any improvement.
data have; infile datalines dsd; input Application_ID Product_ID $ ID $ Product1_Category $ Product2_Category $; datalines; 11111,a,1,m1,m3 11111,b,2,n2,n4 11111,,3,k3,k5 11111,,4,t3,t7 ; run; proc transpose data= Have out=Want(drop=_NAME_) Prefix=Prod_; by Application_id; var Product_ID; copy Product1_Category Product2_Category; id ID; run; data want(drop = Prod_1 - Prod_4 prodtemp_1-prodtemp_4 product1_category product2_category i); set want; by Application_ID; retain prodtemp_1-prodtemp_4; array prodtemp {4} $ prodtemp_1-prodtemp_4; array prod {4} $ prod_1 - prod_4; do i = 1 to 4; if prod{i} ne '' then prodtemp{i} = prod{i}; prod{i} = prodtemp{i}; end; array category {2} $ Product1_category Product2_category; do i = 1 to 2; if category {i} ='' then leave; product_category = category{i}; product = prod{i}; output; end; run;
It looks like you have N products. There are N ProductX_Category variables and the first N rows have the Product_ID values.
So read the values and do a normal array based transform of the category values. But add in a statement to reset the PRODUCT_ID value based on the index into the array.
data have;
infile datalines dsd;
input Application_ID Product_ID $ ID $ Product1_Category $ Product2_Category $;
datalines;
11111,a,1,m1,m3
11111,b,2,n2,n4
11111,,3,k3,k5
11111,,4,t3,t7
;
data want ;
set have ;
array p product1_Category product2_category ;
do i=1 to dim(p);
if i <= nobs then set have(keep=product_id) point=i nobs=nobs;
Product_Category=p(i);
output;
end;
drop product1_Category product2_category ;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.