DATA Step, Macro, Functions and more

Restructure Dataset

Accepted Solution Solved
Reply
Contributor
Posts: 69
Accepted Solution

Restructure Dataset

[ Edited ]

I have the below dataset at a customer level-

 

 
data have;
infile datalines;
input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines;
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;

I need the output as follows-

data want;
infile datalines;
input Amount : $15. Customer_Count : $5. Brand : $15. Channel : $10.;
datalines;
Missing 100 Brand1 W
<$0 200 Brand1 W
$.01-$25 300 Brand1 W
$25.01-$50 400 Brand1 W
$50.01-$75 500 Brand1 W
$75.01-$100 600 Brand1 W
$100.01-$150 700 Brand1 W
$150.01-$200 800 Brand1 W
$200+ 900 Brand1 W
Missing 100 Brand2 R
<$0 200 Brand2 R
$0.01-$25 300 Brand2 R
$25.01-$50 400 Brand2 R
$50.01-$75 500 Brand2 R
$75.01-$100 600 Brand2 R
$100.01-$150 700 Brand2 R
$150.01-$200 800 Brand2 R
$200+ 900 Brand2 R
;
run;

 

 

I tried using PROC Transpose but since the dataset is very large, it takes a huge amount of time..


Accepted Solutions
Solution
‎01-04-2018 11:11 AM
Super User
Super User
Posts: 9,427

Re: Restructure Dataset

Posted in reply to sasmaverick

Sorry, only meant the Have dataset as a datastep.  Anyways, this goes someway to what you want, not sure how you get the customer_count, as it doesn't match the test data you provide, maybe clarify.

data have;
  infile datalines;
  input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines; 
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;

data inter (keep=customer_id amount channel brand);
  set have;
  length amount channel brand $200;
  array channels{3} channel_r--channel_w;
  array brands{2} brand_brand1 brand_brand2;
  do j=1 to 3;
    amount=channels{j};
    channel=vname(channels{j});
    do i=1 to 2;
      brand=vname(brands{i});
      output;
    end;
  end;
run;

proc sql;
  create table WANT as 
  select  AMOUNT,
          count(distinct CUSTOMER_ID) as CUSTOMER_COUNT,
          CHANNEL,
          BRAND
  from    INTER
  group by AMOUNT,
           CHANNEL,
           BRAND;
quit;
      

View solution in original post


All Replies
Super User
Super User
Posts: 9,427

Re: Restructure Dataset

Posted in reply to sasmaverick

Post test data in the form of a datastep!!

 

As such I am not typing that in.  If you have a large amount of data, it will take a large amount of time, irrespective of what code you use.  Although from your output want data I don't see how transpose could get your there unless you have multiple procs.  A datastep, which loops of an array of the variables and outputs a row for each item, then left join a count onto that, may be a bit quicker.

Contributor
Posts: 69

Re: Restructure Dataset

@RW9: Sorry. Below my restructured question. It's a bit challenging for me hence I was looking for a proper code:

 


data have;
infile datalines;
input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines;
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;

 


data want;
infile datalines;
input Amount : $15. Customer_Count : $5. Brand : $15. Channel : $10.;
datalines;
Missing 100 Brand1 W
<$0 200 Brand1 W
$.01-$25 300 Brand1 W
$25.01-$50 400 Brand1 W
$50.01-$75 500 Brand1 W
$75.01-$100 600 Brand1 W
$100.01-$150 700 Brand1 W
$150.01-$200 800 Brand1 W
$200+ 900 Brand1 W
Missing 100 Brand2 R
<$0 200 Brand2 R
$0.01-$25 300 Brand2 R
$25.01-$50 400 Brand2 R
$50.01-$75 500 Brand2 R
$75.01-$100 600 Brand2 R
$100.01-$150 700 Brand2 R
$150.01-$200 800 Brand2 R
$200+ 900 Brand2 R
;
run;

Solution
‎01-04-2018 11:11 AM
Super User
Super User
Posts: 9,427

Re: Restructure Dataset

Posted in reply to sasmaverick

Sorry, only meant the Have dataset as a datastep.  Anyways, this goes someway to what you want, not sure how you get the customer_count, as it doesn't match the test data you provide, maybe clarify.

data have;
  infile datalines;
  input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines; 
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;

data inter (keep=customer_id amount channel brand);
  set have;
  length amount channel brand $200;
  array channels{3} channel_r--channel_w;
  array brands{2} brand_brand1 brand_brand2;
  do j=1 to 3;
    amount=channels{j};
    channel=vname(channels{j});
    do i=1 to 2;
      brand=vname(brands{i});
      output;
    end;
  end;
run;

proc sql;
  create table WANT as 
  select  AMOUNT,
          count(distinct CUSTOMER_ID) as CUSTOMER_COUNT,
          CHANNEL,
          BRAND
  from    INTER
  group by AMOUNT,
           CHANNEL,
           BRAND;
quit;
      
Contributor
Posts: 69

Re: Restructure Dataset

Hi @RW9 Thanks for the effort. I think this works. The customer count in output data shown was just for reference and not exact numbers that we would get. What I am trying to achieve is  to get the count of customers unique across an amount-channel-brand group. and I think your solutions addresses that.

 

Regards,

 

Super User
Posts: 13,333

Re: Restructure Dataset

Posted in reply to sasmaverick

Please describe what (and how as a rule) to get the "amount" for:

Channel_R and Brand_Brand1

Channel_R and Brand_Brand2

Channel_O and Brand_Brand1

Channel_O and Brand_Brand2

Channel_W and Brand_Brand1

Channel_W and Brand_Brand2

 

from the data in your first row.

I do NOT see any obvious way to accomplish that.

I thought it might be possibly to generate a count data set first to count customers and transpose that set but the issue is the intersection of "channel" and "brand' to generate a single amount is not intuitively obvious to me.

Also, are the values shown actually text or the display of a numeric value using a format?

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 136 views
  • 1 like
  • 3 in conversation