BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasmaverick
Obsidian | Level 7

I have the below dataset at a customer level-

 

 
data have;
infile datalines;
input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines;
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;

I need the output as follows-

data want;
infile datalines;
input Amount : $15. Customer_Count : $5. Brand : $15. Channel : $10.;
datalines;
Missing 100 Brand1 W
<$0 200 Brand1 W
$.01-$25 300 Brand1 W
$25.01-$50 400 Brand1 W
$50.01-$75 500 Brand1 W
$75.01-$100 600 Brand1 W
$100.01-$150 700 Brand1 W
$150.01-$200 800 Brand1 W
$200+ 900 Brand1 W
Missing 100 Brand2 R
<$0 200 Brand2 R
$0.01-$25 300 Brand2 R
$25.01-$50 400 Brand2 R
$50.01-$75 500 Brand2 R
$75.01-$100 600 Brand2 R
$100.01-$150 700 Brand2 R
$150.01-$200 800 Brand2 R
$200+ 900 Brand2 R
;
run;

 

 

I tried using PROC Transpose but since the dataset is very large, it takes a huge amount of time..

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, only meant the Have dataset as a datastep.  Anyways, this goes someway to what you want, not sure how you get the customer_count, as it doesn't match the test data you provide, maybe clarify.

data have;
  infile datalines;
  input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines; 
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;

data inter (keep=customer_id amount channel brand);
  set have;
  length amount channel brand $200;
  array channels{3} channel_r--channel_w;
  array brands{2} brand_brand1 brand_brand2;
  do j=1 to 3;
    amount=channels{j};
    channel=vname(channels{j});
    do i=1 to 2;
      brand=vname(brands{i});
      output;
    end;
  end;
run;

proc sql;
  create table WANT as 
  select  AMOUNT,
          count(distinct CUSTOMER_ID) as CUSTOMER_COUNT,
          CHANNEL,
          BRAND
  from    INTER
  group by AMOUNT,
           CHANNEL,
           BRAND;
quit;
      

View solution in original post

5 REPLIES 5
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep!!

 

As such I am not typing that in.  If you have a large amount of data, it will take a large amount of time, irrespective of what code you use.  Although from your output want data I don't see how transpose could get your there unless you have multiple procs.  A datastep, which loops of an array of the variables and outputs a row for each item, then left join a count onto that, may be a bit quicker.

sasmaverick
Obsidian | Level 7

@RW9: Sorry. Below my restructured question. It's a bit challenging for me hence I was looking for a proper code:

 


data have;
infile datalines;
input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines;
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;

 


data want;
infile datalines;
input Amount : $15. Customer_Count : $5. Brand : $15. Channel : $10.;
datalines;
Missing 100 Brand1 W
<$0 200 Brand1 W
$.01-$25 300 Brand1 W
$25.01-$50 400 Brand1 W
$50.01-$75 500 Brand1 W
$75.01-$100 600 Brand1 W
$100.01-$150 700 Brand1 W
$150.01-$200 800 Brand1 W
$200+ 900 Brand1 W
Missing 100 Brand2 R
<$0 200 Brand2 R
$0.01-$25 300 Brand2 R
$25.01-$50 400 Brand2 R
$50.01-$75 500 Brand2 R
$75.01-$100 600 Brand2 R
$100.01-$150 700 Brand2 R
$150.01-$200 800 Brand2 R
$200+ 900 Brand2 R
;
run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, only meant the Have dataset as a datastep.  Anyways, this goes someway to what you want, not sure how you get the customer_count, as it doesn't match the test data you provide, maybe clarify.

data have;
  infile datalines;
  input CUSTOMER_ID : $1. Channel_R : $15. Channel_O : $15. Channel_W : $15. Brand_Brand1 : $15. Brand_Brand2 : $15.;
datalines; 
1 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
2 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
3 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
4 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
5 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
6 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
7 $25.01-$50 $0.01-$25 $100.01-$150 $150.01-$200 $200+
8 <$0 $25.01-$50 $25.01-$50 $100.01-$150 $50.01-$75
9 Missing $0.01-$25 $100.01-$150 $150.01-$200 $200+
10 <$0 $25.01-$50 Missing $100.01-$150 $50.01-$75
;
run;

data inter (keep=customer_id amount channel brand);
  set have;
  length amount channel brand $200;
  array channels{3} channel_r--channel_w;
  array brands{2} brand_brand1 brand_brand2;
  do j=1 to 3;
    amount=channels{j};
    channel=vname(channels{j});
    do i=1 to 2;
      brand=vname(brands{i});
      output;
    end;
  end;
run;

proc sql;
  create table WANT as 
  select  AMOUNT,
          count(distinct CUSTOMER_ID) as CUSTOMER_COUNT,
          CHANNEL,
          BRAND
  from    INTER
  group by AMOUNT,
           CHANNEL,
           BRAND;
quit;
      
sasmaverick
Obsidian | Level 7

Hi @RW9 Thanks for the effort. I think this works. The customer count in output data shown was just for reference and not exact numbers that we would get. What I am trying to achieve is  to get the count of customers unique across an amount-channel-brand group. and I think your solutions addresses that.

 

Regards,

 

ballardw
Super User

Please describe what (and how as a rule) to get the "amount" for:

Channel_R and Brand_Brand1

Channel_R and Brand_Brand2

Channel_O and Brand_Brand1

Channel_O and Brand_Brand2

Channel_W and Brand_Brand1

Channel_W and Brand_Brand2

 

from the data in your first row.

I do NOT see any obvious way to accomplish that.

I thought it might be possibly to generate a count data set first to count customers and transpose that set but the issue is the intersection of "channel" and "brand' to generate a single amount is not intuitively obvious to me.

Also, are the values shown actually text or the display of a numeric value using a format?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 910 views
  • 1 like
  • 3 in conversation