BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Dear SAS Folks, I request your help for an efficient solution for a very large dataset that challenges to compare to By-groups within by groups.

Here is the sample of my input data-set(have) with 5 columns:
STORE_ID PRODUCT_ID PRODUCT_WIN_POINT1 PRODUCT_WIN_POINT2 MONTHLY_PERIOD
W169 F1 1 0 jan
W169 F1 0 1 feb
W169 F1 0 0 mar
W169 F1 1 1 apr
W169 F1 0 0 may
W169 F1 1 1 jun
W169 V1 0 1 jan
W169 V1 1 1 feb
W169 V1 1 0 mar
W169 V1 1 0 apr
W169 V1 0 1 may
W169 V1 0 1 Jun
W169 T1 0 1 jan
W169 T1 1 0 feb
W169 R1 0 1 mar
W169 R1 0 0 may
W169 R1 0 1 jun
B291 B3 1 0 jan
B291 B2 1 1 jun
B291 B44 0 1 july

The need is to compare PRODUCT_ID values against MONTHLY_PERIOD within STORE_ID values.
Details:
Point1: For each STORE_ID, I need to compare PRODUCT_ID F1(product_win_point1 value) to the V1(product_win_point1 value) to the T1(product_win_point1 value) and similarly across all product_id's within a store_id that has values for the month of Jan. In the example above, you would notice there is no product_win_point1 and product_win_point2 value of R1 for the month of Jan.

Point2: Likewise, feb (product_win_point1) value of product id's needs to be compared as explained in point1. And the same exercise should be done for all months available, that is jan to jan, feb to feb, mar to mar........

Point3: Comparison logic--> For example, When comparing the values of product_win_point1 across months, let's say taking the example above:
For store_id W169,for the month of Jan, the product_win_point1 value for product_id F1=1, V1=0, T1=0,R1 is missing. Since there is at-least one product_win_point1 with value 1, the result should be stored as 1 for the month Jan. Likewise the same rule applies when comparing product_win_point1 values of product id's for feb against feb, mar against mar......

Point4: The same comparison logic as mentioned in Point1,2 & 3 has to be applied for PRODUCT_WIN_POINT2 values of product_ids against monthly periods.

I trust the points explained above can be comprehended and if not I am so keen to detail further whatever i can.

So, the final resulting WANT(Output) dataset:

WANT:
STORE_ID MONTHLY_PERIOD MONTH_as_number COMPARED_PRODUCT_WIN_POINT1 COMPARED_PRODUCT_WIN_POINT2
W169 jan 1 1 1
W169 feb 2 1 1
W169 mar 3 1 1
W169 apr 4 1 1
W169 may 5 0 1
W169 jun 6 1 1
B291 jan 1 1 0
B291 jun 6 1 1
B291 july 7 0 1

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OK. If you have a DATE variable in your dataset. MONTHLY_PERIOD should be expressed as number 1 2 3 4 ... that would be better.

compare this way with Hash Table , and see which one is faster.

 

 

data have;
input STORE_ID $ PRODUCT_ID $ PRODUCT_WIN_POINT1 PRODUCT_WIN_POINT2 MONTH ;
cards;
W169 F1 1 0 1
W169 F1 0 1 2
W169 F1 0 0 3
W169 F1 1 1 4
W169 F1 0 0 5
W169 F1 1 1 6
W169 V1 0 1 1
W169 V1 1 1 2
W169 V1 1 0 3
W169 V1 1 0 4
W169 V1 0 1 5
W169 V1 0 1 6
W169 T1 0 1 1
W169 T1 1 0 2
W169 R1 0 1 3
W169 R1 0 0 5
W169 R1 0 1 6
X291 B3 1 0 1
X291 B2 1 1 6
X291 B44 0 1 7
;
run;
data want;
 set have;
 by STORE_ID;
 array x1{12} _temporary_;
 array x2{12} _temporary_;
 if first.STORE_ID then call missing(of x1{*} x2{*});

 if PRODUCT_WIN_POINT1=1 or x1{MONTH}=1 then x1{MONTH}=1;
  else x1{MONTH}=0;
 if PRODUCT_WIN_POINT2=1 or x2{MONTH}=1 then x2{MONTH}=1;
  else x2{MONTH}=0;
 if last.STORE_ID then do;
  do i=1 to dim(x1);
   if not missing(x1{i}) then do;
    MONTHLY_PERIOD=put(mdy(i,1,2000),monname3.);  /*Use proc format to map if you want faster*/
    MONTH_as_number=i;
    COMPARED_PRODUCT_WIN_POINT1=x1{i};
    COMPARED_PRODUCT_WIN_POINT2=x2{i};
    output;
   end;
  end;
 end;
keep STORE_ID MONTHLY_PERIOD MONTH_as_number COMPARED_PRODUCT_WIN_POINT1 COMPARED_PRODUCT_WIN_POINT2;
run;
 

View solution in original post

6 REPLIES 6
Ksharp
Super User

Charlotte,

Yes. You wrote too many words and make me spend more time to read and may lead me a little confused.

Next time, If you could ,plz write as few words as you could .That would benefit us and save both us time.

 

BTW, Has your data been sorded by STORE_ID ? If it were, I think temporary array could make it faster than Hash Table.

 

 

data have;
input STORE_ID $ PRODUCT_ID $ PRODUCT_WIN_POINT1 PRODUCT_WIN_POINT2 MONTHLY_PERIOD $;
cards;
W169 F1 1 0 jan
W169 F1 0 1 feb
W169 F1 0 0 mar
W169 F1 1 1 apr
W169 F1 0 0 may
W169 F1 1 1 jun
W169 V1 0 1 jan
W169 V1 1 1 feb
W169 V1 1 0 mar
W169 V1 1 0 apr
W169 V1 0 1 may
W169 V1 0 1 Jun
W169 T1 0 1 jan
W169 T1 1 0 feb
W169 R1 0 1 mar
W169 R1 0 0 may
W169 R1 0 1 jun
B291 B3 1 0 jan
B291 B2 1 1 jun
B291 B44 0 1 july
;
run;
data _null_;
 if _n_ eq 1 then do;
 if 0 then set have;
   declare hash h(hashexp:20);
   h.definekey('STORE_ID','MONTHLY_PERIOD');
   h.definedata('STORE_ID','MONTHLY_PERIOD','COMPARED_PRODUCT_WIN_POINT1','COMPARED_PRODUCT_WIN_POINT2');
   h.definedone(); 
 end;
 set have end=last;
 if h.find()=0 then do;
   if COMPARED_PRODUCT_WIN_POINT1=1 or PRODUCT_WIN_POINT1=1 then COMPARED_PRODUCT_WIN_POINT1=1;
   if COMPARED_PRODUCT_WIN_POINT2=1 or PRODUCT_WIN_POINT2=1 then COMPARED_PRODUCT_WIN_POINT2=1;
   h.replace();
 end;
 else do;
   COMPARED_PRODUCT_WIN_POINT1=0;
   COMPARED_PRODUCT_WIN_POINT2=0;
   if PRODUCT_WIN_POINT1=1 then COMPARED_PRODUCT_WIN_POINT1=1;
   if PRODUCT_WIN_POINT2=1 then COMPARED_PRODUCT_WIN_POINT2=1;
   h.add();
 end;
if last then h.output(dataset:'want');
run;   
      
      

 

CharlotteCain
Quartz | Level 8

@Ksharp Good morning Xia, It's very nice to just wake up to your solution as always. I am really sorry for using too many words and please accept my apology for that. 

Yes, the data is sorted by store_id product_id and monthly_period(sorted by date and then i took month of date using month function) and the result should be in the order store_id monthy_period once the comparison is done for product_win_point variables.

 

Thank you as always, and I hope you are well?

 

Regards,

Charlotte

Ksharp
Super User

OK. If you have a DATE variable in your dataset. MONTHLY_PERIOD should be expressed as number 1 2 3 4 ... that would be better.

compare this way with Hash Table , and see which one is faster.

 

 

data have;
input STORE_ID $ PRODUCT_ID $ PRODUCT_WIN_POINT1 PRODUCT_WIN_POINT2 MONTH ;
cards;
W169 F1 1 0 1
W169 F1 0 1 2
W169 F1 0 0 3
W169 F1 1 1 4
W169 F1 0 0 5
W169 F1 1 1 6
W169 V1 0 1 1
W169 V1 1 1 2
W169 V1 1 0 3
W169 V1 1 0 4
W169 V1 0 1 5
W169 V1 0 1 6
W169 T1 0 1 1
W169 T1 1 0 2
W169 R1 0 1 3
W169 R1 0 0 5
W169 R1 0 1 6
X291 B3 1 0 1
X291 B2 1 1 6
X291 B44 0 1 7
;
run;
data want;
 set have;
 by STORE_ID;
 array x1{12} _temporary_;
 array x2{12} _temporary_;
 if first.STORE_ID then call missing(of x1{*} x2{*});

 if PRODUCT_WIN_POINT1=1 or x1{MONTH}=1 then x1{MONTH}=1;
  else x1{MONTH}=0;
 if PRODUCT_WIN_POINT2=1 or x2{MONTH}=1 then x2{MONTH}=1;
  else x2{MONTH}=0;
 if last.STORE_ID then do;
  do i=1 to dim(x1);
   if not missing(x1{i}) then do;
    MONTHLY_PERIOD=put(mdy(i,1,2000),monname3.);  /*Use proc format to map if you want faster*/
    MONTH_as_number=i;
    COMPARED_PRODUCT_WIN_POINT1=x1{i};
    COMPARED_PRODUCT_WIN_POINT2=x2{i};
    output;
   end;
  end;
 end;
keep STORE_ID MONTHLY_PERIOD MONTH_as_number COMPARED_PRODUCT_WIN_POINT1 COMPARED_PRODUCT_WIN_POINT2;
run;
 
CharlotteCain
Quartz | Level 8
Thank you Xia, You are too too too Brilliant!!. The SAS world envy you and in many cases I have noticed here, if nobody can solve a problem,only Xia Keshan can. Many people have said this and I say this too many times. You are blessed and gifted talent and thank you for being a true dear friend. You have helped me so much
Ksharp
Super User

Charlotte,

Thanks your compliment. That is reason why I stick with SAS Forum - I love solving the tough problem . more difficult ,bring me more fun and happiness.  I enjoy the experience of solving very difficult question.

 

 

Take Care.

Bye. I gonna have a vacation , and You gonna not see me in next a few days.

CharlotteCain
Quartz | Level 8

@Ksharp Yes Xia, I know you would be gone for Chinese new year holidays to your home town right? I remember you did the same last year 🙂 . I was anyway going to wish you by sending you a peronal greetings email to you, but anyway since you mentioned that I wish you a fantastic chinese new year that brings hope, joy, happiness and peace. Thank you for everything and enjoy your break!

 

I'll be anxiously waiting for your return. Take care and stay safe!,

Charlotte

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1074 views
  • 3 likes
  • 2 in conversation