Efficient sorting when some variables are already sorted

Accepted Solution Solved
Reply
Contributor
Posts: 38
Accepted Solution

Efficient sorting when some variables are already sorted

Hi,

I have a dataset containing for each store and date the IDs of different product types (Product_ID) and the amount of sold units (Product_sold). The data are already sorted by store and date.

My tasks are to

* consolidate sales for identical product types which were sold in the same store and on the same day (done in the first PROC SQL step in the example below)

* insert columns having the information sorted by product type in ascending order (Product_ID_asc Product_sold_asc) and columns having the information sorted in descending order (Product_ID_desc Product_sold_desc) for each store and date.

I'd really like to know if there are better solutions than the (pretty slow) one I posted below. The original dataset is A and A2 represents the desired result.

1) Dataset A is very big. Is it really necessary to create a second table (A2) in order to consolidate sales (first PROC SQL step)?

2) Aren't there more efficient ways that take advantage of the fact that the data is already sorted by store and date? I still mention both variables after group by/sort by.

Thanks. I really appreciate any suggestions.

Data A;

   Format

      Store $4.

      Date date11.

      Product_ID $6.

      Product_sold best2.;

   Input

      Store :$4.

      Date :date11.

      Product_ID :$6.

      Product_sold best2.;

Datalines;

West 19-FEB-2013 Type_B 9

West 19-FEB-2013 Type_A 9

West 19-FEB-2013 Type_B 11

West 20-FEB-2013 Type_C 5

West 20-FEB-2013 Type_A 6

West 20-FEB-2013 Type_B 7

West 20-FEB-2013 Type_A 10

Run;

Proc Sql;

   Create Table A2

   As Select

      Store, Date, Product_ID, SUM(Product_sold) AS Product_sold

   From

      A

   Group By

      Store, Date, Product_ID;

Quit;

Proc Sort

   Data=A2 Out=A2Product_asc

   (

      Keep=Product_ID Product_sold

      Rename=(Product_ID=Product_ID_asc Product_sold=Product_sold_asc)

   );

   By Store Date Product_ID;

Run;

Proc Sort

   Data=A2 Out=A2Product_desc

   (

      Keep=Product_ID Product_sold

      Rename=(Product_ID=Product_ID_desc Product_sold=Product_sold_desc)

   );

   By Store Date Descending Product_ID;

Run;

Data A2;

   Set A2(Keep=Store Date);

   Set A2Product_asc;

   Set A2Product_desc;

Run;


Accepted Solutions
Solution
‎07-25-2013 10:42 PM
Super User
Posts: 5,495

Re: Efficient sorting when some variables are already sorted

Posted in reply to Georg_UPB

I would guess that PROC SUMMARY would be faster than PROC SQL:

proc summary data=A nway;

  by store date;

  class product_id;;

  var product_sold;

  output out=a2 (keep=store date product_id product_sold_asc rename=(product_id=product_id_asc)) sum=product_sold_asc;

run;

Of course, there's just one way to be sure which would be faster.  And this does require the data to be in sorted order BY STORE DATE (which you had indicated was the case).

As RichardinOz pointed out, the result will be sorted already (regardless of whether you use SQL or SUMMARY), and you only need two data sets not three.

Good luck.

View solution in original post


All Replies
Super Contributor
Posts: 644

Re: Efficient sorting when some variables are already sorted

Posted in reply to Georg_UPB

Your first sort is redundant (omit it!) because GROUP BY in SAS will generate a sorted table.  To be certain, you could insert and ORDER BY clause.

Data A2;

   Set A2 (Rename = (Product_Sold = Product_Sold_Asc ) );

   Set A2Product_desc;

Run;

Richard

Solution
‎07-25-2013 10:42 PM
Super User
Posts: 5,495

Re: Efficient sorting when some variables are already sorted

Posted in reply to Georg_UPB

I would guess that PROC SUMMARY would be faster than PROC SQL:

proc summary data=A nway;

  by store date;

  class product_id;;

  var product_sold;

  output out=a2 (keep=store date product_id product_sold_asc rename=(product_id=product_id_asc)) sum=product_sold_asc;

run;

Of course, there's just one way to be sure which would be faster.  And this does require the data to be in sorted order BY STORE DATE (which you had indicated was the case).

As RichardinOz pointed out, the result will be sorted already (regardless of whether you use SQL or SUMMARY), and you only need two data sets not three.

Good luck.

Contributor
Posts: 38

Re: Efficient sorting when some variables are already sorted

Posted in reply to Astounding

Thank you for your help! Your suggestions improved the code a lot!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 252 views
  • 3 likes
  • 3 in conversation