DATA Step, Macro, Functions and more

RE: Counting using 2 datasets

Accepted Solution Solved
Reply
Regular Contributor
Posts: 229
Accepted Solution

RE: Counting using 2 datasets

Hi...I have a data set A that has a unique product id numbers and the corresponding benchmark weight for that product id number. A second data set B contains the raw data of which I want to count for each client and product id number, the number of records (sales) below the benchmark weight, the number of records (sales) having the benchmark weight (equal to) and also the number of records (sales) above the benchmark weight. The problem I am having is how to bring those benchmark weights in so I can compare the actual weights in data set B to the corresponding product weights. Any suggestions. Thanks,


Accepted Solutions
Solution
‎02-17-2017 07:50 PM
Frequent Contributor
Posts: 75

Re: RE: Counting using 2 datasets

You can combine the four SQL statements into one and then do your transpose:

 

PROC SQL;

CREATE TABLE Summary AS

SELECT B.Client, B.Product_ID, A.Weight as BenchMarkWt,

case when a.Weight-b.Weight>0 then 'Greater than'

when a.Weight-b.Weight<0 then 'Less than'

when a.Weight-b.Weight=0 then 'Same'

else '' end as Condition, count(*) as COUNT_of_Condition

FROM WORK.trialB b LEFT JOIN WORK.trialA a

ON (B.Product_ID = A.Product_ID)

group by B.Client, B.Product_ID, A.Weight, calculated Condition;

QUIT;

 

PROC TRANSPOSE DATA=Summary OUT=WORK.SummaryTrans(DROP=_NAME_);

BY Client Product_ID;

ID Condition;

VAR COUNT_of_Condition;

RUN;

 

Not sure if that's what you needed, though.

View solution in original post


All Replies
PROC Star
Posts: 7,492

Re: RE: Counting using 2 datasets

It would help seeing example datasets for each of your datasets (in datastep form), the result you'd like to get from the process, and the code you've tried thus far.

 

Art, CEO, AnalystFinder.com

 

Regular Contributor
Posts: 229

Re: RE: Counting using 2 datasets

Hi....this is what I have so far. Although the output in table Summary5 is what I would like to end up with, is there a more simplier and efficient way to end up with the same results as the actual dataset for trialb is quite large.

 

 

DATA trialA;

   LENGTH Product_ID $ 6 Weight 8;

   INFORMAT Product_ID $CHAR6. Weight BEST12.;

   INPUT Product_ID : $CHAR6. Weight : BEST32.;

DATALINES;

007075 100

076060 120

131630 40

342993 56

408007 91

487707 35

591680 40

605309 22

632145 56

785850 22

845866 105

;;;;

 

 

DATA trialb;

   LENGTH Client $ 9 Product_ID $ 6 Weight 8;

   INFORMAT Client $CHAR9. Product_ID $CHAR6. Weight BEST12.;

   INPUT Client : $CHAR9. Product_ID : $CHAR6. Weight : BEST32.;

DATALINES;

100087540 007075 85

100087540 007075 110

100087540 007075 100

100087540 007075 95

100087540 007075 105

100087540 408007 25

100087540 408007 30

100087540 408007 40

100087540 408007 35

100092380 605309 20

100092380 605309 25

100092380 605309 22

100092380 605309 23

100092380 785850 18

100092380 785850 23

100092380 785850 24

100092380 785850 19

114811568 342993 55

114811568 845866 110

114811568 845866 105

;;;;

 

PROC SQL NOPRINT;

     CREATE TABLE Summary1 AS

           SELECT

                trialB.Client,

         trialB.Product_ID,

         trialB.Weight,

         trialA.Product_ID AS Product_ID1,

         trialA.Weight AS Weight1

          FROM WORK.trialB LEFT JOIN WORK.trialA ON (trialB.Product_ID = trialA.Product_ID);

QUIT;

 

PROC SQL NOPRINT;

     CREATE TABLE Summary2 AS

           SELECT

                Summary1.Client,

         Summary1.Product_ID,

         Summary1.Weight,

         Summary1.Product_ID1,

         Summary1.Weight1,

           (Summary1.Weight1-Summary1.Weight) AS Difference

          FROM WORK.Summary1;

QUIT;

 

PROC SQL NOPRINT;

     CREATE TABLE Summary3 AS

           SELECT

                Summary2.Client,

           Summary2.Product_ID,

           Summary2.Weight,

         Summary2.Product_ID1,

         Summary2.Weight1,

         Summary2.Difference,

                (case when Summary2.Difference<0

                    then 'Less than'

               when Summary2.Difference=0

                    then 'Same'

               when Summary2.Difference>0

                    then 'Greater than'

               else ' '

               end) AS Condition

          FROM WORK.Summary2;

QUIT;

 

PROC SQL NOPRINT;

     CREATE TABLE Summary4 AS

           SELECT DISTINCT

                Summary3.Client,

         Summary3.Product_ID,

         Summary3.Condition,

          (COUNT(Summary3.Condition)) AS COUNT_of_Condition

          FROM WORK.Summary3

          GROUP BY Summary3.Client, Summary3.Product_ID, Summary3.Condition

          ORDER BY Summary3.Client, Summary3.Product_ID;

QUIT;

 

PROC TRANSPOSE DATA=Summary4 OUT=WORK.Summary5(DROP=_NAME_);

     BY Client Product_ID;

     ID Condition;

     VAR COUNT_of_Condition;

RUN;

QUIT;

Solution
‎02-17-2017 07:50 PM
Frequent Contributor
Posts: 75

Re: RE: Counting using 2 datasets

You can combine the four SQL statements into one and then do your transpose:

 

PROC SQL;

CREATE TABLE Summary AS

SELECT B.Client, B.Product_ID, A.Weight as BenchMarkWt,

case when a.Weight-b.Weight>0 then 'Greater than'

when a.Weight-b.Weight<0 then 'Less than'

when a.Weight-b.Weight=0 then 'Same'

else '' end as Condition, count(*) as COUNT_of_Condition

FROM WORK.trialB b LEFT JOIN WORK.trialA a

ON (B.Product_ID = A.Product_ID)

group by B.Client, B.Product_ID, A.Weight, calculated Condition;

QUIT;

 

PROC TRANSPOSE DATA=Summary OUT=WORK.SummaryTrans(DROP=_NAME_);

BY Client Product_ID;

ID Condition;

VAR COUNT_of_Condition;

RUN;

 

Not sure if that's what you needed, though.

PROC Star
Posts: 7,492

Re: RE: Counting using 2 datasets

[ Edited ]

I'd do all of the heavy lifting in your first sql step, then use proc freq, then transpose. e.g. (which produces your summary5 table). However, while I won't delete my post, I like @nehalsanghvi's solution better (i.e., simply including the count variable in your one proc sql step):

 

PROC SQL NOPRINT;
     CREATE TABLE Summary AS
           SELECT
                trialB.Client,
         trialB.Product_ID,
         case 
           when (trialB.Weight-trialA.Weight) > 0 then 'Less_than'
           when (trialB.Weight-trialA.Weight) < 0 then 'Greater_than'
           when (trialB.Weight-trialA.Weight) = 0 then 'Same'
else '' end as condition FROM WORK.trialB LEFT JOIN WORK.trialA ON (trialB.Product_ID = trialA.Product_ID); QUIT; proc freq data=Summary; tables client*product_ID*condition/out=need (drop=percent); run; proc transpose data=need out=want (DROP=_:); BY Client Product_ID; ID Condition; VAR COUNT; RUN;

HTH,

Art, CEO, AnalystFinder.com

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 149 views
  • 2 likes
  • 3 in conversation