Solved: Re: How to workout keep track the variable

sagulolo · Posted 08-01-2019 06:33 AM

Hi all,

i have 2 table which is Table_A and Table_B, Table_A extra number will always sufficient to cover the short number at Table_B. The logic will be use the extra number from Table A to knock off the first short number from Table B with to the same group(Sg_a = Sg_b) until zero then follow by second short number from the same group. Please refer below:-

data Table_A;
  infile datalines dlm='|';
  input  Sg_a $ Key_a $ extra;
;
datalines;
A |KEY1  |50
A |KEY2  |80
B |KEY6  |1000
C |KEY9  |10
C |KEY10 |5
C |KEY11 |500
;

data Table_B;
  infile datalines dlm='|';
  input  Sg_b $ Key_b $ short;
;
datalines;
A |KEY3  |10
A |KEY4  |25
A |KEY5  |35
B |KEY7  |200
B |KEY8  |700
C |KEY12 |20
C |KEY13 |400
C |KEY14 |10
C |KEY15 |50
;

proc sql;
	create table rightjoin as
	select a.*, b.*
	from Table_A a right join
	Table_B b
	on a.Sg_a = b.Sg_b;
quit;

So far, I have tried to right join table A to table B, but have no idea how to evaluate and keep track the actual_transfer, below is my desire output.

Sg_a	Key_a	Key_b	actual_transfer
A	KEY1	KEY3	10
A	KEY1	KEY4	25
A	KEY1	KEY5	15
A	KEY2	KEY5	20
B	KEY6	KEY7	200
B	KEY6	KEY8	700
C	KEY9	KEY12	10
C	KEY10	KEY12	5
C	KEY11	KEY12	5
C	KEY11	KEY13	400
C	KEY11	KEY14	10
C	KEY11	KEY15	50

Appreciate your time and help on this matter.

hashman · Posted 08-05-2019 06:05 PM

@sagulolo:

A curious puzzle. Frankly, I've spent about 90% of my time having a crack at it by trying to understand your spec. However, it's commendable that you've tried harder every next iteration, and finally it's helped. Once the pattern was clear, writing the code was fairly easy. One selection that had to be made was to decide on the type of a lookup table for the info from Table_A, and I have finally opted, somewhat contrary to my moniker, for using arrays - primarily since they are simple to update in place.

Here's the code (the input tables are assumed sorted by SG_A and SG_B, respectively):

data Table_A ;                                                                 
  input  Sg_a $ Key_a $ extra ;                                                
  cards ;                                                                      
A KEY1    50                                                                   
A KEY2    80                                                                   
B KEY6  1000                                                                   
C KEY9    10                                                                   
C KEY10   05                                                                   
C KEY11  500                                                                   
run ;                                                                          
                                                                               
data Table_B;                                                                  
  input  Sg_b $ Key_b $ short ;                                                
  cards ;                                                                      
A KEY3   10                                                                    
A KEY4   25                                                                    
A KEY5   35                                                                    
B KEY7  200                                                                    
B KEY8  700                                                                    
C KEY12  20                                                                    
C KEY13 400                                                                    
C KEY14  10                                                                    
C KEY15  50                                                                    
run ;                                                                          
                                                                               
proc sql noprint ;                                                             
  select max (d) into :d from (select count (*) as d from table_a group sg_a) ;
quit ;                                                                         
                                                                               
data want (keep = sg_a key_a key_b balance_extra actual_transfer) ;            
  array x [&d]   8 _temporary_ ;                                               
  array k [&d] $ 8 _temporary_ ;                                               
  do _i = 1 by 1 until (last.sg_a) ;                                           
    set table_a (in = _a) table_b (rename = sg_b = sg_a);                      
    by sg_a ;                                                                  
    if _a then do ;                                                            
      x [_i] = extra ;                                                         
      k [_i] = key_a ;                                                         
      _im = _i ;                                                               
    end ;                                                                      
    else do _i = 1 by 0 until (short = 0 or _i > _im) ;                        
      actual_transfer = x [_i] min short ;                                     
      if actual_transfer > 0 then do ;                                         
        key_a = k [_i] ;                                                       
        balance_extra = x [_i] - actual_transfer ;                             
        output ;                                                               
      end ;                                                                    
      x [_i] +- actual_transfer ;                                              
      short  +- actual_transfer ;                                              
      if x [_i] = 0 then _i + 1 ;                                              
    end ;                                                                      
  end ;                                                                        
run ;

The preliminary SQL step is used to size their upper bound at the expense of an extra pass through Table_A. Alternatively, you can omit the step and instead of setting the upper bounds of the arrays X and K to &D, set them to something "big enough" like 99999. The only requirement to its magnitude is that it must be equal to or exceed the size of the largest BY group by SG_A in Table_A. Yet another alternative is to use a hash table instead of the arrays; but in this case methinks that compared to the arrays, it would introduce an extra level of complexity with regard to accessing the lookup items and updating them in place.

On the different note, I don't see how this nut can be cracked without procedural programming and by using SQL alone.

Kind regards

Paul D.

View solution in original post

KachiM · Posted 08-01-2019 07:13 AM

The logic will be use the extra number from Table A to knock off the first short number from Table B with to the same group(Sg_a = Sg_b) until zero then follow by second short number from the same group. Please refer below:-

Can you explain how you got "actual_transfer" using sg_a and sg_b? I am bad to understand your term "to knock off".

sagulolo · Posted 08-01-2019 07:23 AM

Hi datasp,

Thank for your time and sorry for my bad english, please refer below:-

A KEY1(table A) --> extra 50 - A KEY3(table B) short 10 (50 - 10, can offset 10), so i get below:-

so i output A KEY1(table A) A KEY3(table B) 10, then A KEY1(table A) -> still have extra $40

A KEY1(table A) --> extra 40 - A KEY4(table B) short 25 (40 - 25, can offset 25)

so i output A KEY1(table A) A KEY4(table B) 25, then A KEY1(table A) -> still have extra 15

A KEY1(table A) --> extra 15 - A KEY5(table B) short 35 (15 - 35, only can offset 15)

so i output A KEY1(table A) A KEY5(table B) 15, then next i refer to A KEY2 80 for the balance of A KEY5(table B) which is 20

A KEY2 (table A) -->extra 80 - A KEY5(table B) short 20 ( 80 - 20, can offset 20)

so i output A KEY2 (table A) A KEY5(table B) short 20

hope the above clarifies.

sagulolo · Posted 08-01-2019 08:45 AM

Hi datasp,

Apologize as didn't really answer your question.
"actual_transfer" is refer to sg_b if table A extra more than sg_b short. Else will be sg_a balance left.

Once the table B short all deducted then move to next B then C.

KachiM · Posted 08-01-2019 10:47 AM

I understand for the first A of Table 1 and you cross compare EXTRA with SHORT. This goes like 10, 25, and final residual (15) replaces 35 of SHORT.

Once the table B short all deducted then move to next B then C.

What is happening the next A of Table1? Is it to be ignored and go to B of Table 1?

You show the output on your revised specification. In your output append comment in the form of Balance_Extra = 50 - 10, 40 - 25 , 15 - 15 for each row. It will be easy to know what you want.

sagulolo · Posted 08-02-2019 09:42 AM

Hi datasp,

Apologize for late reply.

What is happening the next A of Table1?

no action require as Table_B no more A group

Is it to be ignored and go to B of Table 1?

Yes, next will be cross match between Table_A and Table_B for B group.

Added balance_extra into table for more understanding, thank you

Sg_a	Key_a	Key_b	Balance_Extra	actual_transfer
A	KEY1	KEY3	50 -10 = 40	10
A	KEY1	KEY4	40 - 25 = 15	25
A	KEY1	KEY5	15 - 15 = 0(Key 1 no more number, next use Key 2)	15
A	KEY2	KEY5	80 - 20, stop as table_B no more group for A	20
B	KEY6	KEY7	1000 - 200 = 800	200
B	KEY6	KEY8	800 - 700 = 100, stop as table_B no more group for B	700
C	KEY9	KEY12	10 - 10 = 0, (Key 9 no more number, next use Key 10)	10
C	KEY10	KEY12	5 - 5 = 0, (Key 10 no more number, next use Key 11)	5
C	KEY11	KEY12	500 - 5 = 495	5
C	KEY11	KEY13	495 - 400 = 95	400
C	KEY11	KEY14	95 - 10 = 85	10
C	KEY11	KEY15	85 - 50 = 35, stop for as table_B no more group for C	50

KachiM · Posted 08-02-2019 10:50 AM

It is good now. I want the 'desired output'. Place the revised output.

sagulolo · Posted 08-02-2019 12:17 PM

Hi datasp,

The desire output will be as per below:-

Sg_a	Key_a	Key_b	actual_transfer
A	KEY1	KEY3	10
A	KEY1	KEY4	25
A	KEY1	KEY5	15
A	KEY2	KEY5	20
B	KEY6	KEY7	200
B	KEY6	KEY8	700
C	KEY9	KEY12	10
C	KEY10	KEY12	5
C	KEY11	KEY12	5
C	KEY11	KEY13	400
C	KEY11	KEY14	10
C	KEY11	KEY15	50

Thank you

KachiM · Posted 08-02-2019 11:11 AM

Sg_a	Key_a	Key_b	Balance_Extra	actual_transfer
A	KEY1	KEY3	50 -10 = 40	10
A	KEY1	KEY4	40 - 25 = 15	25
A	KEY1	KEY5	15 - 15 = 0(Key 1 no more number, next use Key 2)	15
A	KEY2	KEY5	80 - 20, stop as table_B no more group for A	20
B	KEY6	KEY7	1000 - 200 = 800	200
B	KEY6	KEY8	800 - 700 = 100, stop as table_B no more group for B	700
C	KEY9	KEY12	10 - 10 = 0, (Key 9 no more number, next use Key 10)	10
C	KEY10	KEY12	5 - 5 = 0, (Key 10 no more number, next use Key 11)	5
C	KEY11	KEY12	500 - 5 = 495	5
C	KEY11	KEY13	495 - 400 = 95	400
C	KEY11	KEY14	95 - 10 = 85	10
C	KEY11	KEY15	85 - 50 = 35, stop for as table_B no more group for C	50

5th row in 500 - 5, how do you get this 5? See In Table_B, k12 is 20.

sagulolo · Posted 08-02-2019 12:33 PM

Hi datasp,

5th row in 500 - 5, how do you get this 5? See In Table_B, k12 is 20.

I get the output 5 is because

20 - 10 - 5 (KEY12 - KEY9 - KEY10) = 5, so need another 5 from KEY 11 to become zero.

Thank you

KachiM · Posted 08-02-2019 01:28 PM

@sagulolo

Do you insist on SQL solution? I have no sql-head. I am trying Data Step approach.

There are problems in getting :

A Key2 Key5 20 as Key5 has been used in the previous row of output.

Similarly, with:

C Key9 Key12 10

C Key10 Key12 5

C Key11 Key12 5

I am looking into this. Be patient.

sagulolo · Posted 08-02-2019 01:30 PM

Hi datasp,

I'm okay with the data step approach and really thank for looking into this matter.

Thank you so much

hashman · Posted 08-05-2019 06:05 PM

@sagulolo:

A curious puzzle. Frankly, I've spent about 90% of my time having a crack at it by trying to understand your spec. However, it's commendable that you've tried harder every next iteration, and finally it's helped. Once the pattern was clear, writing the code was fairly easy. One selection that had to be made was to decide on the type of a lookup table for the info from Table_A, and I have finally opted, somewhat contrary to my moniker, for using arrays - primarily since they are simple to update in place.

Here's the code (the input tables are assumed sorted by SG_A and SG_B, respectively):

data Table_A ;                                                                 
  input  Sg_a $ Key_a $ extra ;                                                
  cards ;                                                                      
A KEY1    50                                                                   
A KEY2    80                                                                   
B KEY6  1000                                                                   
C KEY9    10                                                                   
C KEY10   05                                                                   
C KEY11  500                                                                   
run ;                                                                          
                                                                               
data Table_B;                                                                  
  input  Sg_b $ Key_b $ short ;                                                
  cards ;                                                                      
A KEY3   10                                                                    
A KEY4   25                                                                    
A KEY5   35                                                                    
B KEY7  200                                                                    
B KEY8  700                                                                    
C KEY12  20                                                                    
C KEY13 400                                                                    
C KEY14  10                                                                    
C KEY15  50                                                                    
run ;                                                                          
                                                                               
proc sql noprint ;                                                             
  select max (d) into :d from (select count (*) as d from table_a group sg_a) ;
quit ;                                                                         
                                                                               
data want (keep = sg_a key_a key_b balance_extra actual_transfer) ;            
  array x [&d]   8 _temporary_ ;                                               
  array k [&d] $ 8 _temporary_ ;                                               
  do _i = 1 by 1 until (last.sg_a) ;                                           
    set table_a (in = _a) table_b (rename = sg_b = sg_a);                      
    by sg_a ;                                                                  
    if _a then do ;                                                            
      x [_i] = extra ;                                                         
      k [_i] = key_a ;                                                         
      _im = _i ;                                                               
    end ;                                                                      
    else do _i = 1 by 0 until (short = 0 or _i > _im) ;                        
      actual_transfer = x [_i] min short ;                                     
      if actual_transfer > 0 then do ;                                         
        key_a = k [_i] ;                                                       
        balance_extra = x [_i] - actual_transfer ;                             
        output ;                                                               
      end ;                                                                    
      x [_i] +- actual_transfer ;                                              
      short  +- actual_transfer ;                                              
      if x [_i] = 0 then _i + 1 ;                                              
    end ;                                                                      
  end ;                                                                        
run ;

The preliminary SQL step is used to size their upper bound at the expense of an extra pass through Table_A. Alternatively, you can omit the step and instead of setting the upper bounds of the arrays X and K to &D, set them to something "big enough" like 99999. The only requirement to its magnitude is that it must be equal to or exceed the size of the largest BY group by SG_A in Table_A. Yet another alternative is to use a hash table instead of the arrays; but in this case methinks that compared to the arrays, it would introduce an extra level of complexity with regard to accessing the lookup items and updating them in place.

On the different note, I don't see how this nut can be cracked without procedural programming and by using SQL alone.

Kind regards

Paul D.

sagulolo · Posted 08-06-2019 10:59 AM

@hashman

Thank you so much for the coding, this is what I need and I never though of array can solve this puzzle. Now I have to test it at my real data, nevertheless apologize for my bad english. I will update here again after I test it with the real data.

@KachiM

Really appreciate your giving comments on the lack of logic explanations and guiding me to write a explanation to get the desire result. I also don't mind if you share others solution as I still new and lack of logical mind to resolve issue/puzzle like this.

Best regards

KachiM · Posted 08-06-2019 11:25 AM

@hashman

I have spent several hours on solving this problem. I am still testing another data step approach. With struggles I had, I will say yours is a program of excellence. It must be a case study to every aspiring Data Step Programmers.

@sagulolo

Initially I thought it is simply a many to many and one to many merging problem. There is a twist. In the middle of Table_B, we need to go to Table_A to get some info and return to the place in Table_B to continue. This jump_return-continue is the crucial part of programming. This is first time for me.

Enjoy the array solution given by Paul D. If you have Data step debugger and if you like, understand the flow of the program and visualize the values of variables.

best wishes

DataSP

Classroom Training Available!