SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

DI Query

Reply
Regular Contributor
Posts: 167

DI Query

Hi ,

I would like to compare 2 attributes of  two tables A and B. If the attributes match, I need to update table A.

How do I do that with SAS DI studio?

Thanks in advance,

Regards,

Sheeba Swaminathan

Super User
Posts: 5,438

Re: DI Query

What do you mean by attribute?If you mean like table label or something, it sounds odd. There are no standard transform for this.  You can have user written code in the pre step of a table loader that check table attributes. If mismatch abort the job.

Data never sleeps
Regular Contributor
Posts: 167

Re: DI Query

Hi Linush,

I am sorry about that...It was a mistake...I meant column.

I would like to compare 2 Columns of  two tables A and B. If the column values match, I need to update table A.

How do I do that with SAS DI studio?

Regards,

Sheeba Swaminathan

Respected Advisor
Posts: 4,173

Re: DI Query

Which DIS version are you using? Starting with DIS4.4 there is a SQL Update transformation allowing you to update columns based on a subquery.

Regular Contributor
Posts: 167

Re: DI Query

Hi Patrick,

I am using DIS 4.6. I will try out the update transformation

Regards,

Sheeba Swaminathan

Respected Advisor
Posts: 4,173

Re: DI Query

Hi Sheeba

You must be very clear of how you want to load the table. Let me recap what I understand:

1. You have a source table with columns {groupid, element, pf}. These 3 columns do not uniquely identify a row (so can not be used as primary composite key).

2. You have a target table where you want to match your source data with the target over variable {element, pf}. If I'm correct with 1. then this is a many:many relationship.

3. You want generate a surrogate key based if there is no match over columns {element, pf}

That's kind of confusing.
- If you have column {groupid} in your source why can't you simply load it?
- A surrogate key is generated based on a unique composite business key - which you haven't specified yet. A surrogate key serves as primary key and it must be unique - so it can't be a "groupid".
- There is the concept of a retained key - but such a key is only used in the context of slowly changing dimensions.

How do you want to load your source data? All rows from source or only deltas. I'm only guessing here but I believe what you're after is to first load your data using a normal table loader with either update/insert - or a simple append if this is what you need to do.

If it's update/insert then you will need to determine the business key.

I'm not sure about the purpose of "groupid". It appears that a group is defined over {element, pf}. If you really need an extra variable for the group then just concatenate the 2 existing variables like:
groupid=cats(element,'|',pf); You can also use this to generate some hash value if you prefer this (look up the code generated in the SCD2 loader for the digest value if you want to go down this path).

Hope this is of some help
Patrick

Regular Contributor
Posts: 167

Re: DI Query

Hi Patrick,

I am sorry about the disconnect here. I wanted to rephrase my question to make it more clear with examples

I have a table A having following columns

1) line_id (PK) 

2)groupid 

3) element

4) pf

5) scope

6)pid

All i wanted here is to update table A's groupid column based on matching attributes in table B.

The structure of Table B  is

1) execkey(PK)

2)groupid

3)pf

4)scope

5)pid

The update condition here is

     if A.element = "ABC", then update A.groupid = (Select groupId from B where A.pf = b.pf)

     if A.element = "DEF", then update A.groupId =  (Select groupId from B where A.scope = B.scope and A.pid=B.pid)

Regards,

Sheeba Swaminathan

Super User
Posts: 5,438

Re: DI Query

I'm still not really getting the update rules here.

If you provide some sample input and desired output data, it might get clearer.

Data never sleeps
Regular Contributor
Posts: 167

Re: DI Query


Hi Linush,

Please find the sample input and output data below.

I have a table A having following columns


Table A
line_id (PK)    groupid        element        pf           scope         pid
     1                                      ABC           pf1           scope1     pid1
      2                                      DEF           pf1           scope1      pid1
     3                                      XYZ           pf2           scope1      pid2
     4                                      XXX           pf3           sope2      pid3
     5                                      YYY           pf4           scope2      pid4


Table B 

groupid       pf      scope             pid

201           pf1      scope1          pid1
202            pf1      scope1           pid1
201           pf2      scope1           pid2
                   pf3     scope2           pid3
203             pf4      scope2           pid4

The update condition here is

     if A.element = "ABC", then update A.groupid = (Select groupId from B where A.pf = b.pf)

    if A.element = "DEF", then update A.groupId =  (Select groupId from B where A.scope = B.scope and A.pid=B.pid)

My aim is to update A comparing the values of A and B.

Initially Table A does not have group idd. After the transformation A would look like

ouptut Table A
line_id (PK)    groupid  element        pf           scope         pid
     1                                      ABC      pf1           scope1     pid1
      2                                     DEF      pf1           scope1      pid1
     3                                      XYZ      pf2           scope1      pid2
     4                                      ABC      pf3           scope2      pid3
     5                                      DEF      pf4           scope2      pid4

1) for the first record A.element = ABC and A.pf= B.pf so I need to assign A.groupid=B.group id
2) for the second record A.element='DEC' then do;
  if(A.scope=B.scope and A.pid=B.pid ) then A.groupid=B.groupid.

Respected Advisor
Posts: 4,173

Re: DI Query

Your data does not support the logic you want to implement.

"  if A.element = "ABC", then update A.groupid = (Select groupId from B where A.pf = b.pf)"

The sub-select "Select groupId from B where A.pf = b.pf" returns 2 matching rows from table B with 2 different groupid's (201, 202). So which groupid should it be?

You also want to apply different matching logic based on values of "Element". It appears that this is not table driven but would need the use of hard coded values in your code (which is "ugly"). Also you haven't told us what should happen with "element='XYZ' ".

But first: You need a relationship between your tables suitable for the logic you want to implement - or you have to amend your logic (eg. if there are multiple groupid's possible then take the highest value).

Regular Contributor
Posts: 167

Re: DI Query

Hi Patrick,

Thanks for the reply.

Apologies for incorrect data.

.....group id can have only unique values or null values.....Also if the element is xyz the condition is different. Yes here i need to harcode the values.

Table A
line_id (PK)    groupid        element        pf           scope         pid
     1                                      ABC           pf1           scope1     pid1
      2                                      DEF           pf1           scope1      pid1
     3                                      XYZ           pf2           scope1      pid2
     4                                      ABC          pf3           sope2      pid3    
     5                                      DEF           pf4           scope2      pid4


Table B

groupid       pf      scope             pid

201           pf1      scope1          pid1
202            pf1      scope1           pid1
203           pf2      scope1           pid2
                   pf3     scope2           pid3
204             pf4      scope2           pid4

The update condition here is

     if A.element = "ABC", then update A.groupid = (Select groupId from B where A.pf = b.pf)

    if A.element = "DEF", then update A.groupId =  (Select groupId from B where A.scope = B.scope and A.pid=B.pid)

My aim is to update A comparing the values of A and B.

Initially Table A does not have group idd. After the transformation A would look like

ouptut Table A
line_id (PK)    groupid  element        pf           scope         pid
     1                                      ABC      pf1           scope1     pid1
      2                                     DEF      pf1           scope1      pid1
     3                                      XYZ      pf2           scope1      pid2
     4                                      ABC      pf3           scope2      pid3
     5                                      DEF      pf4           scope2      pid4

1) for the first record A.element = ABC and A.pf= B.pf so I need to assign A.groupid=B.group id
2) for the second record A.element='DEC' then do;
  if(A.scope=B.scope and A.pid=B.pid ) then A.groupid=B.groupid.

3) for the third record A.element='YZ' then do;

                    if(A.scope=B.scope ) then A.groupid=B.groupid.

4)

Respected Advisor
Posts: 4,173

Re: DI Query

You're missing the point and the data you've posted does still not support your logic.

I suggest you post the 3rd result table with the values for "groupid" populated. May be when you try to pick the values from Table B you will realise where the issue is.

For the first row in Table A and using your logic there are 2 matching rows in Table B (all rows where b.pf='pf1) so there are 2 possible values for groupid. So which one would you pick to populate your 3rd table (the result after the update)?

Regular Contributor
Posts: 167

Re: DI Query

Hi Patrick,

I got your point. I also want to specify that I need to update table A based on comparison....My result table is again going to be A. No new table needs to be created.

I am reposting the corrected sample data below.....

Table A

line_id (PK)    groupid        element        pf           scope         pid

     1                                      ABC           pf1           scope1     pid1

      2                                      DEF           pf2           scope2      pid2

     3                                      XYZ           pf3           scope3      pid3


Table B

groupid       pf      scope             pid     element

201           pf1      scope1          pid1         ABC
202            pf2      scope2           pid2       DEF
203           pf3      scope3          pid3         XYZ
204                  pf4     scope4     pid4       XYZ


) for the first record A.element = ABC and A.pf= B.pf so I need to assign A.groupid=B.group id
2) for the second record A.element='DEF' then do;
  if(A.scope=B.scope and A.pid=B.pid ) then A.groupid=B.groupid.

3) for the third record A.element='XYZ' then do;

                    if(A.scope=B.scope ) then A.groupid=B.groupid.

Output TABLE A

line_id (PK)    groupid        element        pf           scope         pid

     1                201                      ABC           pf1           scope1     pid1

      2               202                       DEF           pf2           scope2      pid2

     3                203                      XYZ           pf3           scope3      pid3

Respected Advisor
Posts: 4,173

Re: DI Query

O.K. - your logic is now going to work with exactly the sample data you've provided. Here the code:

data tableA;
  length groupid 8;
  if 0 then call missing(groupid);
  input line_id (element pf scope pid) (:$8.);
  datalines;
1 ABC pf1 scope1 pid1
2 DEF pf2 scope2 pid2
3 XYZ pf3 scope3 pid3
;
run;

data tableB;
  input groupid (pf scope pid element) (:$8.);
  datalines;
201 pf1 scope1 pid1 ABC
202 pf2 scope2 pid2 DEF
203 pf3 scope3 pid3 XYZ
204 pf4 scope4 pid4 XYZ
;
run;

proc sql;
  update tableA as A
    set groupid =
      (select groupid from tableB as B
        where
          (A.element='ABC' and A.pf=B.pf)
          or (A.element='DEF' and A.scope=B.scope and A.pid=B.pid )
          or (A.element='XYZ' and A.scope=B.scope)
      )
  ;
quit;

Now with DIS you can use the SQL Update transformation. I don't have access to DIS right now but it shouldn't be too hard to set it up in a way that it generates code as above in the SQL Update statement.

Regular Contributor
Posts: 167

Re: DI Query

Thanks a lot Patrick.

I will try this out and post the result.

Regards,

Sheeba Swaminathan

Ask a Question
Discussion stats
  • 16 replies
  • 809 views
  • 0 likes
  • 3 in conversation