BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
larsc
Obsidian | Level 7

Dear SAS community,

 

I have two source tables which I'd like to combine into one dataset. I'd like to perform a check based on some conditions which'll decide whether I select the records from Table A or the records Table B. The conditions will be based on the maximum depth for a group across the two tables and the number of records within the group in events where the maximum depth is equal across the two tables. With everything being equal, I prefer to use the records from Table A.

 

For demonstration purposes, let's say I have these tables:

 

Table A:                                

Name   Depth  Coord_X   Coord_Y

A          0           5               6                        

A          100       5               6                     

A          200       5               6                         

A          300       5               6      

B          0           4               3                         

B          2000     4               3

B          3000     4               3                          

B          4000     4               3                         

C          0           2               1                         

C          3000     2               1                        

C          4000     2               1                        

C          5000     2               1       

 

Table B

Name   Depth  Coord_X   Coord_Y

A          0           5               6                                         

A          250       5               6                         

A          300       5               6      

B          0           4               3                         

B          2000     4               3 

B          3000     4               3                        

B          4000     4               3                         

C          0           2               1                         

C          3000     2               1                        

C          4000     2               1                        

C          5000     2               1

C          5500     2               1

D          0           11             12

D          10         11             12

D          20         11             12 

 

As can be seen, the tables contains the same attributes. However, Table A may contain some groups not in Table B, and vice versa. This can be seen in the example above where Table B contains a group D not in Table A.

 

First, I'd like to test on the maximum depth - I want to select the records from the input table which contains the greatest depth. Using the tables above as an example, for group C, I'd like my target table to contain the records from Table B as this has a greater maximum depth then Table A. The maximum depth is the key condition, and as long as a table contains a greater maximum depth then the other table, the number of records within each group is irrelevant, and I always select the records from the source containing the maximum depth. In other words, even if Table A contained 1000 records and Table B contained 4 records, as long as Table B contains a greater maximum depth I select the 4 records from Table B over the 1000 records from Table A.

 

Secondly, if the two tables contains an equal maximum depth for a group, I use the number of records within each group as a tie breaker. Using the above tables as an example, for group A, I want to select the records from Table A as this contains 4 records compared to the 3 records in Table B with the maximum depth being equal between the two tables.

 

In the event that both the maximum depth and the number of records are equal, I want to use Table A as my source.

 

Overall, based on these conditions, my target table should look like this. Notice that I want to add a column which tells which source has been selected for my Target table:

 

Target Table

Name   Depth  Coord_X   Coord_Y   Source

A          0           5               6                 Table A

A          100       5               6                 Table A

A          200       5               6                 Table A        

A          300       5               6                 Table A

B          0           4               3                 Table A        

B          2000     4               3                 Table A

B          3000     4               3                 Table A       

B          4000     4               3                 Table A       

C          0           2               1                 Table B        

C          3000     2               1                 Table B       

C          4000     2               1                 Table B       

C          5000     2               1                 Table B

C          5500     2               1                 Table B

D          0           11             12               Table B

D          10         11             12               Table B

D          20         11             12               Table B

 

I am fairly new to SAS and programming in general, and I am therefore unsure what the best strategy to solving this issue efficiently would be. 

1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

Hello,

I think this is what you are after.

Please check it with great diligence as I haven't done any quality control. 

data Table_A;                                
input Name $ Depth Coord_X Coord_Y;
cards;
A          0         5               6
A          100       5               6
A          200       5               6
A          300       5               6
B          0         4               3
B          2000      4               3
B          3000      4               3
B          4000      4               3
C          0         2               1
C          3000      2               1
C          4000      2               1
C          5000      2               1
;
run;

data Table_B;
input Name $ Depth Coord_X Coord_Y;
cards;
A          0         5               6
A          250       5               6
A          300       5               6
B          0         4               3
B          2000      4               3
B          3000      4               3
B          4000      4               3
C          0         2               1
C          3000      2               1
C          4000      2               1
C          5000      2               1
C          5500      2               1
D          0         11             12
D          10        11             12
D          20        11             12
;
run;

PROC SQL noprint;
 create table Table_A_MAX_Depth as
 select Name , MAX(Depth) as MAX_Depth , count(*) as CountName
 from Table_A
 group by Name;
 create table Table_B_MAX_Depth as
 select Name , MAX(Depth) as MAX_Depth , count(*) as CountName
 from Table_B
 group by Name;
QUIT;

data Table_AB_MAX_Depth;
 set Table_A_MAX_Depth 
     Table_B_MAX_Depth indsname=dsn;
 sourceds = substr(scan(dsn,2,"."),1,7);
run;

proc sort data=Table_AB_MAX_Depth;
 by Name descending MAX_Depth descending CountName;
run;

data Table_AB_MAX_Depth_Retain;
 set Table_AB_MAX_Depth;
 by Name descending MAX_Depth descending CountName;
 if first.name then output;
run;

PROC SQL noprint;
 create table FROM_Table_A as
 select *
 from Table_A
 where name IN (select name
                from Table_AB_MAX_Depth_Retain
			    where sourceds='TABLE_A');
 create table FROM_Table_B as
 select *
 from Table_B
 where name IN (select name
                from Table_AB_MAX_Depth_Retain
			    where sourceds='TABLE_B');
QUIT;

data Target_Table;
 set FROM_Table_A FROM_Table_B indsname=dsn;
 sourceds = substr(scan(dsn,2,"."),6,12);
run;
/* end of program */

Cheers,

Koen

View solution in original post

4 REPLIES 4
sbxkoenk
SAS Super FREQ

Hello,

I think this is what you are after.

Please check it with great diligence as I haven't done any quality control. 

data Table_A;                                
input Name $ Depth Coord_X Coord_Y;
cards;
A          0         5               6
A          100       5               6
A          200       5               6
A          300       5               6
B          0         4               3
B          2000      4               3
B          3000      4               3
B          4000      4               3
C          0         2               1
C          3000      2               1
C          4000      2               1
C          5000      2               1
;
run;

data Table_B;
input Name $ Depth Coord_X Coord_Y;
cards;
A          0         5               6
A          250       5               6
A          300       5               6
B          0         4               3
B          2000      4               3
B          3000      4               3
B          4000      4               3
C          0         2               1
C          3000      2               1
C          4000      2               1
C          5000      2               1
C          5500      2               1
D          0         11             12
D          10        11             12
D          20        11             12
;
run;

PROC SQL noprint;
 create table Table_A_MAX_Depth as
 select Name , MAX(Depth) as MAX_Depth , count(*) as CountName
 from Table_A
 group by Name;
 create table Table_B_MAX_Depth as
 select Name , MAX(Depth) as MAX_Depth , count(*) as CountName
 from Table_B
 group by Name;
QUIT;

data Table_AB_MAX_Depth;
 set Table_A_MAX_Depth 
     Table_B_MAX_Depth indsname=dsn;
 sourceds = substr(scan(dsn,2,"."),1,7);
run;

proc sort data=Table_AB_MAX_Depth;
 by Name descending MAX_Depth descending CountName;
run;

data Table_AB_MAX_Depth_Retain;
 set Table_AB_MAX_Depth;
 by Name descending MAX_Depth descending CountName;
 if first.name then output;
run;

PROC SQL noprint;
 create table FROM_Table_A as
 select *
 from Table_A
 where name IN (select name
                from Table_AB_MAX_Depth_Retain
			    where sourceds='TABLE_A');
 create table FROM_Table_B as
 select *
 from Table_B
 where name IN (select name
                from Table_AB_MAX_Depth_Retain
			    where sourceds='TABLE_B');
QUIT;

data Target_Table;
 set FROM_Table_A FROM_Table_B indsname=dsn;
 sourceds = substr(scan(dsn,2,"."),6,12);
run;
/* end of program */

Cheers,

Koen

sbxkoenk
SAS Super FREQ

You can add a PROC SORT to sort on Name.

larsc
Obsidian | Level 7

Hey @sbxkoenk , from what I can tell your solution seems to work perfectly! Thank you for your help!

sbxkoenk
SAS Super FREQ

With pleasure!

Thank you for accepting my answer as a solution.

I have tried to produce code that is very 'readable' (since more compact code would also be possible) but if there's anything (any step) you do not fully understand, please ask! If everything is clear, all the better.

Cheers,

Koen

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1212 views
  • 0 likes
  • 2 in conversation