BookmarkSubscribeRSS Feed
cindyforest7
Calcite | Level 5

So what I wanted to do is to generate a table with a, a_match, and b. The idea is to match column a and column a_match based on the same b value but only retain one distinctive matched row when a(i)=a_match(i+n) and a_match(i)=a(i+n) and b(i)=b(i+n).

The the following is what I have for now. The next step is to delete certain rows based on the above criteria

proc sql;

create table one as

select distinct a, b

from original

order by a, b;

quit;

proc sql;

create table two as

select a as a_match, b

from merge1

order by a_match, b;

quit;

proc sql;

create table merge as

select a.a, b.a_match, a.b

from one as a left join two as b on

a.b=b.b

order by a.a, b.a_match, a.b;

quit;

After deleteing rows, what I wanted to do next is to the calcuate the number of distinctive b for each distinctive combination of a and a_match.

Thank you!

5 REPLIES 5
cindyforest7
Calcite | Level 5

Could anyone please help me out? Thank you so much!

Tom
Super User Tom
Super User

Are the value of B unique within the two input tables ORIGINAL and MERGE1?  Or for either of them?

If so then just merge them and rename the A variables so that you can distinguish them from each other.

data want ;

  merge original (rename= (a = original_a) )

            merge1 (rename= (a = match_a))

;

by b;

run;

cindyforest7
Calcite | Level 5

Thank you Tom for the reply.

I already had the merged dataset, and my question is how to retain only one distinctive matched row when a(i)=a_match(i+n) and a_match(i)=a(i+n) and b(i)=b(i+n).

For example, the merged dataset is like this:

a           a_match          b

123       321                  12345678

123       345                   9876450

134       345                   90485769

321       123                  12345678

345       123                   9876450

345       134                   90485769

Notice that row 1 and 4, 2 and 5, 3 and 6 have the same set of a and a_match as well as same b value. So I wanted to retain only one row with a unique combination of a, a_match, and b. Actually a_match is just a copy of a with a new name, the reason I did this is that I wanted to see which two a's share the same b. And I wanted to know how many b's each two a's share.

Tom
Super User Tom
Super User

One way to detect permutations is to order the values across the variables.  Then any normal type of duplication check will work.

You might do something like:

proc sql ;

  create table merge2 as

    select b

         , min(a,a_match) as var1

         , max(a,a_match) as var2

         , a

         , a_match

    from merge

  ;

quit;

proc sort data=merge2 nodupkey ;

  by b var1 var2 ;

run;

cindyforest7
Calcite | Level 5

Brilliant!!

Thank you so much!!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 854 views
  • 0 likes
  • 2 in conversation