BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dmckeever
Calcite | Level 5

Hi,

I'm having trouble identifying unique pairs of values in a dataset that has two variables, represented separately, that, taken together, indicate the value of a third variable. In the following example, I created the third variable, Combined, using catx.

Obs     Part1     Part2     Combined

1          ABC     123         ABC:123

2          123      ABC        123:ABC

3          DEF     123         DEF:123

4          123      DEF        123:DEF

5          DEF     ABC       DEF:ABC

6          ABC     DEF       ABC:DEF

The problem is that in practical terms, 'ABC:123' and '123:ABC' are functionally identical (and so on down the list). It's the uniqueness of the pairing, not the order in which the parts appear, that matters. I can't figure out how to represent this. Please note that I'm not trying to do a frequency distribution of the pairings, I'm trying to create a new column with values that are standardized representations of the unique pairings. So 'ABC:123' and '123:ABC' would both generate a value like 'ABC123'. Example of what I'm looking for in the finished product:

Obs     Part1     Part2     Pairing

1          ABC     123          ABC123

2         123       ABC        ABC123

3         DEF     123          DEF123

4         123       DEF        DEF123

5          DEF     ABC        ABCDEF

6          ABC     DEF        ABCDEF

Any help would be greatly appreciated--I've been hacking away at this with various combinations of sort procedures and first.-/last.- logic and have gotten nowhere.

Thanks,

Dan

1 ACCEPTED SOLUTION

Accepted Solutions
aland1
Fluorite | Level 6

here's one possibility:

if part1 < part2 then part = cat(part1,part2); else part = cat(part2,part1);

View solution in original post

3 REPLIES 3
aland1
Fluorite | Level 6

here's one possibility:

if part1 < part2 then part = cat(part1,part2); else part = cat(part2,part1);

dmckeever
Calcite | Level 5

Thanks! Your solution solved my problem. Didn't realize you could reference "greater than" or "lesser than" relationships that exist across variables, and not just within a variable (I'm fairly new). Cheers, Dan

Ksharp
Super User
data have;
input Obs     Part1  $   Part2 $ ;
cards;
1          ABC     123    
2         123       ABC    
3         DEF     123     
4         123       DEF    
5          DEF     ABC    
6          ABC     DEF
;
run;
data have;
 set have;
 call sortc(Part1,Part2) ;
 x=catx(':',Part1,Part2) ;
run;

Ksharp

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 3091 views
  • 3 likes
  • 3 in conversation