BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dmckeever
Calcite | Level 5

Hi,

I'm having trouble identifying unique pairs of values in a dataset that has two variables, represented separately, that, taken together, indicate the value of a third variable. In the following example, I created the third variable, Combined, using catx.

Obs     Part1     Part2     Combined

1          ABC     123         ABC:123

2          123      ABC        123:ABC

3          DEF     123         DEF:123

4          123      DEF        123:DEF

5          DEF     ABC       DEF:ABC

6          ABC     DEF       ABC:DEF

The problem is that in practical terms, 'ABC:123' and '123:ABC' are functionally identical (and so on down the list). It's the uniqueness of the pairing, not the order in which the parts appear, that matters. I can't figure out how to represent this. Please note that I'm not trying to do a frequency distribution of the pairings, I'm trying to create a new column with values that are standardized representations of the unique pairings. So 'ABC:123' and '123:ABC' would both generate a value like 'ABC123'. Example of what I'm looking for in the finished product:

Obs     Part1     Part2     Pairing

1          ABC     123          ABC123

2         123       ABC        ABC123

3         DEF     123          DEF123

4         123       DEF        DEF123

5          DEF     ABC        ABCDEF

6          ABC     DEF        ABCDEF

Any help would be greatly appreciated--I've been hacking away at this with various combinations of sort procedures and first.-/last.- logic and have gotten nowhere.

Thanks,

Dan

1 ACCEPTED SOLUTION

Accepted Solutions
aland1
Fluorite | Level 6

here's one possibility:

if part1 < part2 then part = cat(part1,part2); else part = cat(part2,part1);

View solution in original post

3 REPLIES 3
aland1
Fluorite | Level 6

here's one possibility:

if part1 < part2 then part = cat(part1,part2); else part = cat(part2,part1);

dmckeever
Calcite | Level 5

Thanks! Your solution solved my problem. Didn't realize you could reference "greater than" or "lesser than" relationships that exist across variables, and not just within a variable (I'm fairly new). Cheers, Dan

Ksharp
Super User
data have;
input Obs     Part1  $   Part2 $ ;
cards;
1          ABC     123    
2         123       ABC    
3         DEF     123     
4         123       DEF    
5          DEF     ABC    
6          ABC     DEF
;
run;
data have;
 set have;
 call sortc(Part1,Part2) ;
 x=catx(':',Part1,Part2) ;
run;

Ksharp

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 3072 views
  • 3 likes
  • 3 in conversation