BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dmckeever
Calcite | Level 5

Hi,

I'm having trouble identifying unique pairs of values in a dataset that has two variables, represented separately, that, taken together, indicate the value of a third variable. In the following example, I created the third variable, Combined, using catx.

Obs     Part1     Part2     Combined

1          ABC     123         ABC:123

2          123      ABC        123:ABC

3          DEF     123         DEF:123

4          123      DEF        123:DEF

5          DEF     ABC       DEF:ABC

6          ABC     DEF       ABC:DEF

The problem is that in practical terms, 'ABC:123' and '123:ABC' are functionally identical (and so on down the list). It's the uniqueness of the pairing, not the order in which the parts appear, that matters. I can't figure out how to represent this. Please note that I'm not trying to do a frequency distribution of the pairings, I'm trying to create a new column with values that are standardized representations of the unique pairings. So 'ABC:123' and '123:ABC' would both generate a value like 'ABC123'. Example of what I'm looking for in the finished product:

Obs     Part1     Part2     Pairing

1          ABC     123          ABC123

2         123       ABC        ABC123

3         DEF     123          DEF123

4         123       DEF        DEF123

5          DEF     ABC        ABCDEF

6          ABC     DEF        ABCDEF

Any help would be greatly appreciated--I've been hacking away at this with various combinations of sort procedures and first.-/last.- logic and have gotten nowhere.

Thanks,

Dan

1 ACCEPTED SOLUTION

Accepted Solutions
aland1
Calcite | Level 5

here's one possibility:

if part1 < part2 then part = cat(part1,part2); else part = cat(part2,part1);

View solution in original post

3 REPLIES 3
aland1
Calcite | Level 5

here's one possibility:

if part1 < part2 then part = cat(part1,part2); else part = cat(part2,part1);

dmckeever
Calcite | Level 5

Thanks! Your solution solved my problem. Didn't realize you could reference "greater than" or "lesser than" relationships that exist across variables, and not just within a variable (I'm fairly new). Cheers, Dan

Ksharp
Super User
data have;
input Obs     Part1  $   Part2 $ ;
cards;
1          ABC     123    
2         123       ABC    
3         DEF     123     
4         123       DEF    
5          DEF     ABC    
6          ABC     DEF
;
run;
data have;
 set have;
 call sortc(Part1,Part2) ;
 x=catx(':',Part1,Part2) ;
run;

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1989 views
  • 0 likes
  • 3 in conversation