BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
bendsteel6
Obsidian | Level 7

Hi,

I'm trying to combine values from a character column in a structured way.  To explain it, see the picture below:

bendsteel6_2-1666366222941.png

I would like to create rows where the LC variable is concatenated with each of its subsequent values for the header.  In other words, in this case produce records with LC_combos of  A_B, A_C, A_D, A_E, then B_C, B_D, B_E, then C_D, C_E, then finally D_E.  In this case above, there are 10 possible unique combinations of the 5 LCs.  However, in my datasets the number of LCs per header can vary...I could have 5, or 7 or even 10.  I would also like to see which Job the LC's came from.  In the example above in the first output record, the LC 'A' comes from Job 10 and LC 'B' comes from Job 20.

I've been trying to use the LAG function in a do loop but values are coming up empty.  Does anyone know if there's a simple method of accomplishing the above, other than using the LAG function?

 

It doesn't make sense to include my code here because it's all messed and doesn't work.  I'm looking for suggestions.

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Yes. Simple self join.

data Have;
  length Header Job LC $10 Line Tot_Lines 8;
  infile datalines dsd dlm="|" truncover;
  input Header -- Tot_lines;
datalines;
x|10|A|1|5
x|20|B|2|5
x|30|C|3|5
x|30|D|4|5
x|50|E|5|5
;

proc sql ;
create table want as
  select a.header
       , a.lc
       , catx('_',a.lc,b.lc) as lc_combo
       , a.job as job1
       , b.job as job2
  from have a 
  inner join have b
  on a.header = b.header
   and a.lc < b.lc
  order by 1,2,3
;
quit;

proc print;
run;

Result:

Obs    Header    LC    lc_combo    job1    job2

  1      x       A       A_B        10      20
  2      x       A       A_C        10      30
  3      x       A       A_D        10      30
  4      x       A       A_E        10      50
  5      x       B       B_C        20      30
  6      x       B       B_D        20      30
  7      x       B       B_E        20      50
  8      x       C       C_D        30      30
  9      x       C       C_E        30      50
 10      x       D       D_E        30      50

PS  The FORMAT statement is for attaching formats to the variable. FORMATS are instructions for how to DISPLAY the variables.   If you want to DEFINE the variables use a LENGTH statement.  Notice also that most variables do not need to have special formats attached them.  SAS knows how to display character variables and numbers already.

View solution in original post

5 REPLIES 5
Tom
Super User Tom
Super User

Looks like a simple self join.  Probably easier in PROC SQL.

If you want help writing the code then post the data (not a photograph of the data).  Best is to post it as a data step (one that works!).

bendsteel6
Obsidian | Level 7

I would love help in writing this.  Was trying to stay way from SQL but if that's the best way, then so be it.  Here's the code for the input data.  I'm still trying to work on this but am getting nowhere 😞  .  I'll continue to try.

 

data Have;
format Header Job LC $10. Line Tot_Lines 4.;
infile datalines delimiter="|";
input
Header
Job
LC
Line
Tot_Lines
;
datalines;
x|10|A|1|5
x|20|B|2|5
x|30|C|3|5
x|30|D|4|5
x|50|E|5|5
;
run;

Tom
Super User Tom
Super User

Yes. Simple self join.

data Have;
  length Header Job LC $10 Line Tot_Lines 8;
  infile datalines dsd dlm="|" truncover;
  input Header -- Tot_lines;
datalines;
x|10|A|1|5
x|20|B|2|5
x|30|C|3|5
x|30|D|4|5
x|50|E|5|5
;

proc sql ;
create table want as
  select a.header
       , a.lc
       , catx('_',a.lc,b.lc) as lc_combo
       , a.job as job1
       , b.job as job2
  from have a 
  inner join have b
  on a.header = b.header
   and a.lc < b.lc
  order by 1,2,3
;
quit;

proc print;
run;

Result:

Obs    Header    LC    lc_combo    job1    job2

  1      x       A       A_B        10      20
  2      x       A       A_C        10      30
  3      x       A       A_D        10      30
  4      x       A       A_E        10      50
  5      x       B       B_C        20      30
  6      x       B       B_D        20      30
  7      x       B       B_E        20      50
  8      x       C       C_D        30      30
  9      x       C       C_E        30      50
 10      x       D       D_E        30      50

PS  The FORMAT statement is for attaching formats to the variable. FORMATS are instructions for how to DISPLAY the variables.   If you want to DEFINE the variables use a LENGTH statement.  Notice also that most variables do not need to have special formats attached them.  SAS knows how to display character variables and numbers already.

bendsteel6
Obsidian | Level 7

This is FANTASTIC!  I didn't think it could be that simple, yet complex in concept. Thank you so much!!!

Reeza
Super User

Self + Cross join

 

select a.header, a.lc, catx("_", a.lc, b.lc) as lc_combos, a.job as from _lc_job1, b.job as from _lc_job2
from have as a
cross join have as b
on a.header=b.header  /*join only when header is the same*/
and a.job ne b.job /*do not join with itself*/
 and a.job<b.job /*prevent A_B and B_A*/
order by 1, 2, 3;
quit;

Untested as no data was provided.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 915 views
  • 2 likes
  • 3 in conversation