Hi,
I'm trying to combine values from a character column in a structured way. To explain it, see the picture below:
I would like to create rows where the LC variable is concatenated with each of its subsequent values for the header. In other words, in this case produce records with LC_combos of A_B, A_C, A_D, A_E, then B_C, B_D, B_E, then C_D, C_E, then finally D_E. In this case above, there are 10 possible unique combinations of the 5 LCs. However, in my datasets the number of LCs per header can vary...I could have 5, or 7 or even 10. I would also like to see which Job the LC's came from. In the example above in the first output record, the LC 'A' comes from Job 10 and LC 'B' comes from Job 20.
I've been trying to use the LAG function in a do loop but values are coming up empty. Does anyone know if there's a simple method of accomplishing the above, other than using the LAG function?
It doesn't make sense to include my code here because it's all messed and doesn't work. I'm looking for suggestions.
Thanks!
Yes. Simple self join.
data Have;
length Header Job LC $10 Line Tot_Lines 8;
infile datalines dsd dlm="|" truncover;
input Header -- Tot_lines;
datalines;
x|10|A|1|5
x|20|B|2|5
x|30|C|3|5
x|30|D|4|5
x|50|E|5|5
;
proc sql ;
create table want as
select a.header
, a.lc
, catx('_',a.lc,b.lc) as lc_combo
, a.job as job1
, b.job as job2
from have a
inner join have b
on a.header = b.header
and a.lc < b.lc
order by 1,2,3
;
quit;
proc print;
run;
Result:
Obs Header LC lc_combo job1 job2 1 x A A_B 10 20 2 x A A_C 10 30 3 x A A_D 10 30 4 x A A_E 10 50 5 x B B_C 20 30 6 x B B_D 20 30 7 x B B_E 20 50 8 x C C_D 30 30 9 x C C_E 30 50 10 x D D_E 30 50
PS The FORMAT statement is for attaching formats to the variable. FORMATS are instructions for how to DISPLAY the variables. If you want to DEFINE the variables use a LENGTH statement. Notice also that most variables do not need to have special formats attached them. SAS knows how to display character variables and numbers already.
Looks like a simple self join. Probably easier in PROC SQL.
If you want help writing the code then post the data (not a photograph of the data). Best is to post it as a data step (one that works!).
I would love help in writing this. Was trying to stay way from SQL but if that's the best way, then so be it. Here's the code for the input data. I'm still trying to work on this but am getting nowhere 😞 . I'll continue to try.
data Have;
format Header Job LC $10. Line Tot_Lines 4.;
infile datalines delimiter="|";
input
Header
Job
LC
Line
Tot_Lines
;
datalines;
x|10|A|1|5
x|20|B|2|5
x|30|C|3|5
x|30|D|4|5
x|50|E|5|5
;
run;
Yes. Simple self join.
data Have;
length Header Job LC $10 Line Tot_Lines 8;
infile datalines dsd dlm="|" truncover;
input Header -- Tot_lines;
datalines;
x|10|A|1|5
x|20|B|2|5
x|30|C|3|5
x|30|D|4|5
x|50|E|5|5
;
proc sql ;
create table want as
select a.header
, a.lc
, catx('_',a.lc,b.lc) as lc_combo
, a.job as job1
, b.job as job2
from have a
inner join have b
on a.header = b.header
and a.lc < b.lc
order by 1,2,3
;
quit;
proc print;
run;
Result:
Obs Header LC lc_combo job1 job2 1 x A A_B 10 20 2 x A A_C 10 30 3 x A A_D 10 30 4 x A A_E 10 50 5 x B B_C 20 30 6 x B B_D 20 30 7 x B B_E 20 50 8 x C C_D 30 30 9 x C C_E 30 50 10 x D D_E 30 50
PS The FORMAT statement is for attaching formats to the variable. FORMATS are instructions for how to DISPLAY the variables. If you want to DEFINE the variables use a LENGTH statement. Notice also that most variables do not need to have special formats attached them. SAS knows how to display character variables and numbers already.
This is FANTASTIC! I didn't think it could be that simple, yet complex in concept. Thank you so much!!!
Self + Cross join
select a.header, a.lc, catx("_", a.lc, b.lc) as lc_combos, a.job as from _lc_job1, b.job as from _lc_job2
from have as a
cross join have as b
on a.header=b.header /*join only when header is the same*/
and a.job ne b.job /*do not join with itself*/
and a.job<b.job /*prevent A_B and B_A*/
order by 1, 2, 3;
quit;
Untested as no data was provided.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.