HI Everyone,
I have a dataset AA as follows:
Column1 column2
1 A
2 A
3 A
4 B
5 B
6 C
1 C
8 D
1 D
NOW I have Another dataset BB as follows:
Column1
1
2
3
4
5
6
8
So i need to represent BB as follows:
Column1 column2
1 D
2 D
3 D
4 B
5 B
6 D
8 D
So basically I am looking across for a Common value in different group in table AA.
So in table AA value 1 is common among group A, C, D
So I assign MAX of group(A,C,D) i.e. D in this case to 1.
But if one value in group is updated to new value hence remaining values should also be updated
to the same value in the group.
hence value 2,3,6 are also assigned to D.
Thanks in advance for your help
Regards,
Rohit
Any good solution to the first data set should work on this new data as well. My SAS/OR based code above does, of course .
Hi @Rohit_1990 Nice and interesting question. OK, can you please clarify whether the sample is good representative of your real i.e the data what's been posted is what it is ?
The reason I am asking is my immediate thought went to sort by col1 col2 and the look up. This approach is only possible if you confirm the above otherwise explain what your real may have
Hi @Rohit_1990 I trust your clarifications hold good. See if the following helps. Please fee free to come back to us should you need more help/clarifications
data have;
input Col1 col2 $;
cards;
1 A
2 A
3 A
4 B
5 B
6 C
1 C
8 D
1 D
;
data have1;
input Col1;
cards;
1
2
3
4
5
6
8
;
proc sql;
create table temp as
select *, max(col2) as max_col2
from have
where col1 in (select col1 from have1)
group by col1
order by col1,col2;
quit;
data want;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("col2") ;
h.definedata ("max_col2") ;
h.definedone () ;
dcl hiter hh('h');
end;
do until(last.col1);
set temp;
by col1;
if _n_=1 then rc=h.add();
else do;
if h.find()= 0 then f=max_col2;else rc=h.add();
end;
end;
if _n_>1 and not missing(f) then max_col2=f;
drop rc col2 f;
run;
Hi @Rohit_1990
This was an interesting problem. I think it can be solved in many ways, and I tried an SQL approach knowing that it is not considered as elegant as a smart data step, but it seems to work and is easy to explain.
First find highest group per value and use a cartesian join to get all value combinations pr. group. Then join highest group in for all value combinations, and last get the highest group per value.
data aa;
input c1 c2$;
datalines;
1 A
2 A
3 A
4 B
5 B
6 C
1 C
8 D
1 D
;
run;
data bb;
input c1 ;
datalines;
1
2
3
4
5
6
8
;
run;
* highest group per value;
proc sql;
create table max_group_value as
select distinct
c1,
max(c2) as c2
from aa
group by c1;
quit;
* all combinations of values pr. group;
proc sql;
create table group_all_values as
select
a.c1,
b.c1 as c1x,
a.c2
from aa as a, aa as b
where a.c2 = b.c2;
quit;
* highest group per combination;
proc sql;
create table high_group_value as
select
a.c1,
a.c1x,
a.c2,
b.c2 as c2x
from group_all_values as a, max_group_value as b
where group_all_values.c1x = max_group_value.c1;
quit;
* highest group per value;
proc sql;
create table want as
select distinct
c1,
max(c2x) as c2
from high_group_value
group by c1;
quit;
If you have access to SAS/OR then you can frame this problem as finding the connected components in a graph:
data aa;
input col1$ col2$;
datalines;
1 A
2 A
3 A
4 B
5 B
6 C
1 C
8 D
1 D
;
proc optnet links=aa direction=undirected out_nodes=cc;
data_links_var from=col1 to=col2;
concomp;
run;
proc sort data=cc; by concomp descending node; run;
data dd;
do until(last.concomp);
set cc; by concomp;
if first.concomp then col2 = node;
if anydigit(first(node)) then do;
col1 = node;
output;
end;
end;
keep col1 col2;
run;
Hi All,
Thanks a lot !!!!
Well the solution provided by each of you is solving my raised concern.
but I seek one more help from all of you.
in the given dataset below:
Col1 Col2
1 A
2 A
3 B
2 B
4 C
1 C
6 D
4 D
7 D
So here group A is related to group B on common value 2
Group C is related to Group A with common element being 1
likewise Group D with Group C on common value 4.
So as we can see all groups are interrelated with each other based on some common element.
So I need to group all groups in one single group as follows:
Col1 Col2
1 D
2 D
3 D
4 D
6 D
7 D
So now my final dataset has all related groups elements in one single group.
Please help me on this.
Regards and Thanks again
Rohit
Any good solution to the first data set should work on this new data as well. My SAS/OR based code above does, of course .
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.