Hi there,
I have 5 datasets (set1 to set5) containing the same variable (eg : var1). Those 5 datasets are 5 different sources.
By example,
set1 contains 5506 var1
set2 contains 6371 var1
set3 contains 56 var1
set4 contains 1237 var1
set5 contains 6371 var1
Of course, there's a lot of duplicates of var1 between those datasets.
(By the way, those totals were made with a PROC SQL and select distinct, so in each set there's no duplicate value.)
Is there a way to produce a summary report that could ventilate those duplicates between the 5 datasets?
Let's say, for example, that in set1 there's 2333 var1 which are also in set2.
I would like to have a table (dataset) like a report which will display as follows:
set1 set2 set3 set4 set5
set1 5506 2333 ...........................
set2 2333 6371 ............................
set3 .............................................
set4 ...............................................
set5 ...................................................
Is this something possible?
EDITED.
data T1 T2 T3 T4 T5;
do I=1 to 1000;
if ranuni(0)>.6 then output T1;
if ranuni(0)>.6 then output T2;
if ranuni(0)>.6 then output T3;
if ranuni(0)>.6 then output T4;
if ranuni(0)>.6 then output T5;
end;
run;
%macro intersect(list= T1 T2 T3 T4 T5 );
%let n=%sysfunc(countw(&list , %str( ) ));
proc sql;
create table temp as
%do i=1 %to &n ;
%let a=%scan(&list,&i,%str( ));
%do j=1 %to &n ;
%let b=%scan(&list,&j,%str( ));
select "&a" as dsn1 length=40,"&b" as dsn2 length=40,count(*) as n from(
select * from &a
intersect
select * from &b
)
%if &i ne &n or &j ne &n %then %do; union %end;
%end;
%end;
;quit;
%mend;
options mprint mlogic symbolgen;
%intersect()
proc transpose data=temp out=want(drop=_:);
by dsn1;
id dsn2;
var n;
run;
How do you show values that are present in 3 tables?
> I was thinking of something like this:
This does not answer the question.
You have a multidimensional crossing. How do you want to show it?
@mkeintz 's solution give you a summarised overview (WANT2 below).
You could also want a detailed overview (WANT1 below).
Or anything in between.
data T1 T2 T3 T4 T5;
do I=1 to 1000;
if ranuni(0)>.6 then output T1;
if ranuni(0)>.6 then output T2;
if ranuni(0)>.6 then output T3;
if ranuni(0)>.6 then output T4;
if ranuni(0)>.6 then output T5;
end;
run;
data WANT1;
merge T1(in=A) T2(in=B) T3(in=C) T4(in=D) T5(in=E);
by I;
src=catx('-',ifc(A,'T1',''),ifc(B,'T2',''),ifc(C,'T3',''),ifc(D,'T4',''),ifc(E,'T5',''));
run;
proc tabulate data=WANT1;
class SRC;
table src=' '*n=' ';
run;
data WANT2 (keep=ROW COL );
length ROW COL $4;
merge T1(in=A) T2(in=B) T3(in=C) T4(in=D) T5(in=E);
by I;
array _in {5} in1-in5;
do C=1 to 5;
COL=cats('SET',C);
if _in{C}=1 then do R=C to 5;
ROW=cats('SET',R);
if _in{r}=1 then output;
end;
end;
run;
proc tabulate data=WANT2 noseps;
class ROW COL;
table ROW=' ' , COL=' '*n=' ';
run;
SET1 | SET2 | SET3 | SET4 | SET5 | |
---|---|---|---|---|---|
SET1 | 405 | . | . | . | . |
SET2 | 163 | 389 | . | . | . |
SET3 | 162 | 151 | 397 | . | . |
SET4 | 151 | 156 | 161 | 396 | . |
SET5 | 158 | 160 | 167 | 178 | 396 |
T1 | T1-T2 | T1-T2-T3 | T1-T2-T3-T4 | T1-T2-T3-T4-T5 | T1-T2-T3-T5 | T1-T2-T4 | T1-T2-T4-T5 | T1-T2-T5 | T1-T3 | T1-T3-T4 | T1-T3-T4-T5 | T1-T3-T5 | T1-T4 | T1-T4-T5 | T1-T5 | T2 | T2-T3 | T2-T3-T4 | T2-T3-T4-T5 | T2-T3-T5 | T2-T4 | T2-T4-T5 | T2-T5 | T3 | T3-T4 | T3-T4-T5 | T3-T5 | T4 | T4-T5 | T5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
57 | 35 | 24 | 8 | 13 | 18 | 29 | 13 | 23 | 45 | 19 | 20 | 15 | 30 | 19 | 37 | 61 | 25 | 23 | 20 | 20 | 24 | 26 | 27 | 54 | 32 | 26 | 35 | 53 | 41 | 43 |
@ChrisNZ wrote:
> I was thinking of something like this:
This does not answer the question.
You have a multidimensional crossing. How do you want to show it?
@mkeintz 's solution give you a summarised overview (WANT2 below).
You could also want a detailed overview (WANT1 below).
Or anything in between.
data T1 T2 T3 T4 T5; do I=1 to 1000; if ranuni(0)>.6 then output T1; if ranuni(0)>.6 then output T2; if ranuni(0)>.6 then output T3; if ranuni(0)>.6 then output T4; if ranuni(0)>.6 then output T5; end; run; data WANT1; merge T1(in=A) T2(in=B) T3(in=C) T4(in=D) T5(in=E); by I; src=catx('-',ifc(A,'T1',''),ifc(B,'T2',''),ifc(C,'T3',''),ifc(D,'T4',''),ifc(E,'T5','')); run; proc tabulate data=WANT1; class SRC; table src=' '*n=' '; run; data WANT2 (keep=ROW COL ); length ROW COL $4; merge T1(in=A) T2(in=B) T3(in=C) T4(in=D) T5(in=E); by I; array _in {5} in1-in5; do C=1 to 5; COL=cats('SET',C); if _in{C}=1 then do R=C to 5; ROW=cats('SET',R); if _in{r}=1 then output; end; end; run; proc tabulate data=WANT2 noseps; class ROW COL; table ROW=' ' , COL=' '*n=' '; run;
SET1 SET2 SET3 SET4 SET5 SET1 405 . . . . SET2 163 389 . . . SET3 162 151 397 . . SET4 151 156 161 396 . SET5 158 160 167 178 396
T1 T1-T2 T1-T2-T3 T1-T2-T3-T4 T1-T2-T3-T4-T5 T1-T2-T3-T5 T1-T2-T4 T1-T2-T4-T5 T1-T2-T5 T1-T3 T1-T3-T4 T1-T3-T4-T5 T1-T3-T5 T1-T4 T1-T4-T5 T1-T5 T2 T2-T3 T2-T3-T4 T2-T3-T4-T5 T2-T3-T5 T2-T4 T2-T4-T5 T2-T5 T3 T3-T4 T3-T4-T5 T3-T5 T4 T4-T5 T5 57 35 24 8 13 18 29 13 23 45 19 20 15 30 19 37 61 25 23 20 20 24 26 27 54 32 26 35 53 41 43
Hi @ChrisNZ
I will try this tomorrow morning. To be honest, I do not understand some lines of the code. However, this is amazing and it's way better that I wanted.
I will deeply look into it to at least understand a small part of it 🙂
EDIT:
Tried WANT1 and it works perfectly, thanks to you. However, for WANT2, it doesn't seem to works. It gives me an empty dataset with the two variables (ROW COL)...
If you were to first merge the 5 datasets, by VAR1, then for each VAR1 value you could write out observations with a collection of ROW and COL values. Let's say that for VAR1=88, it is present in datasets DS1, DS3, and DS5. The you could write out these observations.
ROW COL SET1 SET1 SET3 SET1 SET5 SET1
SET3 SET3 /*Editted addition */
SET5 SET3 /*Editted addition */
SET5 SET5 /*Editted addition */
The you could just do a crosstabulation of ROW * COL:
proc sort data=ds1; by var1; run;
proc sort data=ds2; by var1; run;
proc sort data=ds3; by var1; run;
proc sort data=ds4; by var1; run;
proc sort data=ds5; by var1; run;
data want (keep=row col var1 pattern);
length row col $4;
merge ds1 (in=in1) ds2 (in=in2) ds3 (in=in3) ds4 (in=in4) ds5 (in=in5);
by var1;
array _in {5} in1-in5;
do c=1 to 5;
col=cats('SET',c);
if _in{c}=1 then do r=c to 5;
row=cats('SET',r);
if _in{r}=1 then output;
end;
end;
run;
proc tabulate data=want noseps;
class row col;
table row=' ' , col=' ';
run;
I would try:
data test;
set set1 set2 set3 set4 set5 indsname=dsn;
fromDS = scan(dsn, 2);
run;
proc sort data=test; by var1 fromDS; run;
data temp;
length DS $100;
do until(last.var1);
set test; by var1;
ds = catx("-", ds, fromDS);
end;
keep var1 DS;
run;
proc sql;
create table want as
select DS, count(*) as n
from temp
group by DS;
quit;
(untested)
EDITED.
data T1 T2 T3 T4 T5;
do I=1 to 1000;
if ranuni(0)>.6 then output T1;
if ranuni(0)>.6 then output T2;
if ranuni(0)>.6 then output T3;
if ranuni(0)>.6 then output T4;
if ranuni(0)>.6 then output T5;
end;
run;
%macro intersect(list= T1 T2 T3 T4 T5 );
%let n=%sysfunc(countw(&list , %str( ) ));
proc sql;
create table temp as
%do i=1 %to &n ;
%let a=%scan(&list,&i,%str( ));
%do j=1 %to &n ;
%let b=%scan(&list,&j,%str( ));
select "&a" as dsn1 length=40,"&b" as dsn2 length=40,count(*) as n from(
select * from &a
intersect
select * from &b
)
%if &i ne &n or &j ne &n %then %do; union %end;
%end;
%end;
;quit;
%mend;
options mprint mlogic symbolgen;
%intersect()
proc transpose data=temp out=want(drop=_:);
by dsn1;
id dsn2;
var n;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.