BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jpprovost
Quartz | Level 8

Hi there,

 

I have 5 datasets (set1 to set5) containing the same variable (eg : var1). Those 5 datasets are 5 different sources.

 

By example,

set1 contains 5506 var1

set2 contains 6371 var1

set3 contains 56 var1

set4 contains 1237 var1

set5 contains 6371 var1

 

Of course, there's a lot of duplicates of var1 between those datasets.

(By the way, those totals were made with a PROC SQL and select distinct, so in each set there's no duplicate value.)

 

Is there a way to produce a summary report that could ventilate those duplicates between the 5 datasets?

 

Let's say, for example, that in set1 there's 2333 var1 which are also in set2.

I would like to have a table (dataset) like a report which will display as follows:

 

          set1    set2   set3   set4   set5

set1   5506   2333 ...........................

set2   2333   6371 ............................

set3    .............................................

set4   ...............................................

set5 ...................................................

 

Is this something possible?

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

EDITED.

 

data T1 T2 T3 T4 T5;
  do I=1 to 1000;
    if ranuni(0)>.6 then output T1;   
    if ranuni(0)>.6 then output T2;   
    if ranuni(0)>.6 then output T3;   
    if ranuni(0)>.6 then output T4;   
    if ranuni(0)>.6 then output T5;
  end;
run;

%macro intersect(list= T1 T2 T3 T4 T5  );
%let n=%sysfunc(countw(&list , %str( ) ));

proc sql;
create table temp as

%do i=1 %to &n ;
%let a=%scan(&list,&i,%str( ));
%do j=1 %to &n ;
%let b=%scan(&list,&j,%str( ));

select "&a" as dsn1 length=40,"&b" as dsn2 length=40,count(*) as n from(
select * from &a 
intersect
select * from &b 
)

%if &i ne &n or &j ne &n %then %do; union %end;

%end;
%end;

;quit;

%mend;

options mprint mlogic symbolgen;
%intersect()

proc transpose data=temp out=want(drop=_:);
by dsn1;
id dsn2;
var n;
run;

View solution in original post

9 REPLIES 9
ChrisNZ
Tourmaline | Level 20

How do you show values that are present in 3 tables?

jpprovost
Quartz | Level 8

@ChrisNZ wrote:

How do you show values that are present in 3 tables?


Hi @ChrisNZ 
I was thinking of something like this:

data test ABC;
set set1 (IN=A)
set2 (IN=B)
set3 (IN=C);
IF A AND B AND C THEN OUTPUT ABC;
run;

But I am not sure...

ChrisNZ
Tourmaline | Level 20

> I was thinking of something like this:

This does not answer the question.

You have a multidimensional crossing. How do you want to show it?

@mkeintz 's solution give you a summarised overview (WANT2 below).

You could also want a detailed overview (WANT1 below).

Or anything in between.

 

data T1 T2 T3 T4 T5;
  do I=1 to 1000;
    if ranuni(0)>.6 then output T1;   
    if ranuni(0)>.6 then output T2;   
    if ranuni(0)>.6 then output T3;   
    if ranuni(0)>.6 then output T4;   
    if ranuni(0)>.6 then output T5;
  end;
run;

data WANT1;
  merge T1(in=A) T2(in=B) T3(in=C) T4(in=D) T5(in=E);
  by I;
  src=catx('-',ifc(A,'T1',''),ifc(B,'T2',''),ifc(C,'T3',''),ifc(D,'T4',''),ifc(E,'T5',''));
run;

proc tabulate data=WANT1;
  class SRC;
  table src=' '*n=' '; 
run;

data WANT2 (keep=ROW COL );
  length ROW COL $4;
  merge T1(in=A) T2(in=B) T3(in=C) T4(in=D) T5(in=E);
  by I;
  array _in {5} in1-in5;
  do C=1 to 5;
    COL=cats('SET',C);
    if _in{C}=1 then do R=C to 5;
      ROW=cats('SET',R);
      if _in{r}=1 then output;
    end;
  end;
run;

proc tabulate data=WANT2  noseps;
  class ROW  COL;
  table ROW=' ' , COL=' '*n=' ';
run;
  SET1 SET2 SET3 SET4 SET5
SET1 405 . . . .
SET2 163 389 . . .
SET3 162 151 397 . .
SET4 151 156 161 396 .
SET5 158 160 167 178 396

T1 T1-T2 T1-T2-T3 T1-T2-T3-T4 T1-T2-T3-T4-T5 T1-T2-T3-T5 T1-T2-T4 T1-T2-T4-T5 T1-T2-T5 T1-T3 T1-T3-T4 T1-T3-T4-T5 T1-T3-T5 T1-T4 T1-T4-T5 T1-T5 T2 T2-T3 T2-T3-T4 T2-T3-T4-T5 T2-T3-T5 T2-T4 T2-T4-T5 T2-T5 T3 T3-T4 T3-T4-T5 T3-T5 T4 T4-T5 T5
57 35 24 8 13 18 29 13 23 45 19 20 15 30 19 37 61 25 23 20 20 24 26 27 54 32 26 35 53 41 43

 


 

 

 

 

jpprovost
Quartz | Level 8

@ChrisNZ wrote:

> I was thinking of something like this:

This does not answer the question.

You have a multidimensional crossing. How do you want to show it?

@mkeintz 's solution give you a summarised overview (WANT2 below).

You could also want a detailed overview (WANT1 below).

Or anything in between.

 

data T1 T2 T3 T4 T5;
  do I=1 to 1000;
    if ranuni(0)>.6 then output T1;   
    if ranuni(0)>.6 then output T2;   
    if ranuni(0)>.6 then output T3;   
    if ranuni(0)>.6 then output T4;   
    if ranuni(0)>.6 then output T5;
  end;
run;

data WANT1;
  merge T1(in=A) T2(in=B) T3(in=C) T4(in=D) T5(in=E);
  by I;
  src=catx('-',ifc(A,'T1',''),ifc(B,'T2',''),ifc(C,'T3',''),ifc(D,'T4',''),ifc(E,'T5',''));
run;

proc tabulate data=WANT1;
  class SRC;
  table src=' '*n=' '; 
run;

data WANT2 (keep=ROW COL );
  length ROW COL $4;
  merge T1(in=A) T2(in=B) T3(in=C) T4(in=D) T5(in=E);
  by I;
  array _in {5} in1-in5;
  do C=1 to 5;
    COL=cats('SET',C);
    if _in{C}=1 then do R=C to 5;
      ROW=cats('SET',R);
      if _in{r}=1 then output;
    end;
  end;
run;

proc tabulate data=WANT2  noseps;
  class ROW  COL;
  table ROW=' ' , COL=' '*n=' ';
run;
  SET1 SET2 SET3 SET4 SET5
SET1 405 . . . .
SET2 163 389 . . .
SET3 162 151 397 . .
SET4 151 156 161 396 .
SET5 158 160 167 178 396

T1 T1-T2 T1-T2-T3 T1-T2-T3-T4 T1-T2-T3-T4-T5 T1-T2-T3-T5 T1-T2-T4 T1-T2-T4-T5 T1-T2-T5 T1-T3 T1-T3-T4 T1-T3-T4-T5 T1-T3-T5 T1-T4 T1-T4-T5 T1-T5 T2 T2-T3 T2-T3-T4 T2-T3-T4-T5 T2-T3-T5 T2-T4 T2-T4-T5 T2-T5 T3 T3-T4 T3-T4-T5 T3-T5 T4 T4-T5 T5
57 35 24 8 13 18 29 13 23 45 19 20 15 30 19 37 61 25 23 20 20 24 26 27 54 32 26 35 53 41 43

 


 

 

 

 


Hi @ChrisNZ 

I will try this tomorrow morning. To be honest, I do not understand some lines of the code. However, this is amazing and it's way better that I wanted.

I will deeply look into it to at least understand a small part of it 🙂

 

EDIT:

@ChrisNZ 

Tried WANT1 and it works perfectly, thanks to you. However, for WANT2, it doesn't seem to works. It gives me an empty dataset with the two variables (ROW COL)...

 

mkeintz
PROC Star

If you were to first merge the 5 datasets, by VAR1, then for each VAR1 value you could write out observations with a collection of ROW and COL values.  Let's say that for VAR1=88,  it is present in datasets DS1, DS3, and DS5.  The you could write out these observations.

 

ROW    COL
SET1   SET1
SET3   SET1
SET5   SET1
SET3 SET3    /*Editted addition */
SET5 SET3    /*Editted addition */
SET5 SET5    /*Editted addition */

The you could just do a crosstabulation of ROW * COL:

 


proc sort data=ds1; by var1; run;
proc sort data=ds2; by var1; run;
proc sort data=ds3; by var1; run;
proc sort data=ds4; by var1; run;
proc sort data=ds5; by var1; run;

data want (keep=row col var1 pattern);
  length row col $4;
  merge ds1 (in=in1) ds2 (in=in2) ds3 (in=in3) ds4 (in=in4) ds5 (in=in5);
  by var1;

  array _in {5} in1-in5;
  do c=1 to 5;
    col=cats('SET',c);
    if _in{c}=1 then do r=c to 5;
      row=cats('SET',r);
      if _in{r}=1 then output;
    end;
  end;
run;

proc tabulate data=want  noseps;
  class row  col;
  table row=' ' , col=' ';
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jpprovost
Quartz | Level 8
@mkeintz
Thanks also for your help. Will try it tomorrow.
PGStats
Opal | Level 21

I would try:

 

data test;
set set1 set2 set3 set4 set5 indsname=dsn;
fromDS = scan(dsn, 2);
run;

proc sort data=test; by var1 fromDS; run;

data temp;
length DS $100;
do until(last.var1);
	set test; by var1;
	ds = catx("-", ds, fromDS); 
	end;
keep var1 DS;
run;

proc sql;
create table want as
select DS, count(*) as n
from temp
group by DS;
quit;

(untested)

PG
jpprovost
Quartz | Level 8
@PGStats
Thank you also for your time and help. I will try your piece of code also tomorrow.
🙂
Ksharp
Super User

EDITED.

 

data T1 T2 T3 T4 T5;
  do I=1 to 1000;
    if ranuni(0)>.6 then output T1;   
    if ranuni(0)>.6 then output T2;   
    if ranuni(0)>.6 then output T3;   
    if ranuni(0)>.6 then output T4;   
    if ranuni(0)>.6 then output T5;
  end;
run;

%macro intersect(list= T1 T2 T3 T4 T5  );
%let n=%sysfunc(countw(&list , %str( ) ));

proc sql;
create table temp as

%do i=1 %to &n ;
%let a=%scan(&list,&i,%str( ));
%do j=1 %to &n ;
%let b=%scan(&list,&j,%str( ));

select "&a" as dsn1 length=40,"&b" as dsn2 length=40,count(*) as n from(
select * from &a 
intersect
select * from &b 
)

%if &i ne &n or &j ne &n %then %do; union %end;

%end;
%end;

;quit;

%mend;

options mprint mlogic symbolgen;
%intersect()

proc transpose data=temp out=want(drop=_:);
by dsn1;
id dsn2;
var n;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1351 views
  • 4 likes
  • 5 in conversation