BookmarkSubscribeRSS Feed
elopomorph88
Calcite | Level 5

OK.  I am using Proc Compare but I am just not getting the answer that I need.  Here is my situation:  

 

I have two datasets:  Dataset A and dataset B that are different.  However, they have a column of data that is the same.  There is a site variable that is the same between the two datasets.  However, they have different lists of sites so not all of the sites listed in dataset A are listed in dataset B, and vice versa.  They also have different variable between the two datasets, but they do share a site variable.  We can call the shared site variable "Site X".  Site X is listed as a format of $10 in both datasets.  Site X is an odd text file composed of both letters and text.  An example of Site X is FL0952MB.  I want to match site X in both dataset A and B, but in a certain way.  

 

I want dataset A to stay the same but I want to know if any of the specific Site X sites from dataset B shows up in dataset A.  I know it will for some of the records but I have datasets of over 1,000 records and I don't want to go through them by hand.  So I want to generated a column of data in dataset A that is something like a 0 if that specific Site X is NOT in dataset B and a 1 if that specific site IS listed in dataset B.  I am hoping to get an output like this for Dataset A where the "matched" column means that that specific site X was listed in dataset B:

 

  Site X                   matched

 FL0952MB             1

 RL0989MG            1

 IU0882RF              0

 GR1752PO            0

 YZ5632UY             1

 

Any help would be greatly appreciated.   

4 REPLIES 4
ballardw
Super User

It really helps to provide examples from data from both sets.

Here is one way that includes creating two small data sets that the code can test on.

data dataseta;
   input site $;
datalines;
abc
pdq
x34
;
data datasetb;
   input site $;
datalines;
abc
x34
ggg
;

proc sql;
   create table InA as
   select a.site, ( a.site=b.site) as matched
   from (select distinct site from dataseta ) as a
        left join
        (select distinct site from datasetb) as b
        on a.site=b.site
   ;
quit;

Obviously I don't have your data. If your "site" variable name is different in the two sets replace the name for all the places the variable is used. The A. and B. (the DOT is important) is an SQL alias that allows you to not repeatedly type a longer data set name, tells the procedure which contributing table has the variable, and is pretty much required if your datasets are in permanent libraries as lib.datset.variable will generate syntax errors.

The subquery with the "select distinct" creates sets with only one of each value for comparison. The Left Join says "all the values from the A (or left referenced set) will be included in the output. The ON sets the condition to match things, in this case the equality of the sites. The Select with the (a.site=b.site) as matched uses the property that SAS will use 1 for true/0 for false in results of logic. Where the values don't match, the B set doesn't have the site, the result is False.

elopomorph88
Calcite | Level 5
Thanks to both of you!
PGStats
Opal | Level 21

Quite straitforward with proc SQL:

 

proc sql;
create table datasetC as
select
	*,
	siteX in (select siteX from datasetB) as matched
from datasetA;
quit;
PG
novinosrin
Tourmaline | Level 20

data datasetC  ;
 set dataseta;
 if _n_=1 then do;
 if 0 then set  datasetb;
   dcl hash H (dataset:'datasetb') ;
   h.definekey  ("site") ;
   h.definedone () ;
 end;
 matched=h.check()=0;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 814 views
  • 0 likes
  • 4 in conversation