How to find common values between two datasets?

Sandeep77 · Posted 04-19-2023 05:28 AM

Hi all,

I have imported a csv file in SAS and have a dataset. I want to compare both of them and find the common postcodes. What is the best way to do it? The imported file is elnz_lowell_12sep and the dataset name is Trace_results. The postcode header is TA_PCODE. I have tried the proc compare step but I don't want the report. I want the table with common variable.

Proc compare base=work.elnz_lowell_12sep
Compare=Trace_result;
var TA_PCODE;
run;

PaigeMiller · Posted 04-19-2023 06:07 AM

Your title says "common values" but then your text asks for "common variable". Which is it? Could you explain further? Can you provide example data and example output?

--
Paige Miller

Sandeep77 · Posted 04-19-2023 07:07 AM

Hi,

As I said that I want to find out from TA_PCODE variable that how many of them are same.

Sample data:

Data elnz_lowell_12sep;
input Reference_number	Add_line_1	Add_line_2	Add_line_	Add_line_4	TA_PCODE;
datalines;
24567456	23 Park Lane	High street	Bradford		BD7 1RE
24468265	81 Moudsley Streed		Leeds		LS12
56452464	49 laistridge lane		Manchester		M14 6PA
;
run;

Data Trace_result;
input Reference_number	Add_line_1	Add_line_2	Add_line_	Add_line_4	TA_PCODE;
datalines;
24567428	43 Roysdale road		London		L24 8PN
54468547	32 Merton Road		Leeds		LS110 1AB
56452464	49 laistridge lane		Manchester		M14 6PA
;
run;

Tom · Posted 04-19-2023 12:27 PM

What exactly do you need to test? You mentioned POSTCODE but your data seems to be based on REFERENCE_NUMBER.

If REFERENCE_NUMBER is a unique key in both datasets then here is a way to split the data into four datasets.

data left right left_only right_only;
  set elnz_lowell_12sep(in=inleft) Trace_result(in=inright);
  by Reference_number ;
  if first.Reference_number	and last.Reference_numbe then do;
     if inleft then output left_only;
     else output right_only;
  end;
  else if inleft then output left;
  else output right;
run;

You can look a the LEFT_ONLY and RIGHT_ONLY datasets to see the reference numbers that show up in only one place.

You can use PROC COMPARE on the left and right datasets to see if any of the values of the other variables differ for the matching observations.

proc compare data=left compare=right;
  id Reference_number;
run;

Kurt_Bremser · Posted 04-19-2023 12:36 PM

@Sandeep77 wrote:

Hi,

As I said that I want to find out from TA_PCODE variable that how many of them are same.

Sample data:

Data elnz_lowell_12sep;
input Reference_number	Add_line_1	Add_line_2	Add_line_	Add_line_4	TA_PCODE;
datalines;
24567456	23 Park Lane	High street	Bradford		BD7 1RE
24468265	81 Moudsley Streed		Leeds		LS12
56452464	49 laistridge lane		Manchester		M14 6PA
;
run;

Data Trace_result;
input Reference_number	Add_line_1	Add_line_2	Add_line_	Add_line_4	TA_PCODE;
datalines;
24567428	43 Roysdale road		London		L24 8PN
54468547	32 Merton Road		Leeds		LS110 1AB
56452464	49 laistridge lane		Manchester		M14 6PA
;
run;

These steps will only result in a lot of ERRORs and unusable datasets with missing numeric values. Please test and fix your codes before posting. It's not rocket science, you CAN do that.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 04-19-2023 12:44 PM

If you want to compare the ta_pcode values for given reference_numbers, you need to do this:

proc sort data=elnz_lowell_12sep;
by reference_number;
run;

proc sort data=trace_result;
by reference_number;
run;

proc compare
  base=elnz_lowell_12sep
  compare=trace_result
;
var ta_pcode;
id reference_number;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Reeza · Posted 04-19-2023 01:06 PM

You really do need to clean up that data input. For simplicity I've dropped everything except the codes of interest.

Data elnz_lowell_12sep;
infile cards dsd truncover;
input Reference_number TA_PCODE $;
cards;
24567456, BD7 1RE
24468265, LS12
56452464, M14 6PA
;
run;

Data Trace_result;
infile cards dsd truncover;
input Reference_number TA_PCODE $;
datalines;
24567428, L24 8PN
54468547, LS110 1AB
56452464, M14 6PA
;
run;

proc sort data=elnz_lowell_12sep(keep=ta_pcode) out=elnz nodupkey;
by ta_pcode;
run;

proc sort data=Trace_result(keep=ta_pcode) out=trace nodupkey;
by ta_pcode;
run;

data want;
merge elnz (in=_e) trace (in=_t);
by ta_pcode;

length status $15.;

if _e & _t then Status='Both';
else if _e then Status='ELNZ Only';
else if _t then Status = 'Trace Only';
 

run;

ballardw · Posted 04-19-2023 05:05 PM

@Sandeep77 wrote:

Hi,

As I said that I want to find out from TA_PCODE variable that how many of them are same.

Do you want to know "how many are the same", a count, or the actual values that are the same, a data set with the codes in both?

This does the latter:

data set1;
  input code $;
datalines;
a
a
a
b
c
d
b
c
c
;

data set2;
  input code $;
datalines;
a
c
q
q
q
c
c
c
;

proc sql;
   create table commoncode as
   select distinct a.code
   from set1 as a 
        natural join
        set2 as b
   ;
quit;

To find codes in Set1 but not Set2

proc sql;
   create table in1_not2 a
   select distinct code from set1
   except
   select distinct code from set2
   ;
quit;

You don't explicitly show what the expected result should look like.

@Sandeep77 wrote:

Hi,

As I said that I want to find out from TA_PCODE variable that how many of them are same.

Sample data:

Data elnz_lowell_12sep;
input Reference_number	Add_line_1	Add_line_2	Add_line_	Add_line_4	TA_PCODE;
datalines;
24567456	23 Park Lane	High street	Bradford		BD7 1RE
24468265	81 Moudsley Streed		Leeds		LS12
56452464	49 laistridge lane		Manchester		M14 6PA
;
run;

Data Trace_result;
input Reference_number	Add_line_1	Add_line_2	Add_line_	Add_line_4	TA_PCODE;
datalines;
24567428	43 Roysdale road		London		L24 8PN
54468547	32 Merton Road		Leeds		LS110 1AB
56452464	49 laistridge lane		Manchester		M14 6PA
;
run;

ballardw · Posted 04-19-2023 12:02 PM

Example input data and expected results.

How to find common values between two datasets?

Re: How to find common values between two datasets?

Re: How to find common values between two datasets?

Re: How to find common values between two datasets?

Re: How to find common values between two datasets?

Re: How to find common values between two datasets?

Re: How to find common values between two datasets?

Re: How to find common values between two datasets?

Re: How to find common values between two datasets?

Registration is open