Hi all,
I have imported a csv file in SAS and have a dataset. I want to compare both of them and find the common postcodes. What is the best way to do it? The imported file is elnz_lowell_12sep and the dataset name is Trace_results. The postcode header is TA_PCODE. I have tried the proc compare step but I don't want the report. I want the table with common variable.
Proc compare base=work.elnz_lowell_12sep
Compare=Trace_result;
var TA_PCODE;
run;
Your title says "common values" but then your text asks for "common variable". Which is it? Could you explain further? Can you provide example data and example output?
Hi,
As I said that I want to find out from TA_PCODE variable that how many of them are same.
Sample data:
Data elnz_lowell_12sep;
input Reference_number Add_line_1 Add_line_2 Add_line_ Add_line_4 TA_PCODE;
datalines;
24567456 23 Park Lane High street Bradford BD7 1RE
24468265 81 Moudsley Streed Leeds LS12
56452464 49 laistridge lane Manchester M14 6PA
;
run;
Data Trace_result;
input Reference_number Add_line_1 Add_line_2 Add_line_ Add_line_4 TA_PCODE;
datalines;
24567428 43 Roysdale road London L24 8PN
54468547 32 Merton Road Leeds LS110 1AB
56452464 49 laistridge lane Manchester M14 6PA
;
run;
What exactly do you need to test? You mentioned POSTCODE but your data seems to be based on REFERENCE_NUMBER.
If REFERENCE_NUMBER is a unique key in both datasets then here is a way to split the data into four datasets.
data left right left_only right_only;
set elnz_lowell_12sep(in=inleft) Trace_result(in=inright);
by Reference_number ;
if first.Reference_number and last.Reference_numbe then do;
if inleft then output left_only;
else output right_only;
end;
else if inleft then output left;
else output right;
run;
You can look a the LEFT_ONLY and RIGHT_ONLY datasets to see the reference numbers that show up in only one place.
You can use PROC COMPARE on the left and right datasets to see if any of the values of the other variables differ for the matching observations.
proc compare data=left compare=right;
id Reference_number;
run;
@Sandeep77 wrote:
Hi,
As I said that I want to find out from TA_PCODE variable that how many of them are same.
Sample data:
Data elnz_lowell_12sep; input Reference_number Add_line_1 Add_line_2 Add_line_ Add_line_4 TA_PCODE; datalines; 24567456 23 Park Lane High street Bradford BD7 1RE 24468265 81 Moudsley Streed Leeds LS12 56452464 49 laistridge lane Manchester M14 6PA ; run; Data Trace_result; input Reference_number Add_line_1 Add_line_2 Add_line_ Add_line_4 TA_PCODE; datalines; 24567428 43 Roysdale road London L24 8PN 54468547 32 Merton Road Leeds LS110 1AB 56452464 49 laistridge lane Manchester M14 6PA ; run;
These steps will only result in a lot of ERRORs and unusable datasets with missing numeric values. Please test and fix your codes before posting. It's not rocket science, you CAN do that.
If you want to compare the ta_pcode values for given reference_numbers, you need to do this:
proc sort data=elnz_lowell_12sep;
by reference_number;
run;
proc sort data=trace_result;
by reference_number;
run;
proc compare
base=elnz_lowell_12sep
compare=trace_result
;
var ta_pcode;
id reference_number;
run;
You really do need to clean up that data input. For simplicity I've dropped everything except the codes of interest.
Data elnz_lowell_12sep;
infile cards dsd truncover;
input Reference_number TA_PCODE $;
cards;
24567456, BD7 1RE
24468265, LS12
56452464, M14 6PA
;
run;
Data Trace_result;
infile cards dsd truncover;
input Reference_number TA_PCODE $;
datalines;
24567428, L24 8PN
54468547, LS110 1AB
56452464, M14 6PA
;
run;
proc sort data=elnz_lowell_12sep(keep=ta_pcode) out=elnz nodupkey;
by ta_pcode;
run;
proc sort data=Trace_result(keep=ta_pcode) out=trace nodupkey;
by ta_pcode;
run;
data want;
merge elnz (in=_e) trace (in=_t);
by ta_pcode;
length status $15.;
if _e & _t then Status='Both';
else if _e then Status='ELNZ Only';
else if _t then Status = 'Trace Only';
run;
@Sandeep77 wrote:
Hi,
As I said that I want to find out from TA_PCODE variable that how many of them are same.
Do you want to know "how many are the same", a count, or the actual values that are the same, a data set with the codes in both?
This does the latter:
data set1; input code $; datalines; a a a b c d b c c ; data set2; input code $; datalines; a c q q q c c c ; proc sql; create table commoncode as select distinct a.code from set1 as a natural join set2 as b ; quit;
To find codes in Set1 but not Set2
proc sql; create table in1_not2 a select distinct code from set1 except select distinct code from set2 ; quit;
You don't explicitly show what the expected result should look like.
@Sandeep77 wrote:
Hi,
As I said that I want to find out from TA_PCODE variable that how many of them are same.
Sample data:
Data elnz_lowell_12sep; input Reference_number Add_line_1 Add_line_2 Add_line_ Add_line_4 TA_PCODE; datalines; 24567456 23 Park Lane High street Bradford BD7 1RE 24468265 81 Moudsley Streed Leeds LS12 56452464 49 laistridge lane Manchester M14 6PA ; run; Data Trace_result; input Reference_number Add_line_1 Add_line_2 Add_line_ Add_line_4 TA_PCODE; datalines; 24567428 43 Roysdale road London L24 8PN 54468547 32 Merton Road Leeds LS110 1AB 56452464 49 laistridge lane Manchester M14 6PA ; run;
Example input data and expected results.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.