Job04 | Quartz | Level 8

## Comparing whether two datasets are identical

I have two large data sets, each 80x200. The values generated during the experiment are 100% identical. The two data sets were compiled by two different individuals: a statistician and a chemist. Now that they are compiled, I want to compare whether the data sets are identical in terms of their values. The chemist uses a long sample ID that describes the location, study name, patient ID, …, while the statistician uses only the patient number.

The variables in the columns are the same and in the same order, but the samples in the rows are not in the same order.

Regardless of how the samples were named and in what order they appear, I want to verify that the two data sets are identical.

Can this be done?

Below are two very small example data sets in which I changed the IDs and the order of the rows.

Thank you

```
id,C1,C2,C3,C4
1,4.19855,5.74574,33.46678,6.85391
3,3.48004,6.69138,31.85662,11.73753
4,3.33851,5.74293,36.09064,10.9801
9,3.2966,8.15718,30.27008,7.62836
```

```
id,C1,C2,C3,C4
KIND-009,3.2966,8.15718,30.27008,7.62836
KIND-003,3.48004,6.69138,31.85662,11.73753
KIND-001,4.19855,5.74574,33.46678,6.85391
KIND-004,3.33851,5.74293,36.09064,10.9801
```

Accepted solution
Super User

## Re: Comparing whether two datasets are identical

PROC COMPARE is the basic tool for this. However, you want to sort the data if at all possible so that each record is compared with its counterpart in the other data set.

In your example it is very obvious that none of the IDs will match: 1 is never equal to KIND-001, and depending on your actual data they may not even be of the same type.

Here is an example with your data. You can copy it and run it in your SAS session to see the results.

```sas
/* Statistician's version: plain patient numbers as IDs */
data have1;
   infile datalines dlm=',';
   input id $ C1 C2 C3 C4;
datalines;
1,4.19855,5.74574,33.46678,6.85391
3,3.48004,6.69138,31.85662,11.73753
4,3.33851,5.74293,36.09064,10.9801
9,3.2966,8.15718,30.27008,7.62836
;
run;

/* Chemist's version: long sample IDs, rows in a different order */
data have2;
   infile datalines dlm=',';
   input id $ C1 C2 C3 C4;
datalines;
KIND-009,3.2966,8.15718,30.27008,7.62836
KIND-003,3.48004,6.69138,31.85662,11.73753
KIND-001,4.19855,5.74574,33.46678,6.85391
KIND-004,3.33851,5.74293,36.09064,10.9801
;
run;

/* Sort both so that matching records line up as closely as possible */
proc sort data=have1;
   by id;
run;
proc sort data=have2;
   by id;
run;

/* Without an ID statement, observations are compared by position */
proc compare base=have1 compare=have2;
run;
```

The output first gives details about the data sets, then describes the variables if their types, lengths, or formats differ.

Then it gives details about the values of individual variables.

There are MANY options: for example, if you know the variables have different names, you can compare specific variables WITH specific others; you can compare only some of the variables; and you can set rules for how close numeric values have to be to be reported as the same or different.
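For illustration, here is a minimal sketch of two of those options, assuming a hypothetical compare data set whose columns are named `X1` and `X2` instead of `C1` and `C2`, and whose values differ from the base only in the sixth decimal place. The data set names `base_ds` and `comp_ds` and the tolerance of 0.00001 are illustrative choices, not part of the original post. The VAR and WITH statements pair variables by position, and METHOD=ABSOLUTE with CRITERION= tells PROC COMPARE to treat numeric values as equal when their absolute difference is within the criterion.

```sas
/* Base data with the original column names */
data base_ds;
   input id C1 C2;
datalines;
1 4.19855 5.74574
3 3.48004 6.69138
;
run;

/* Hypothetical compare data: same values under different names,
   with a rounding-level difference in C1/X1 for id 3 */
data comp_ds;
   input id X1 X2;
datalines;
1 4.19855 5.74574
3 3.480041 6.69138
;
run;

/* Compare C1 with X1 and C2 with X2, matching rows on id, and treat
   absolute differences up to 0.00001 as equal */
proc compare base=base_ds compare=comp_ds method=absolute criterion=0.00001;
   id id;
   var C1 C2;
   with X1 X2;
run;
```

The WITH statement is only valid together with a VAR statement; without WITH, the variables listed in VAR are compared with the same-named variables in the compare data set.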

Super User

## Re: Comparing whether two datasets are identical

Yes, it can be done.

1. Convert the IDs into variables that will align, i.e., make them the same type/format and length.
2. Sort both data sets by ID.
3. Use PROC COMPARE, but add a fuzz factor so you can tell a true difference from a tiny difference in the decimals, and use an ID statement so that records with the same ID are compared between the two data sets.
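The three steps above might look like the following minimal sketch. It assumes the chemist's IDs always end with the patient number after the last hyphen (as in `KIND-009`) and that the statistician's IDs are plain patient numbers; the data set names `stat` and `chem`, the key variable `pat_no`, and the tolerance `criterion=1e-8` are hypothetical choices for illustration. The tolerance is expressed here with PROC COMPARE's METHOD=ABSOLUTE and CRITERION= options.

```sas
/* Step 1a: statistician's version; ID is the plain patient number */
data stat;
   infile datalines dlm=',';
   input id $ C1 C2 C3 C4;
   pat_no = input(id, 8.);                 /* "9" -> 9 */
   drop id;                                /* the original IDs never match */
datalines;
1,4.19855,5.74574,33.46678,6.85391
3,3.48004,6.69138,31.85662,11.73753
4,3.33851,5.74293,36.09064,10.9801
9,3.2966,8.15718,30.27008,7.62836
;
run;

/* Step 1b: chemist's version; ASSUMES the patient number follows the
   last hyphen in the long sample ID */
data chem;
   infile datalines dlm=',';
   length id $ 40;                         /* real sample IDs may exceed 8 chars */
   input id $ C1 C2 C3 C4;
   pat_no = input(scan(id, -1, '-'), 8.);  /* "KIND-009" -> 9 */
   drop id;
datalines;
KIND-009,3.2966,8.15718,30.27008,7.62836
KIND-003,3.48004,6.69138,31.85662,11.73753
KIND-001,4.19855,5.74574,33.46678,6.85391
KIND-004,3.33851,5.74293,36.09064,10.9801
;
run;

/* Step 2: sort both data sets by the common numeric key */
proc sort data=stat; by pat_no; run;
proc sort data=chem; by pat_no; run;

/* Step 3: match rows on pat_no and report numeric differences only
   when they exceed an absolute tolerance */
proc compare base=stat compare=chem method=absolute criterion=1e-8;
   id pat_no;
run;
```

Because the ID statement matches rows on `pat_no`, the original row order no longer matters once both data sets are sorted. If the real sample IDs carry the patient number somewhere other than after the last hyphen, only the `scan(...)` expression would need to change.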

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n14cxqy1h9hof4n1cq4xmhv2atgs.htm#n14cxqy...

