SAS Programming

Comparing whether two datasets are identical

Job04 · Posted 03-02-2022 06:26 PM

I have two large data sets, each 80x200. The values generated during the experiment are 100% identical. The two data sets were compiled by two different individuals: a statistician and a chemist. Now that they are compiled, I want to check whether the data are identical in terms of the values. The chemist uses a long sample id that describes location, study name, patient’s id, …; the statistician uses only the patient’s number.

The variables in the columns are the same and in the same order. The samples in the rows are not in the same order.

Irrespective of how the samples were named and of their order, I want to verify that the two data sets are identical.

Can this be done?

For that I have two very small example data sets in which I changed the ids and the order of the rows.

Thank you

id,C1,C2,C3,C4
1,4.19855,5.74574,33.46678,6.85391
3,3.48004,6.69138,31.85662,11.73753
4,3.33851,5.74293,36.09064,10.9801
9,3.2966,8.15718,30.27008,7.62836

id,C1,C2,C3,C4
KIND-009,3.2966,8.15718,30.27008,7.62836
KIND-003,3.48004,6.69138,31.85662,11.73753
KIND-001,4.19855,5.74574,33.46678,6.85391
KIND-004,3.33851,5.74293,36.09064,10.9801

ballardw · Posted 03-02-2022 06:40 PM

Proc Compare is the basic tool for this. However, you want to sort the data if at all possible so that matching records are compared with each other.

In your example it is very obvious that none of the IDs will match. 1 is never equal to KIND-001, and depending on your actual data they may not be of the same type.

Example with your data. You can copy this and run it in your SAS session to see the results.

data have1;
infile datalines dlm=',';
input id $ C1 C2 C3 C4;
datalines;
1,4.19855,5.74574,33.46678,6.85391
3,3.48004,6.69138,31.85662,11.73753
4,3.33851,5.74293,36.09064,10.9801
9,3.2966,8.15718,30.27008,7.62836
;
 

data have2;
infile datalines dlm=',';
input id $ C1 C2 C3 C4;
datalines;
KIND-009,3.2966,8.15718,30.27008,7.62836
KIND-003,3.48004,6.69138,31.85662,11.73753
KIND-001,4.19855,5.74574,33.46678,6.85391
KIND-004,3.33851,5.74293,36.09064,10.9801
;

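/* Sort both data sets so rows line up; without an ID statement,
   Proc Compare matches observations by position */
proc sort data=have1;
   by id;
run;
proc sort data=have2;
   by id;
run;

/* id is compared too, so its values will be reported as unequal */
proc compare base=have1 compare=have2;
run;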

The output starts with details about the data sets, then the variable descriptions if type, length, or formats are different.

Then come details about the values of individual variables.
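If only a yes/no answer is wanted rather than the full report, one option is to read Proc Compare's return code from the automatic macro variable SYSINFO. The sketch below is only an illustration: the data sets a and b and the macro variable rc are made-up names, and b is deliberately an exact copy of a.

```sas
/* Hypothetical toy data, not the poster's real data */
data a;
   input patient C1 C2;
   datalines;
1 4.19855 5.74574
3 3.48004 6.69138
;

/* b is an exact copy of a, so no differences are expected */
data b;
   set a;
run;

/* NOPRINT suppresses the report; only the return code is of interest here */
proc compare base=a compare=b noprint;
run;

/* Capture SYSINFO right away, before another step overwrites it */
%let rc = &sysinfo;

%put NOTE: Proc Compare return code is &rc (0 means no differences were found);
```

The %let has to come directly after the step, because SYSINFO is reset by later steps.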

There are MANY options, for example to compare some variables WITH specific others when you know the variables have different names, to compare just some of the variables, and to set rules for how close numeric values have to be to be reported as the same or different.
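As a rough sketch of those options, the example below uses the VAR and WITH statements to pair differently named variables and METHOD=/CRITERION= to set a numeric tolerance. The data sets stat and chem, the names V1 and V2, and the tolerance of 0.0001 are assumptions made up for illustration; the rows are assumed to be in the same order already.

```sas
/* Hypothetical toy data: same measurements, different column names */
data stat;
   input patient C1 C2;
   datalines;
1 4.19855 5.74574
3 3.48004 6.69138
;

data chem;
   input patient V1 V2;
   datalines;
1 4.19855 5.74574
3 3.48004 6.69139
;

/* Values count as equal when their absolute difference is at most 0.0001 */
proc compare base=stat compare=chem method=absolute criterion=0.0001;
   var  C1 C2;   /* variables taken from the BASE data set */
   with V1 V2;   /* matching variables in the COMPARE data set */
run;
```

Here the 0.00001 discrepancy in the last row falls inside the chosen tolerance, so it should not be counted as unequal. A suitable tolerance depends on the precision of the real measurements.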

Reeza · Posted 03-02-2022 07:08 PM

Yes, it can be done.

1. Convert the IDs into variables that will align, i.e. make them the same type, format, and length.
2. Sort your data sets by ID.
3. Use PROC COMPARE, but add a fuzz factor so you can tell a true difference from a decimal point difference, and use an ID statement so the same IDs are compared between the two data sets.
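The steps above can be sketched roughly as follows, using the small example data from the question. This assumes the patient number is the last hyphen-delimited piece of the chemist's id, which holds for the sample (KIND-009) but may not hold for the real ids. The names stat, chem and patient, and the tolerance of 1e-8, are made up for illustration.

```sas
data have1;
   infile datalines dlm=',';
   input id $ C1 C2 C3 C4;
   datalines;
1,4.19855,5.74574,33.46678,6.85391
3,3.48004,6.69138,31.85662,11.73753
4,3.33851,5.74293,36.09064,10.9801
9,3.2966,8.15718,30.27008,7.62836
;

data have2;
   infile datalines dlm=',';
   input id $ C1 C2 C3 C4;
   datalines;
KIND-009,3.2966,8.15718,30.27008,7.62836
KIND-003,3.48004,6.69138,31.85662,11.73753
KIND-001,4.19855,5.74574,33.46678,6.85391
KIND-004,3.33851,5.74293,36.09064,10.9801
;

/* Step 1: build a numeric key of the same type in both data sets */
data stat;
   set have1;
   patient = input(id, 8.);                  /* '1' becomes 1 */
   drop id;
run;

data chem;
   set have2;
   /* Assumption: the patient number is the last piece after a hyphen */
   patient = input(scan(id, -1, '-'), 8.);   /* 'KIND-009' becomes 9 */
   drop id;
run;

/* Step 2: sort both data sets by the common key */
proc sort data=stat; by patient; run;
proc sort data=chem; by patient; run;

/* Step 3: match rows on the key and allow a small numeric tolerance */
proc compare base=stat compare=chem method=absolute criterion=1e-8;
   id patient;
run;
```

With real ids the parsing in step 1 would need to follow whatever naming scheme the chemist actually used.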

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n14cxqy1h9hof4n1cq4xmhv2atgs.htm#n14cxqy...

