Job04
Quartz | Level 8

I have two large data sets, each 80 x 200. The values generated during the experiment are 100% identical in both. The two data sets were compiled by two different people: a statistician and a chemist. Now that they are compiled, I want to check whether they are identical in terms of their values. The chemist uses a long sample ID that describes the location, study name, patient’s ID, …, while the statistician uses only the patient’s number.

The variables in the columns are the same and are in the same order. The samples in the rows are not in the same order.

Regardless of how the samples are named and what order they are in, I want to verify that the two data sets are identical.

Can this be done?

To illustrate, here are two very small example data sets in which I changed the IDs and the order of the rows:

Thank you

 

id,C1,C2,C3,C4
1,4.19855,5.74574,33.46678,6.85391
3,3.48004,6.69138,31.85662,11.73753
4,3.33851,5.74293,36.09064,10.9801
9,3.2966,8.15718,30.27008,7.62836

 

 

id,C1,C2,C3,C4
KIND-009,3.2966,8.15718,30.27008,7.62836
KIND-003,3.48004,6.69138,31.85662,11.73753
KIND-001,4.19855,5.74574,33.46678,6.85391
KIND-004,3.33851,5.74293,36.09064,10.9801

 

1 ACCEPTED SOLUTION

ballardw
Super User

PROC COMPARE is the basic tool for this. However, you want to sort the data if at all possible so that matching records are compared with each other.

 

In your example it is obvious that none of the IDs will match: 1 is never equal to KIND-001, and depending on your actual data, the two ID variables may not even be of the same type.
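If the IDs cannot be reconciled at all, one alternative (a different technique from the ID-based approach used in this thread) is to ignore the IDs entirely: drop them, sort both data sets by every value column, and compare the values row by row. This is a minimal sketch that assumes the rows really are identical apart from ID and order. If a single value differs, the sort can shift the rows after it out of alignment, so a mismatch tells you that something differs rather than exactly which sample. With 200 columns you would list every column in the BY and VAR statements.

```sas
/* Same example data as in the question: different IDs, different row order */
data have1;
   infile datalines dlm=',';
   input id $ C1 C2 C3 C4;
   datalines;
1,4.19855,5.74574,33.46678,6.85391
3,3.48004,6.69138,31.85662,11.73753
4,3.33851,5.74293,36.09064,10.9801
9,3.2966,8.15718,30.27008,7.62836
;
run;

data have2;
   infile datalines dlm=',';
   input id $ C1 C2 C3 C4;
   datalines;
KIND-009,3.2966,8.15718,30.27008,7.62836
KIND-003,3.48004,6.69138,31.85662,11.73753
KIND-001,4.19855,5.74574,33.46678,6.85391
KIND-004,3.33851,5.74293,36.09064,10.9801
;
run;

/* Drop the ID and sort by every value column, so that identical
   rows end up in the same position in both output data sets */
proc sort data=have1(drop=id) out=vals1;
   by C1 C2 C3 C4;
run;
proc sort data=have2(drop=id) out=vals2;
   by C1 C2 C3 C4;
run;

/* Compare the value columns observation by observation */
proc compare base=vals1 compare=vals2;
   var C1 C2 C3 C4;
run;
```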

 

Here is an example with your data. You can copy it and run it in your SAS session to see the results.

 

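/* Statistician's version: IDs are patient numbers, read as character */
data have1;
   infile datalines dlm=',';
   input id $ C1 C2 C3 C4;
   datalines;
1,4.19855,5.74574,33.46678,6.85391
3,3.48004,6.69138,31.85662,11.73753
4,3.33851,5.74293,36.09064,10.9801
9,3.2966,8.15718,30.27008,7.62836
;
run;

/* Chemist's version: long sample IDs, rows in a different order */
data have2;
   infile datalines dlm=',';
   input id $ C1 C2 C3 C4;
   datalines;
KIND-009,3.2966,8.15718,30.27008,7.62836
KIND-003,3.48004,6.69138,31.85662,11.73753
KIND-001,4.19855,5.74574,33.46678,6.85391
KIND-004,3.33851,5.74293,36.09064,10.9801
;
run;

/* Sort both data sets by ID so that the rows line up */
proc sort data=have1;
   by id;
run;
proc sort data=have2;
   by id;
run;

/* Without an ID statement, PROC COMPARE compares the two data sets
   observation by observation */
proc compare base=have1 compare=have2;
run;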
 

The output starts with details about the data sets, followed by descriptions of any variables whose type, length, or format differs.

 

After that come details about the values of the individual variables.
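Reading the report works for a one-off check. To get a yes/no answer in code, one option is the automatic macro variable SYSINFO, which PROC COMPARE sets to 0 when it finds no differences. This minimal sketch assumes the IDs have already been aligned (the variable name patient and the data set names are illustrative). It suppresses the report with NOPRINT and uses OUT= with OUTNOEQUAL so that the output data set diffs holds only observations in which some value differs.

```sas
/* Hypothetical data sets whose IDs are already aligned */
data stat;
   infile datalines dlm=',';
   input patient C1 C2 C3 C4;
   datalines;
1,4.19855,5.74574,33.46678,6.85391
3,3.48004,6.69138,31.85662,11.73753
4,3.33851,5.74293,36.09064,10.9801
9,3.2966,8.15718,30.27008,7.62836
;
run;

data chem;
   infile datalines dlm=',';
   input patient C1 C2 C3 C4;
   datalines;
9,3.2966,8.15718,30.27008,7.62836
3,3.48004,6.69138,31.85662,11.73753
1,4.19855,5.74574,33.46678,6.85391
4,3.33851,5.74293,36.09064,10.9801
;
run;

/* The ID statement requires both data sets to be sorted by the ID */
proc sort data=stat;
   by patient;
run;
proc sort data=chem;
   by patient;
run;

/* NOPRINT suppresses the report; OUT= with OUTNOEQUAL writes only
   observations in which at least one value differs */
proc compare base=stat compare=chem out=diffs outnoequal noprint;
   id patient;
run;

/* Capture SYSINFO immediately: 0 means no differences were found */
%let cmp_rc = &sysinfo;
%put NOTE: PROC COMPARE SYSINFO=&cmp_rc;
```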

There are MANY options: for example, comparing specific variables WITH others when you know the variables have different names, comparing only some of the variables, and setting rules for how close numeric values must be to be reported as the same or different.
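A sketch of some of those options, assuming (hypothetically) that the chemist's value columns were named A1-A4 instead of C1-C4. The VAR and WITH statements pair C1 with A1, C2 with A2, and so on. METHOD=ABSOLUTE compares the absolute difference between two values against the CRITERION= value; the 0.0001 used here is an arbitrary illustrative tolerance. The default, METHOD=EXACT, requires the values to be exactly equal.

```sas
/* Statistician's data, keyed by patient number */
data stat;
   infile datalines dlm=',';
   input patient C1 C2 C3 C4;
   datalines;
1,4.19855,5.74574,33.46678,6.85391
3,3.48004,6.69138,31.85662,11.73753
4,3.33851,5.74293,36.09064,10.9801
9,3.2966,8.15718,30.27008,7.62836
;
run;

/* Hypothetical chemist's data: same values, different column names */
data chem;
   infile datalines dlm=',';
   input patient A1 A2 A3 A4;
   datalines;
9,3.2966,8.15718,30.27008,7.62836
3,3.48004,6.69138,31.85662,11.73753
1,4.19855,5.74574,33.46678,6.85391
4,3.33851,5.74293,36.09064,10.9801
;
run;

proc sort data=stat;
   by patient;
run;
proc sort data=chem;
   by patient;
run;

/* VAR lists the base variables and WITH lists the comparison
   variables they are paired with, in the same order */
proc compare base=stat compare=chem method=absolute criterion=0.0001
             out=diffs outnoequal;
   id patient;
   var C1 C2 C3 C4;
   with A1 A2 A3 A4;
run;
```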

Reeza
Super User

Yes, it can be done. 

 

  1. Convert the IDs into variables that will align, i.e., give them the same type, format, and length.
  2. Sort your data sets by ID.
  3. Use PROC COMPARE, but add a fuzz factor so you can tell a true difference from a difference in the last decimal places, and use an ID statement so that the same IDs are compared between the two data sets.
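The three steps can be sketched in SAS as follows. This assumes, hypothetically, that the patient number is always the last hyphen-delimited part of the chemist's sample ID, which holds for the KIND-009 style IDs in the example; real IDs that also encode location and study name may need different parsing. The variable names sample_id and patient are illustrative, and the tolerance of 0.00001 is an arbitrary choice.

```sas
/* Statistician's data: numeric patient number */
data stat;
   infile datalines dlm=',';
   input patient C1 C2 C3 C4;
   datalines;
1,4.19855,5.74574,33.46678,6.85391
3,3.48004,6.69138,31.85662,11.73753
4,3.33851,5.74293,36.09064,10.9801
9,3.2966,8.15718,30.27008,7.62836
;
run;

/* Chemist's data: long character sample ID */
data chem_raw;
   infile datalines dlm=',';
   length sample_id $ 40;
   input sample_id $ C1 C2 C3 C4;
   datalines;
KIND-009,3.2966,8.15718,30.27008,7.62836
KIND-003,3.48004,6.69138,31.85662,11.73753
KIND-001,4.19855,5.74574,33.46678,6.85391
KIND-004,3.33851,5.74293,36.09064,10.9801
;
run;

/* Step 1: derive a numeric patient number from the sample ID.
   ASSUMPTION: the patient number is the last '-'-delimited token,
   so SCAN with a count of -1 takes it from the right ('009' -> 9) */
data chem;
   set chem_raw;
   patient = input(scan(sample_id, -1, '-'), 8.);
   drop sample_id;
run;

/* Step 2: sort both data sets by the shared key */
proc sort data=stat;
   by patient;
run;
proc sort data=chem;
   by patient;
run;

/* Step 3: compare matching patients, allowing a small absolute tolerance */
proc compare base=stat compare=chem method=absolute criterion=0.00001;
   id patient;
run;
```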

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n14cxqy1h9hof4n1cq4xmhv2atgs.htm#n14cxqy...

 

 




 

