BookmarkSubscribeRSS Feed
Jagadeesh2907
Obsidian | Level 7

Hi 

 

I have 3 datasets each has infomation on reconciliation in the below format. 

 

Dataset 1:

ID_source ID_target   ID_recon    name_source name_target name_recon

1                1               Matched     John               John             Matched

2                2               Matched     King                King             Matched

3                3               Matched     Simon             Simon          Matched

 

 

Dataset 2:

ID_source ID_target   ID_recon    name_source name_target name_recon

1                1               Matched     John               John             Matched

2                2               Matched     King                King             Matched

3                3               Matched     Simon             Tom             Unmatched

 

Similarily, Dataset3 with the similar reconciliations. I want to create a summary report which tell me percentage of match for each dataset. Something similar to below: 

 

Dataset                   Percentage of match

Dataset1                 100%

Dataset2                  84%

Dataset3                  92%

 

Could you please help me how to solve this and how to approach this. 

 

3 REPLIES 3
r_behata
Barite | Level 11
data ds1;
input ID_source ID_target ID_recon $ name_source $ name_target $ name_recon $;
cards;
1 1 Matched John John Matched
2 2 Matched King King Matched
3 3 Matched Simon Simon Matched
;
run;

data ds2;
input ID_source ID_target ID_recon $ name_source $ name_target $ name_recon $;
cards;
1 1 Matched John John Matched
2 2 Matched King King Unmatched
3 3 Matched Simon Simon Matched
;
run;

%macro getpercent(dsn,n);

	proc freq data=&dsn noprint;
		tables name_recon / missing out=&dsn._per(where=(name_recon="Matched") 
			drop=count  );
	run;

	data &dsn._per;
		Length Dataset $10.;
		format percent_match percent12.2;
		set &dsn._per;
		Dataset="Dataset&n";
		percent_match=percent/100;
		drop name_recon percent;
	run;

	proc append base=final data=&dsn._per force;
	run;

%mend;

proc datasets lib=work nodetails noprint;
	delete final;
run;

%getpercent(ds1,1)
%getpercent(ds2,2)
ballardw
Super User

@Jagadeesh2907 wrote:

Hi 

 

I have 3 datasets each has infomation on reconciliation in the below format. 

 

Dataset 1:

ID_source ID_target   ID_recon    name_source name_target name_recon

1                1               Matched     John               John             Matched

2                2               Matched     King                King             Matched

3                3               Matched     Simon             Simon          Matched

 

 

Dataset 2:

ID_source ID_target   ID_recon    name_source name_target name_recon

1                1               Matched     John               John             Matched

2                2               Matched     King                King             Matched

3                3               Matched     Simon             Tom             Unmatched

 

Similarily, Dataset3 with the similar reconciliations. I want to create a summary report which tell me percentage of match for each dataset. Something similar to below: 

 

Dataset                   Percentage of match

Dataset1                 100%

Dataset2                  84%

Dataset3                  92%

 

Could you please help me how to solve this and how to approach this. 

 


I would be tempted to start with something like this if no other variables then your Id_recon are involved.

data all (keep=dataset matchnum);
   length ds dataset $41.;
   set dataset1 (keep=id_recon) 
       dataset2 (keep=id_recon) 
       dataset3 (keep=id_recon) 
       indsname=ds
   ;
   dataset= ds;
   matchnum= (id_recon='Matched');
run;
proc tabulate data=all;
   class dataset;
   var matchnum;
   table dataset='',
         matchnum*pctsum=''
         /box='Dataset'
   ;
   label matchnum='Percent of match';
run;

The INDSNAME option on the set statement creates a temporary variable with the name of the contributing data set, the value is assigned to an explicit variable so the value can be used later. I create a numeric variable of 1/0 for matched/other as taking a PCTSUM means I don't have to count matched/nonmatched and calculate the percentage manually.

 

 

You really want to consider using binary coded variables of 1/0 for yes/no true/false match/nomatch types of coding instead of character. You can get a count of responses using the N statistic, SUM gives you the total Yes/true/match, and mean (or pctsum in the procedures that use that statistic) can give you percentages. Plus some modeling procedures such as Logistic regression want the result variable to be numeric.

Jagadeesh2907
Obsidian | Level 7

Hi Thank you. My problem here is i need to find the overall percentage of all matches across my dataset. Here in the example, i have two recon_* columns. hence i would need overall percentage of the matches.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1658 views
  • 0 likes
  • 3 in conversation