BookmarkSubscribeRSS Feed
Jagadeesh2907
Obsidian | Level 7

Hi 

 

I have 3 datasets each has infomation on reconciliation in the below format. 

 

Dataset 1:

ID_source ID_target   ID_recon    name_source name_target name_recon

1                1               Matched     John               John             Matched

2                2               Matched     King                King             Matched

3                3               Matched     Simon             Simon          Matched

 

 

Dataset 2:

ID_source ID_target   ID_recon    name_source name_target name_recon

1                1               Matched     John               John             Matched

2                2               Matched     King                King             Matched

3                3               Matched     Simon             Tom             Unmatched

 

Similarily, Dataset3 with the similar reconciliations. I want to create a summary report which tell me percentage of match for each dataset. Something similar to below: 

 

Dataset                   Percentage of match

Dataset1                 100%

Dataset2                  84%

Dataset3                  92%

 

Could you please help me how to solve this and how to approach this. 

 

3 REPLIES 3
r_behata
Barite | Level 11
data ds1;
input ID_source ID_target ID_recon $ name_source $ name_target $ name_recon $;
cards;
1 1 Matched John John Matched
2 2 Matched King King Matched
3 3 Matched Simon Simon Matched
;
run;

data ds2;
input ID_source ID_target ID_recon $ name_source $ name_target $ name_recon $;
cards;
1 1 Matched John John Matched
2 2 Matched King King Unmatched
3 3 Matched Simon Simon Matched
;
run;

%macro getpercent(dsn,n);

	proc freq data=&dsn noprint;
		tables name_recon / missing out=&dsn._per(where=(name_recon="Matched") 
			drop=count  );
	run;

	data &dsn._per;
		Length Dataset $10.;
		format percent_match percent12.2;
		set &dsn._per;
		Dataset="Dataset&n";
		percent_match=percent/100;
		drop name_recon percent;
	run;

	proc append base=final data=&dsn._per force;
	run;

%mend;

proc datasets lib=work nodetails noprint;
	delete final;
run;

%getpercent(ds1,1)
%getpercent(ds2,2)
ballardw
Super User

@Jagadeesh2907 wrote:

Hi 

 

I have 3 datasets each has infomation on reconciliation in the below format. 

 

Dataset 1:

ID_source ID_target   ID_recon    name_source name_target name_recon

1                1               Matched     John               John             Matched

2                2               Matched     King                King             Matched

3                3               Matched     Simon             Simon          Matched

 

 

Dataset 2:

ID_source ID_target   ID_recon    name_source name_target name_recon

1                1               Matched     John               John             Matched

2                2               Matched     King                King             Matched

3                3               Matched     Simon             Tom             Unmatched

 

Similarily, Dataset3 with the similar reconciliations. I want to create a summary report which tell me percentage of match for each dataset. Something similar to below: 

 

Dataset                   Percentage of match

Dataset1                 100%

Dataset2                  84%

Dataset3                  92%

 

Could you please help me how to solve this and how to approach this. 

 


I would be tempted to start with something like this if no other variables then your Id_recon are involved.

data all (keep=dataset matchnum);
   length ds dataset $41.;
   set dataset1 (keep=id_recon) 
       dataset2 (keep=id_recon) 
       dataset3 (keep=id_recon) 
       indsname=ds
   ;
   dataset= ds;
   matchnum= (id_recon='Matched');
run;
proc tabulate data=all;
   class dataset;
   var matchnum;
   table dataset='',
         matchnum*pctsum=''
         /box='Dataset'
   ;
   label matchnum='Percent of match';
run;

The INDSNAME option on the set statement creates a temporary variable with the name of the contributing data set, the value is assigned to an explicit variable so the value can be used later. I create a numeric variable of 1/0 for matched/other as taking a PCTSUM means I don't have to count matched/nonmatched and calculate the percentage manually.

 

 

You really want to consider using binary coded variables of 1/0 for yes/no true/false match/nomatch types of coding instead of character. You can get a count of responses using the N statistic, SUM gives you the total Yes/true/match, and mean (or pctsum in the procedures that use that statistic) can give you percentages. Plus some modeling procedures such as Logistic regression want the result variable to be numeric.

Jagadeesh2907
Obsidian | Level 7

Hi Thank you. My problem here is i need to find the overall percentage of all matches across my dataset. Here in the example, i have two recon_* columns. hence i would need overall percentage of the matches.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 952 views
  • 0 likes
  • 3 in conversation