BookmarkSubscribeRSS Feed
Cruise
Ammonite | Level 13

Actual question:

How to use proc sql's select case/when statements with macro by modifying the code recipe in the reference by Glenn Wright? The code that I am trying to modify here will be found on the page #10 in the reference. I also enclosed my modified code that doesn't produce data "similarity_score".

What am I doing wrong here?

 

Reference: Glenn Wright, Probabilistic Record Linkage, see page #10.

http://www.wuss.org/proceedings11/Papers_Wright_G_76128.pdf

 

Background:

My objective is to conduct proabilistic record linkage to reduce the duplicated entries of patients in hospital discharge data (NY). In the absence of personal variables such as name, address, unique idnetity code, I use gender, date of birth, patient facility identifier, zip and race. The bottom line is that any rows in the data that match to each other in combination of these five variables are the most likely to represent the same person rather different. Vice versa. SAS version: 9.4 (TS1M3).

 

"Sample" data:

data sample;
   input SEX $ DOB PFI ZIP RACE;
datalines;
F	189804	125	10111	1
M	189802	126	12365	2
F	189801	129	12369	1
M	190111	256	14236	2
F	190302	101	12144	1
M	190307	115	12203	2
M	190307	115	12203	2
M	189806	356	18963	1
F	190308	258	14789	2
F	190302	101	12144	1
;
run;

Code modified from Glenn Wright: (please see reference attached)

 

data mydata_clean (index = (obs_num sex dob pfi zip racecat)); set sample;
obs_num = _n_;
run;

%macro blocking_var(myvar);
create table similarity_score(compress=yes) as
select 
 a.obs_num    as obs_num_1,  b.obs_num as obs_num_2, 
 a.sex        as sex_1,      b.sex as sex_2, 
 a.dob        as dob_1,      b.dob as dob_2,
 a.pfi        as pfi_1,      b.pfi as pfi_2,
 a.zip        as zip_1,      b.zip as zip_2,
 a.racecat    as racecat_1,  b.racecat as racecat_2,

select 
case 
when a.sex = b.sex then 1
else -5.64 end as sex_score,

case 
when a.pfi = b.pfi then 7.6
else -4.3 end as pfi_score,

case 
when a.dob = b.dob then 3.5
else -4.2 end as dob_score,

case 
when a.racecat = b.racecat then 1
else -3.3 end as racecat_score

case 
when a.zip = b.zip then 10.6
else -4.3 end as zip_score

calculated sex_score + calculated pfi_score +
calculated dob_score + calculated racecat_score + calculated zip_score as score

from mydata_clean as a INNER JOIN mydata_clean as b
on a.obs_num > b.obs_num
and a.&myvar. = b.&myvar.
where a.&myvar. is not missing
and calculated score=<-10
%mend;

data threshold;
set similarity_scores;
where score =<-12;
run;

Reference:

Glenn Wright's Probabilistic Record Linkage in SAS®, offers excellent solution to  address the problem almost identical to mine. Any hints highly appreciated. 

 

 

19 REPLIES 19
Reeza
Super User

I might be missing something, but you defined the macro but never actually ran it. 

 

There should be a statement somewhere that looks like:

 

Proc sql;
%block_var(BLOCKVAR);
quit;

 

Cruise
Ammonite | Level 13

Thanks Reeza,

Using software is not permitted in this project unfortunately. I tried taking in and out the proc sql and quit with many different variations, but still doesn't work. Wish I had sql experience but not. Please see edited code. What do you think?

 

proc sql;
%macro blocking_var(myvar);
create table similarity_score(compress=yes) as
select
 a.obs_num    as obs_num_1,  b.obs_num as obs_num_2,
 a.sex        as sex_1,      b.sex as sex_2,
 a.dob        as dob_1,      b.dob as dob_2,
 a.pfi        as pfi_1,      b.pfi as pfi_2,
 a.zip        as zip_1,      b.zip as zip_2,
 a.racecat    as racecat_1,  b.racecat as racecat_2,

select
case
when a.sex = b.sex then 1
else -5.64 end as sex_score,

case
when a.pfi = b.pfi then 7.6
else -4.3 end as pfi_score,

case
when a.dob = b.dob then 3.5
else -4.2 end as dob_score,

case
when a.racecat = b.racecat then 1
else -3.3 end as racecat_score

case
when a.zip = b.zip then 10.6
else -4.3 end as zip_score

calculated sex_score + calculated pfi_score +
calculated dob_score + calculated racecat_score + calculated zip_score as score

from mydata_clean as a INNER JOIN mydata_clean as b
on a.obs_num > b.obs_num
and a.&myvar. = b.&myvar.
where a.&myvar. is not missing
and calculated score=<-10
;
quit;
%mend; 

 

Reeza
Super User

Remove the macro components, get it working and then turn it into a macro. As is you're trying to debug too many things. My code was an example of how to use the macro. 

 

If you're unfamiliar with the structure of the code the paper the paper did have full examples. Make sure it's the same format. 

PGStats
Opal | Level 21

Your code has (at least) 2 syntax errors which might go unnoticed when run inside a macro:

 

1) the second select keyword shouldn't be there

2) there should be commas after as zip_score and as racecat_score

 

As @Reeza suggested, get the code to work outside the macro before you run it as a macro. It is much easier to debug that way.

PG
Cruise
Ammonite | Level 13

Hi PG,

I had proc sql worked outside the macro. But it loses its function when i throw it into macro. I run it but nothing happens. No data set created. What am I doing wrong here? I appreciate your time.

 

%macro blocking_var(myvar);
proc sql; 
 create table similarity_scores (compress=yes) as 
 select
 a.obs_num    as obs_num_1,  b.obs_num as obs_num_2, 
 a.sex        as sex_1,      b.sex as sex_2, 
 a.dob        as dob_1,      b.dob as dob_2,
 a.pfi        as pfi_1,      b.pfi as pfi_2,
 a.zip        as zip_1,      b.zip as zip_2,
case when a.sex = b.sex then 2
     else -5 end as sex_score,
case when a.dob = b.dob then 12
     else -6 end as dob_score,
case when a.pfi = b.pfi then 10
     else -5 end as pfi_score,
case when a.zip = b.zip then 10
     else -5 end as zip_score,

calculated sex_score + calculated dob_score  + calculated pfi_score +calculated zip_score 
as total_score 
from
mydata_clean as a INNER JOIN mydata_clean as b  
 on a.obs_num > b.obs_num
 and a.&myvar.=b.&myvar.
 where a.&myvar. is not missing
 and calculated score <=-10  
;
quit;
%mend;
Reeza
Super User

Back to my first comment - where do you call/execute the macro?

Cruise
Ammonite | Level 13
%macro blocking_var(myvar);
proc sql; 
 "union"
quit;
%mend;

Hi Reeza,

 

My last code posted included above macro elements, isn't it?

Reeza
Super User

No. That's a macro definition, you still need to execute the macro.

Macro definition

%macro print_demo(dataset_name);

proc print data=&dataset_name (obs=10);
run;

%mend;

Macro execution:

 

%print_demo(sashelp.cars);
Cruise
Ammonite | Level 13

Like so?

%blocking_var(sex,dob,pfi,zip);
Cruise
Ammonite | Level 13

I get:

 

ERROR: More positional parameters found than defined.

 

when I add:

%blocking_var(sex,dob,pfi,zip); 

I also tried: 

%blocking_var("sex","dob","pfi","zip");

still get the same error as above

Cruise
Ammonite | Level 13

Hi ya'll,

 

Still working on it. I really have to get it working.

Now I have macro definition and executions parts and the error I get is:

 

error.png

 

%macro blocking_var(myvar);
proc sql; 
 create table similarity_scores (compress=yes) as 
 select
a.obs_num    as obs_num_1,  b.obs_num as obs_num_2, 
 a.sex        as sex_1,      b.sex as sex_2, 
 a.dob        as dob_1,      b.dob as dob_2,
a.pfi        as pfi_1,      b.pfi as pfi_2,
a.zip        as zip_1,      b.zip as zip_2,
case when a.sex = b.sex then 2
     else -5 end as sex_score,
case when a.dob = b.dob then 12
     else -6 end as dob_score,
case when a.pfi = b.pfi then 10
     else -5 end as pfi_score,
case when a.zip = b.zip then 10
     else -5 end as zip_score,

calculated sex_score + calculated dob_score  + calculated pfi_score +calculated zip_score 
as total_score 
from
mydata_clean as a INNER JOIN mydata_clean as b  
 on a.obs_num > b.obs_num
and a.&myvar.=b.&myvar.
where a.&myvar. is not missing
and calculated score <=-10  
;
quit;
%mend blocking_var;
%blocking_var(sex); 
%blocking_var(dob);
%blocking_var(pfi);
%blocking_var(zip);

Any hints?

Thank you!!!

 

 

Reeza
Super User

PS. Your first step should be getting the code to work exactly as defined before making changes to it. 

Also, you can take a look at the The Link King which offers a tool to create linkages based on SAS code as well. 

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 19 replies
  • 2825 views
  • 5 likes
  • 4 in conversation