Actual question:
How to use proc sql's select case/when statements with macro by modifying the code recipe in the reference by Glenn Wright? The code that I am trying to modify here will be found on the page #10 in the reference. I also enclosed my modified code that doesn't produce data "similarity_score".
What am I doing wrong here?
Reference: Glenn Wright, Probabilistic Record Linkage, see page #10.
http://www.wuss.org/proceedings11/Papers_Wright_G_76128.pdf
Background:
My objective is to conduct proabilistic record linkage to reduce the duplicated entries of patients in hospital discharge data (NY). In the absence of personal variables such as name, address, unique idnetity code, I use gender, date of birth, patient facility identifier, zip and race. The bottom line is that any rows in the data that match to each other in combination of these five variables are the most likely to represent the same person rather different. Vice versa. SAS version: 9.4 (TS1M3).
"Sample" data:
data sample;
input SEX $ DOB PFI ZIP RACE;
datalines;
F 189804 125 10111 1
M 189802 126 12365 2
F 189801 129 12369 1
M 190111 256 14236 2
F 190302 101 12144 1
M 190307 115 12203 2
M 190307 115 12203 2
M 189806 356 18963 1
F 190308 258 14789 2
F 190302 101 12144 1
;
run;
Code modified from Glenn Wright: (please see reference attached)
data mydata_clean (index = (obs_num sex dob pfi zip racecat)); set sample;
obs_num = _n_;
run;
%macro blocking_var(myvar);
create table similarity_score(compress=yes) as
select
a.obs_num as obs_num_1, b.obs_num as obs_num_2,
a.sex as sex_1, b.sex as sex_2,
a.dob as dob_1, b.dob as dob_2,
a.pfi as pfi_1, b.pfi as pfi_2,
a.zip as zip_1, b.zip as zip_2,
a.racecat as racecat_1, b.racecat as racecat_2,
select
case
when a.sex = b.sex then 1
else -5.64 end as sex_score,
case
when a.pfi = b.pfi then 7.6
else -4.3 end as pfi_score,
case
when a.dob = b.dob then 3.5
else -4.2 end as dob_score,
case
when a.racecat = b.racecat then 1
else -3.3 end as racecat_score
case
when a.zip = b.zip then 10.6
else -4.3 end as zip_score
calculated sex_score + calculated pfi_score +
calculated dob_score + calculated racecat_score + calculated zip_score as score
from mydata_clean as a INNER JOIN mydata_clean as b
on a.obs_num > b.obs_num
and a.&myvar. = b.&myvar.
where a.&myvar. is not missing
and calculated score=<-10
%mend;
data threshold;
set similarity_scores;
where score =<-12;
run;
Reference:
Glenn Wright's Probabilistic Record Linkage in SAS®, offers excellent solution to address the problem almost identical to mine. Any hints highly appreciated.
I might be missing something, but you defined the macro but never actually ran it.
There should be a statement somewhere that looks like:
Proc sql;
%block_var(BLOCKVAR);
quit;
Thanks Reeza,
Using software is not permitted in this project unfortunately. I tried taking in and out the proc sql and quit with many different variations, but still doesn't work. Wish I had sql experience but not. Please see edited code. What do you think?
proc sql;
%macro blocking_var(myvar);
create table similarity_score(compress=yes) as
select
a.obs_num as obs_num_1, b.obs_num as obs_num_2,
a.sex as sex_1, b.sex as sex_2,
a.dob as dob_1, b.dob as dob_2,
a.pfi as pfi_1, b.pfi as pfi_2,
a.zip as zip_1, b.zip as zip_2,
a.racecat as racecat_1, b.racecat as racecat_2,
select
case
when a.sex = b.sex then 1
else -5.64 end as sex_score,
case
when a.pfi = b.pfi then 7.6
else -4.3 end as pfi_score,
case
when a.dob = b.dob then 3.5
else -4.2 end as dob_score,
case
when a.racecat = b.racecat then 1
else -3.3 end as racecat_score
case
when a.zip = b.zip then 10.6
else -4.3 end as zip_score
calculated sex_score + calculated pfi_score +
calculated dob_score + calculated racecat_score + calculated zip_score as score
from mydata_clean as a INNER JOIN mydata_clean as b
on a.obs_num > b.obs_num
and a.&myvar. = b.&myvar.
where a.&myvar. is not missing
and calculated score=<-10
;
quit;
%mend;
Remove the macro components, get it working and then turn it into a macro. As is you're trying to debug too many things. My code was an example of how to use the macro.
If you're unfamiliar with the structure of the code the paper the paper did have full examples. Make sure it's the same format.
Your code has (at least) 2 syntax errors which might go unnoticed when run inside a macro:
1) the second select keyword shouldn't be there
2) there should be commas after as zip_score and as racecat_score
As @Reeza suggested, get the code to work outside the macro before you run it as a macro. It is much easier to debug that way.
Hi PG,
I had proc sql worked outside the macro. But it loses its function when i throw it into macro. I run it but nothing happens. No data set created. What am I doing wrong here? I appreciate your time.
%macro blocking_var(myvar);
proc sql;
create table similarity_scores (compress=yes) as
select
a.obs_num as obs_num_1, b.obs_num as obs_num_2,
a.sex as sex_1, b.sex as sex_2,
a.dob as dob_1, b.dob as dob_2,
a.pfi as pfi_1, b.pfi as pfi_2,
a.zip as zip_1, b.zip as zip_2,
case when a.sex = b.sex then 2
else -5 end as sex_score,
case when a.dob = b.dob then 12
else -6 end as dob_score,
case when a.pfi = b.pfi then 10
else -5 end as pfi_score,
case when a.zip = b.zip then 10
else -5 end as zip_score,
calculated sex_score + calculated dob_score + calculated pfi_score +calculated zip_score
as total_score
from
mydata_clean as a INNER JOIN mydata_clean as b
on a.obs_num > b.obs_num
and a.&myvar.=b.&myvar.
where a.&myvar. is not missing
and calculated score <=-10
;
quit;
%mend;
Back to my first comment - where do you call/execute the macro?
%macro blocking_var(myvar);
proc sql;
"union"
quit;
%mend;
Hi Reeza,
My last code posted included above macro elements, isn't it?
No. That's a macro definition, you still need to execute the macro.
Macro definition
%macro print_demo(dataset_name);
proc print data=&dataset_name (obs=10);
run;
%mend;
Macro execution:
%print_demo(sashelp.cars);
Like so?
%blocking_var(sex,dob,pfi,zip);
I get:
ERROR: More positional parameters found than defined.
when I add:
%blocking_var(sex,dob,pfi,zip);
I also tried:
%blocking_var("sex","dob","pfi","zip");
still get the same error as above
Hi ya'll,
Still working on it. I really have to get it working.
Now I have macro definition and executions parts and the error I get is:
%macro blocking_var(myvar);
proc sql;
create table similarity_scores (compress=yes) as
select
a.obs_num as obs_num_1, b.obs_num as obs_num_2,
a.sex as sex_1, b.sex as sex_2,
a.dob as dob_1, b.dob as dob_2,
a.pfi as pfi_1, b.pfi as pfi_2,
a.zip as zip_1, b.zip as zip_2,
case when a.sex = b.sex then 2
else -5 end as sex_score,
case when a.dob = b.dob then 12
else -6 end as dob_score,
case when a.pfi = b.pfi then 10
else -5 end as pfi_score,
case when a.zip = b.zip then 10
else -5 end as zip_score,
calculated sex_score + calculated dob_score + calculated pfi_score +calculated zip_score
as total_score
from
mydata_clean as a INNER JOIN mydata_clean as b
on a.obs_num > b.obs_num
and a.&myvar.=b.&myvar.
where a.&myvar. is not missing
and calculated score <=-10
;
quit;
%mend blocking_var;
%blocking_var(sex);
%blocking_var(dob);
%blocking_var(pfi);
%blocking_var(zip);
Any hints?
Thank you!!!
PS. Your first step should be getting the code to work exactly as defined before making changes to it.
Also, you can take a look at the The Link King which offers a tool to create linkages based on SAS code as well.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.