Hello,
I am using the following SAS code to search for premium datasets from a policies list. So if I have a policy number let's say 1111-1111, it search for this policy into all datasets available.
However, in the on condition, we have
on(a.agreement_nbr=b.agreement_nbr_dr and a.&varname1.=b.province and a.company_cd=b.legacy_underwriting_company);
In some old premium datasets, a.&varname1 eq 'PQ' and in b.province = 'QC', so we are loosing 4772 policies number from the original list. Is there a way to say to remap it's value to QC when it is PQ in the on statement
Here's the complete code
%macro searchpremium /minoperator;
Data dest1.found_agreements;
attrib
/*********************************** Agreement Number location information ***************************************************/
KEY label="Record to be anonymized key" length=$31. format=$char31. informat=$char31.
ORGNL_SAS_DS_ROW_NBR label="Original SAS dataset Row number" length=8 format=8. informat=8.
agreement_nbr label="Agreement Number" length=$20. format=$char20. informat=$char20.
cie label="company cd " length=$2. format=$char2. informat=$char2.
lob label="Line Of Business" length=$4. format=$char4. informat=$char4.
year label="Year" length=$4. format=$char4. informat=$char4.
month label="Month" length=$3. format=$char3. informat=$char3.
libname label="path" length=$100. format=$char100. informat=$char100.
filename label="Dataset Name" length=$25. format=$char25. informat=$char25.
company_cd label="Company CD" length=$1. format=$char1. informat=$char1.
policy_expiry_date label="Policy Expiry Date" length=8 format=yymmdd10. informat=yymmdd10.
/*********************************** Data Retention dataset information ***************************************************/
/* Modification: appending policy_number and policy_number_src then renaming agreement_nbr and removing duplicate
POLICY_NUMBER label="Policy Number" length=$25. format=$char25. informat=$char25.
POLICY_NUMBER_SRC label="Policy Number Source" length=$25. format=$char25. informat=$char25.
*/
AGREEMENT_NBR_DR label="Agreement Number Data Retention" length=$25. format=$char25. informat=$char25.
LINE_OF_BUSINESS label="Line of Business" length=$3. format=$char3. informat=$char3.
PROVINCE label="Province" length=$10. format=$char10. informat=$char10.
LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10. format=$char10. informat=$char10.
UNDERWRITING_COMPANY label="Underwriting Company" length=$10. format=$char10. informat=$char10.
DISTRIBUTOR_NUMBER label="Distributor Number" length=$10. format=$char10. informat=$char10.
DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10. format=$char10. informat=$char10.
SOURCE_SYSTEM label="Source System" length=$15. format=$char15. informat=$char15.
DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10. format=$char10. informat=$char10.
LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
CONTRACT_UUID label="Contract UUID" length=$36. format=$char36. informat=$char36.
;
stop;
run;
%do h=1 %to %sysfunc(countw(&cielist.));
%if %tslit(%scan(&cielist.,&h)) eq 'je' %then %let value=2;
%else %if %tslit(%scan(&cielist.,&h)) eq 'gc' %then %let value=0;
%else %let value=1;
%put &=value;
%do i=1 %to %sysfunc(countw(&loblist.))-&value.;
%do k=1 %to %sysfunc(countw(&yearlist.));
%do j=1 %to %sysfunc(countw(&monthlist.));
%let fname=%scan(&cielist.,&h)_%scan(&loblist.,&i)_prm%scan(&monthlist.,&j)%scan(&yearlist.,&k);
%let libnm=/dwh_actuariat/sasdata/Data_Retention/sas%scan(&yearlist.,&k)/%scan(&cielist.,&h)/%scan(&subfolder.,&i)/;
%let cie=%scan(&cielist.,&h);
%let does_it_exist=%sysfunc(fileexist(&libnm.));
%put &=does_it_exist &=libnm.;
%if &does_it_exist eq 1 %then
%do;
%put &=libnm &=fname.;
libname src1 spde "&libnm.";
%put &libnm.&fname.;
%if %sysfunc(exist(src1.&fname.)) eq 1 %then
%do;
/****************** checking if the variable exist ****************/
%if %varexist(src1.&fname.,province_cd) %then %let varname1=province_cd;
%else %if %varexist(src1.&fname.,RISK_PROVINCE_CD) %then %let varname1=RISK_PROVINCE_CD;
%else %if %varexist(src1.&fname.,POLICY_PROVINCE_CD) %then %let varname1=POLICY_PROVINCE_CD;
proc sql;
create table table_info as
select distinct strip(agreement_nbr) as agreement_nbr length=20,
a.ORGNL_SAS_DS_ROW_NBR,
"%scan(&cielist.,&h)" as cie length=2,
"%scan(&loblist.,&i)" as lob length=4,
"%scan(&yearlist.,&k)" as year length=4,
"%scan(&monthlist.,&j)" as month length=3,
"&libnm" as libname length=100,
"&fname" as filename length=25,
a.company_cd,
catx('|',"%scan(&cielist.,&h)","%scan(&loblist.,&i)","%scan(&yearlist.,&k)","%scan(&monthlist.,&j)",a.ORGNL_SAS_DS_ROW_NBR) as key length=31,
input(a.policy_expiry_dt,YYMMDD10.) as policy_expiry_date length=8,
b.*
from src1.&fname. as a
inner join lookup as b
on(a.agreement_nbr=b.agreement_nbr_dr and a.&varname1.=b.province and a.company_cd=b.legacy_underwriting_company);
quit;
%if &sqlobs ne 0 %then
%do;
proc append base=dest1.found_agreements data=table_info force;
run;
%end;
proc sql;
drop table table_info;
quit;
%end;
%else
%do;
%put "the file &fname. does not exist";
%end;
%end;
%end;
%end
%end;
%end;
%end;
proc sort data=dest1.found_agreements out=dest1.found_agreements_in_premds;
by key ;
run;
data listofdatasetstocleanup (keep= libname filename);
set dest1.found_agreements_in_premds;
run;
proc sort data=listofdatasetstocleanup nodupkey out=dest1.listofdatasetstocleanup;
by libname filename;
run;
%mend searchpremium;
%searchpremium;
Hello @alepage,
I would also prefer the format approach suggested by ballardw (where the definition 'PQ' = 'QC' in the VALUE statement should be sufficient), especially if you need the same on-the-fly replacement ('PQ' → 'QC') in several places of your programs or if there are more pairs of values.
Otherwise, yes, you can use a CASE expression as you have suggested.
Example:
data have1;
input prov $ x;
cards;
AB 1
ON 2
PQ 3
BC 4
;
data have2;
input province $ y;
cards;
MB 555
QC 333
BC 444
ON 222
;
%let varname1=prov;
proc sql;
create table want as
select b.province, a.x, b.y
from have1 a inner join have2 b
on case when a.&varname1='PQ' then 'QC' else a.&varname1 end = b.province;
quit;
Or use the IFC function:
on ifc(a.&varname1='PQ','QC',a.&varname1) = b.province;
If I understand correctly you have described one case where you have a variable that may take values of 'PQ' and 'QC' and you want them to match a value of 'QC' in another data set.
Do you have any other values of these variables that also must match? A more complete description might change a suggestion.
Personally instead of trying to get a CASE statement to work I would provide an operation on the value. If this is a small list of values that need to change mapping for the comparison then a custom format may provide the alternate value:
proc format; value $alt_province 'QC','PQ' = 'QC' ; run;
and the comparison would become
and put(a.&varname1., $alt_province.) = b.province and
Caveat: if you have values of the "province" variables that may be longer than two characters you may have to set a default length of the format to be long enough to support them as the code I show will tend to default to a length of 2 a could truncate other values. That would require (default=6) for example to handle up to 6 character length values after the name of the format $alt_province in the code. Do note that any other value of the variable other than those on the left side of the = are not affected by the format and default to their value.
Hello @alepage,
I would also prefer the format approach suggested by ballardw (where the definition 'PQ' = 'QC' in the VALUE statement should be sufficient), especially if you need the same on-the-fly replacement ('PQ' → 'QC') in several places of your programs or if there are more pairs of values.
Otherwise, yes, you can use a CASE expression as you have suggested.
Example:
data have1;
input prov $ x;
cards;
AB 1
ON 2
PQ 3
BC 4
;
data have2;
input province $ y;
cards;
MB 555
QC 333
BC 444
ON 222
;
%let varname1=prov;
proc sql;
create table want as
select b.province, a.x, b.y
from have1 a inner join have2 b
on case when a.&varname1='PQ' then 'QC' else a.&varname1 end = b.province;
quit;
Or use the IFC function:
on ifc(a.&varname1='PQ','QC',a.&varname1) = b.province;
You should add a line at the end of your first step to remove these nasty notes:
NOTE: Variable KEY is uninitialized. NOTE: Variable ORGNL_SAS_DS_ROW_NBR is uninitialized. NOTE: Variable agreement_nbr is uninitialized. NOTE: Variable cie is uninitialized. NOTE: Variable lob is uninitialized. NOTE: Variable year is uninitialized. NOTE: Variable month is uninitialized. NOTE: Variable libname is uninitialized. NOTE: Variable filename is uninitialized. NOTE: Variable company_cd is uninitialized. NOTE: Variable policy_expiry_date is uninitialized. NOTE: Variable AGREEMENT_NBR_DR is uninitialized. NOTE: Variable LINE_OF_BUSINESS is uninitialized. NOTE: Variable PROVINCE is uninitialized. NOTE: Variable LEGACY_UNDERWRITING_COMPANY is uninitialized. NOTE: Variable UNDERWRITING_COMPANY is uninitialized. NOTE: Variable DISTRIBUTOR_NUMBER is uninitialized. NOTE: Variable DISTRIBUTOR_NUMBER_SRC is uninitialized. NOTE: Variable SOURCE_SYSTEM is uninitialized. NOTE: Variable DATASOURCE_BACKEND_REFERENCE is uninitialized. NOTE: Variable LAST_TERM_EFFECTIVE_DATE is uninitialized. NOTE: Variable LAST_TERM_EXPIRY_DATE is uninitialized. NOTE: Variable CONTRACT_UUID is uninitialized.
like this:
...
call missing(of _all_);
stop;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.