BookmarkSubscribeRSS Feed
Babloo
Rhodochrosite | Level 12

Appreciate if someone of you help me tackle in the following situation to create a mapping in DI Studio.

 

I have to create one variable called 'KNDATL' from the  table POLICY and the field is 'Policy_Issue_dt'

 

If 'Policy_Issue_dt' is missing then I've to compare the variable KNKATD from the current month dataset (POL_SEP_2018) with KNKATD of previous month dataset (POL_AUG_2018) .

 

If equal, than take the same KNDATL of previous month, if different, take last date of previous month (e.g. 31-08-2018).

 

KNKATD is character variable which has values like '1', '2', '3', '4'

KNDATL is numeric variable which has values like 31-08-2018, 17-08-2018....

 

 

7 REPLIES 7
VenuKadari
SAS Employee

Could you use a user written transform or does it have to be an existing transform. Assuming your monthly tables are not too large have you considered using hash lookup to get what you need?  Should be able to solve this using DI Studio transforms as well. Need a little more time to work through. 

 

 

ChrisLysholm
SAS Employee

Hello,

 

I am researching a solution that would allow you to use an existing transformation, such as Lookup, to perform the steps you have outlined.

 

I do have 2 user written solutions that may suffice in the meantime.

I have added these solutions below.

 

I will post back when I have answer to ability of performing these steps using existing transformation.

 

USER WRITTEN SOLUTIONS FOLLOW:

 

 


data work.pol_jul_2018;

attrib
       POLICY_ID length=8 informat=best32. format=best32. 
       KNKATD length=$1 informat=$1. format=$1.
       KNDATL length=8 format=ddmmyyd10.
;

policy_id=1;
knkatd='1';
kndatl='15jul2018'd;
output;

policy_id=2;
knkatd='4';
kndatl='01jul2018'd;
output;

run;


data work.pol_aug_2018;

attrib
       POLICY_ID length=8 informat=best32. format=best32. 
       KNKATD length=$1 informat=$1. format=$1.
       POLICY_ISSUE_DT length=8 format=ddmmyyd8.
;

policy_id=1;
knkatd='1';
policy_issue_dt=.;
output;

policy_id=2;
knkatd='1';
policy_issue_dt=.;
output;

run;


/* SOLUTION 1: DATA STEP MERGE SOLUTION */

proc sort data=work.pol_jul_2018;
by policy_id;
run;

proc sort data=work.pol_aug_2018;
by policy_id;
run;

data work.pol_aug_2018_update(drop=knkatd_old);

merge work.pol_jul_2018(in=inOLD rename=(knkatd=knkatd_old))
      work.pol_aug_2018(in=inNEW);
by policy_id;

if inOLD and inNEW then do;
   if missing(policy_issue_dt) then do;
      if knkatd_old ne knkatd then do;
         kndatl=intnx ('month',kndatl,0,'E');
      end;
   end;
end;

run;


/* SOLUTION 2: DATA STEP HASH SOLUTION */

data work.pol_aug_2018_update2(drop=knkatd_old);
   
   if 0 then
      set work.pol_aug_2018;
      
   if _N_ eq 1 then
   do;
      declare hash hct(dataset: 'work.pol_aug_2018');
                   hct.defineKey("policy_id");
                   hct.defineData("knkatd", "policy_issue_dt");
                   hct.defineDone();
   end;
   
   set work.pol_jul_2018(rename=(knkatd=knkatd_old)); 
   
   /* policy_id match  */ 
   if hct.find() eq 0 then
   do;
      if missing(policy_issue_dt) then do;
         if knkatd_old ne knkatd then do;
            kndatl=intnx ('month',kndatl,0,'E');
         end;
      end;
   end;
      
run;

 

Patrick
Opal | Level 21

From the looks of it you could use a SQL Join transformation where the expression is a CASE statement and for the bit where you have to read data from another table it's then a sub-select within the CASE statement.

That should work conceptually but if the table for the sub-select isn't of low volume then performance will likely be rather bad. Having an index would eventually help.

 

Else: If performance is important then that's eventually a case where using user written code (or a custom transformation) is justified.

Babloo
Rhodochrosite | Level 12

Appreciate if you could help me with the sample CASE WHEN statement which suits this scenario. I never came across this scenario before.

Patrick
Opal | Level 21

@Babloo

Please provide representative sample data in the form of tested SAS data steps. Then show us the desired result.

 

The sample data should cover all the cases and it should also contain multiple records per policy so we get the logic right for "take last date of previous month". 

 

Once you provide all this information we should be in good shape to come-up with a single SQL creating the desired result - which then is something you could implement using a DIS SQL Join transformation. 

 

I would assume some people will also provide other coding options and if these are much simpler or perform much better then you might decide to go for a user written transformation instead.

 

Please also tell us with what data volumes you're dealing with in your real case and how important performance is.

 

Also relevant: Where is the data stored? Sources and target?

VenuKadari
SAS Employee

Perhaps you could use sql like this to create a dataset consisting of values where kndatl needs to be adjusted and merge with data where such adjustment is not needed. This can be done with DI Studio SQL transforms.

 

/* Records where policy_issue_dt is missing and knkatd does not match with previous month */

proc sql;
create table work.missing as
select aug.policy_id, aug.knkatd, aug.policy_issue_dt, jul.kndatl as jul_kndatl,
intnx ('month', jul.kndatl, 0, 'E') as kndatl format DDMMYYD10.
from work.pol_aug_2018 as aug
inner join work.pol_jul_2018 as jul
on aug.policy_id=jul.policy_id
and aug.policy_issue_dt = .
and jul.knkatd ne aug.knkatd
;
quit;

ChrisLysholm
SAS Employee

Well I have tried sql join and CASE statement but was not able to get this to work.

 

A simple UNION will produce results I believe you are looking for and could work with one of the SQL based transformations such as JOIN.

 

Sample query is below:

 


proc sql;

create table work.pol_aug_2018_update as
select t2.policy_id,
       t2.knkatd,
       t2.policy_issue_dt,
       t1.kndatl as kndatl format=ddmmyyd10.
from work.pol_jul_2018 as t1,
     work.pol_aug_2018 as t2
       where t1.policy_id = t2.policy_id and
             missing(policy_issue_dt) and
             t1.knkatd = t2.knkatd
union
select t2.policy_id,
       t2.knkatd,
       t2.policy_issue_dt,
       intnx ('month', t1.kndatl, 0, 'E') as kndatl format=ddmmyyd10.
from work.pol_jul_2018 as t1,
     work.pol_aug_2018 as t2
       where t1.policy_id = t2.policy_id and
             missing(policy_issue_dt) and
             t1.knkatd ne t2.knkatd;

quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1583 views
  • 0 likes
  • 4 in conversation