Hi Experts,
I need to repeat some of the variables in my input table with prefix "MD_". These new varaibles should carry all the non-missing values from the original variable, however for missing values, it should join this table with master dataset on trans_id and get the value from there. List of required varaibles to be repeated are saved in a separate table (Repeat_Table). Can someone please help me understand how to build this.
 o
o 
Many thanks in advance
DW
This works for me, from your Excel example file:
libname xl Excel "&sasforum.\datasets\DannyWilmot.xls" access=readonly mixed=yes;
proc sql noprint;
select cats("COALESCE(A.",var_repeat,", B.",var_repeat,") as MD_", var_repeat)
into :code separated by ", "
from xl.'Repeat_table$'n;
quit;
proc sql;
create table required_output as
select A.*, &code.
from xl.'Input_Table$'n as A left join
xl.'Master_Data$'n as B on A.trans = B.trans;
quit;
PG
Since your example data doesn't show where the values 799 and 200 come from it is difficult to address what you want. Does your "repeat table" actually have any variable names? You show a column of what looks like variable names but that would not be a valid SAS dataset structure to contain different value, i.e. the 799 and 200 shown that appear to be associated with the System_ID. How do you link the values in Repeat to the input table rows?
the basic structure of the required output is easy:
data want;
set have;
MD_buy_date=buy_date;
MD_System_ID=System_ID;
MD_System_Name=System_name;
MD_Rate=Rate;
run;
but how to update the missing values in want depends on the actual structure and content of your Repeat table.
I prefer to provide tested answers to hand waving explanations. Please post usable example data (not a screenshot) and include the structure of the master dataset. - PG
Hi PGStats,
I have now placed all the required inputs and expected output in an excel file (attached). Unfortunately, I can not share the original data thats why I have tried to create a dummy dataset for this example, For this example we can assume that all the fields are character variables, Also orginal master dataset will have around 200 variables and Var_repeat will have around 30 variables which needs to be repeated in the input table.
Rgds, Danny
Something like:
proc sql;
  create table WANT as
  select  A.TRANS,  
          A.RELEASE,
          A.QUALITY,
          A.MASK,
          A.BUY_DATE,
          A.SYSTEM_ID,
          ...
          COALESCE(A.SYSTEM_ID,MD.SYSTEM_ID) as MD_SYSTEM_ID, /* i.e. take a if present else from MD */
          MD.SYSTEM_NAME as MD_SYSTEM_NAME,
          MD.RATE as MD_RATE
  from    INPUT_DATA A
  left join MASTER_DATA MD
  on      A.TRANS=MD.TRANS
  and     A.BUY_DATE=MD.BY_DATE and A.SYSTEM_ID=MD.SYSTEM_ID;  /* Guessing a bit on the links */
quit;
Hi! Should we assume that Trans_ID is unique in InputTable and MasterData, or that the join should also involve another variable? - PG
Yes, trans_id is unique in both the tables.
Solution provided by RW9 will work but will require me to change the SQL everyone when the list of variables to repeat is changed, can we get the list from the table instead of hard coding this in SQL.
Then you would need to look at putting the code in a macro, or the way I would do it is to generate the code:
data _null_;
set dataset_with_repeat_variables end=last;
     if _n_=1 then call execute('proc sql;
                                               create table WANT as
                                               select  A.TRANS,  
                                                            A.RELEASE,
                                                            A.QUALITY,
                                                            A.MASK,
                                                            A.BUY_DATE,');
call execute('A.'!!strip(name)!!',');
     if last then call execute('          ...
                                             COALESCE(A.SYSTEM_ID,MD.SYSTEM_ID) as MD_SYSTEM_ID, /* i.e. take a if present else from MD */
                                             MD.SYSTEM_NAME as MD_SYSTEM_NAME,
                                             MD.RATE as MD_RATE
                                               from    INPUT_DATA A
                                          left join MASTER_DATA MD
                                          on      A.TRANS=MD.TRANS
                                          and     A.BUY_DATE=MD.BY_DATE and A.SYSTEM_ID=MD.SYSTEM_ID;  /* Guessing a bit on the links */
                                        quit;');
run;
The above is not tested and I had to use ! instead of bar, but you get the idea.
This works for me, from your Excel example file:
libname xl Excel "&sasforum.\datasets\DannyWilmot.xls" access=readonly mixed=yes;
proc sql noprint;
select cats("COALESCE(A.",var_repeat,", B.",var_repeat,") as MD_", var_repeat)
into :code separated by ", "
from xl.'Repeat_table$'n;
quit;
proc sql;
create table required_output as
select A.*, &code.
from xl.'Input_Table$'n as A left join
xl.'Master_Data$'n as B on A.trans = B.trans;
quit;
PG
thanks PGstat and RW9.
thats works great!
Rgds, Danny
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
