BookmarkSubscribeRSS Feed
mahler_ji
Obsidian | Level 7

Hey All,

 

I have two quick questions about the best way to do something using the UPDATE statement.

 

1) I have a master dataset that I am trying to update with a transaction dataset.  However I want to be able to flag if there are large differences between the value in the master dataset that is being updated with a different value in the transaction dataset.   I want to do this to flag things that could be wrong in the transaction dataset and manually check these observations.

 

Is there a way to compare (and maybe output a dataset that includes) the values that are being replaced with the values that are doing the replacing?

 

2) Same question as above but is there way way to conditionally update a value based on a third value, either in the master or the transaction dataset?  Like, update master.a with transaction.a unless master.b gt 10, then don't update.

 

Thanks for your help!

John

5 REPLIES 5
Shmuel
Garnet | Level 18

For your both questions, it would be better if you post some input and desired output, 

in order to demostrate your needs.

 

In general, for first question - you can rename variables on the transaction dataset

so that doing update you have the origin and the replacing in differet variables:

      data new;

      update master(in=inm)

                  transactions(in=int  rename=(var=var1));

         if int;

         var_diff = var1 - var;

      run;

 

mahler_ji
Obsidian | Level 7

Thanks for your reply.

 

I would like to be able to do this without renaming the variables, since the datasets that I am working with are very large and have hundreds of variables.

 

 

Astounding
PROC Star

In general, the answer is that you can't do that.  There may be small exceptions, such as being able to use a WHERE clause that affects only the transaction data set.  But even that has strict requirements for it to be accurate:  one change per transaction observation. 

 

The better approach IMO would be to write a program to compare, before applying the transactions.  For example (assuming both data sets are sorted by ID):

 

data want;

set master (in=in_master) transactions;

by ID;

array nums {*} _numeric_;

array baseline {1000} _temporary_;

if first.ID then do _n_=1 to 1000;

   baseline{_n_} = .;

end;

if in_master then do _n_=1 to dim(nums);

   baseline{_n_} = nums{_n_};

end;

else do _n_=1 to dim(nums);

   if baseline{_n_} > . and nums{_n_} > . then do;

      if not (0.9 * baseline{_n_} <= nums{_n_} <= 1.1 * baseline {_n_}) then do;

         master_value = baseline{_n_};

         trans_value = nums{_n_};

         varname = vname(nums{_n_});

         output;

      end;

   end;

end;

keep id varname master_value trans_value;

end;

 

This looks for values that are within 10% of the original master value and outputs any out of range.  Of course,  you can set up your own conditions.

 

1000 is a guess at a maximum number of temporary variables needed.  As long as it is greater than or equal to the number of numeric variables in the data set, the program (untested as it is) should be OK.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

To avoid fixing it at 1000 variables (although why you have that amount I don't know) you could just create two arrays and call missing the second:

data test;
  set sashelp.class;
  array vars{*} _numeric_;
  array other{*} _numeric_;
  call missing(of other{*});
run;

vars{*} contains the data, other{*} is empty, both have same elements.

Shmuel
Garnet | Level 18

Adapt next code to your needs and environment:

filename code_ren '/folders/myshortcuts/My_Folders/flat/code_rename.sas';
filename code_dif '/folders/myshortcuts/My_Folders/flat/code_diff.sas';
%let key = name;

data _null_;
  length var $30 new_var $10;
  fid = open('sashelp.class');
  if fid then do;
     nvars = attrn(fid,'nvars'); 
     file log; put nvars=;

     do i=1 to nvars; 
        var = varname(fid,i); 
        if upcase(strip(var)) ne upcase(strip("&key")) then do;
            new_var = 'var_' || left(i);
            _a_line_ = '  rename '||strip(var)||' = '||strip(new_var)||';';
            file code_ren;
            put _a_line_;
            
            file code_dif;
            if vartype(fid,i) = 'N' then do;
               _a_line_ =  'diff_var'||strip(i)||' = '||strip(new_var)||' - '||strip(var)||';';
               put _a_line_ ;
            end; else do;
               _a_line_ = 'diff_var'||strip(i)||' = ('||strip(new_var)||' = '||strip(var)||');';
               put _a_line_ /;
            end;
         end; 
      end;
  end;
  fid = close(fid);
run;
/****/
data check_ren;
  infile code_ren truncover;
  input a_line $80. ;
run;
data check_dif;
  infile code_dif truncover;
  input a_line $80. ;
run;     
/****/

options source2;
data class_trans;
 set sashelp.class;
     %include code_ren;
run;

data class_new;
 update sashelp.class(in=in_master)
        class_trans(in=in_trans);
 by name;
   if in_master;
   %include code_dif;
run;
        
        


    

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1893 views
  • 0 likes
  • 4 in conversation