DATA Step, Macro, Functions and more

Check Data using UPDATE statement

Reply
Frequent Contributor
Posts: 101

Check Data using UPDATE statement

Hey All,

 

I have two quick questions about the best way to do something using the UPDATE statement.

 

1) I have a master dataset that I am trying to update with a transaction dataset.  However I want to be able to flag if there are large differences between the value in the master dataset that is being updated with a different value in the transaction dataset.   I want to do this to flag things that could be wrong in the transaction dataset and manually check these observations.

 

Is there a way to compare (and maybe output a dataset that includes) the values that are being replaced with the values that are doing the replacing?

 

2) Same question as above but is there way way to conditionally update a value based on a third value, either in the master or the transaction dataset?  Like, update master.a with transaction.a unless master.b gt 10, then don't update.

 

Thanks for your help!

John

Trusted Advisor
Posts: 1,395

Re: Check Data using UPDATE statement

[ Edited ]

For your both questions, it would be better if you post some input and desired output, 

in order to demostrate your needs.

 

In general, for first question - you can rename variables on the transaction dataset

so that doing update you have the origin and the replacing in differet variables:

      data new;

      update master(in=inm)

                  transactions(in=int  rename=(var=var1));

         if int;

         var_diff = var1 - var;

      run;

 

Frequent Contributor
Posts: 101

Re: Check Data using UPDATE statement

Thanks for your reply.

 

I would like to be able to do this without renaming the variables, since the datasets that I am working with are very large and have hundreds of variables.

 

 

Super User
Posts: 5,085

Re: Check Data using UPDATE statement

In general, the answer is that you can't do that.  There may be small exceptions, such as being able to use a WHERE clause that affects only the transaction data set.  But even that has strict requirements for it to be accurate:  one change per transaction observation. 

 

The better approach IMO would be to write a program to compare, before applying the transactions.  For example (assuming both data sets are sorted by ID):

 

data want;

set master (in=in_master) transactions;

by ID;

array nums {*} _numeric_;

array baseline {1000} _temporary_;

if first.ID then do _n_=1 to 1000;

   baseline{_n_} = .;

end;

if in_master then do _n_=1 to dim(nums);

   baseline{_n_} = nums{_n_};

end;

else do _n_=1 to dim(nums);

   if baseline{_n_} > . and nums{_n_} > . then do;

      if not (0.9 * baseline{_n_} <= nums{_n_} <= 1.1 * baseline {_n_}) then do;

         master_value = baseline{_n_};

         trans_value = nums{_n_};

         varname = vname(nums{_n_});

         output;

      end;

   end;

end;

keep id varname master_value trans_value;

end;

 

This looks for values that are within 10% of the original master value and outputs any out of range.  Of course,  you can set up your own conditions.

 

1000 is a guess at a maximum number of temporary variables needed.  As long as it is greater than or equal to the number of numeric variables in the data set, the program (untested as it is) should be OK.

Super User
Super User
Posts: 7,413

Re: Check Data using UPDATE statement

To avoid fixing it at 1000 variables (although why you have that amount I don't know) you could just create two arrays and call missing the second:

data test;
  set sashelp.class;
  array vars{*} _numeric_;
  array other{*} _numeric_;
  call missing(of other{*});
run;

vars{*} contains the data, other{*} is empty, both have same elements.

Trusted Advisor
Posts: 1,395

Re: Check Data using UPDATE statement

Adapt next code to your needs and environment:

filename code_ren '/folders/myshortcuts/My_Folders/flat/code_rename.sas';
filename code_dif '/folders/myshortcuts/My_Folders/flat/code_diff.sas';
%let key = name;

data _null_;
  length var $30 new_var $10;
  fid = open('sashelp.class');
  if fid then do;
     nvars = attrn(fid,'nvars'); 
     file log; put nvars=;

     do i=1 to nvars; 
        var = varname(fid,i); 
        if upcase(strip(var)) ne upcase(strip("&key")) then do;
            new_var = 'var_' || left(i);
            _a_line_ = '  rename '||strip(var)||' = '||strip(new_var)||';';
            file code_ren;
            put _a_line_;
            
            file code_dif;
            if vartype(fid,i) = 'N' then do;
               _a_line_ =  'diff_var'||strip(i)||' = '||strip(new_var)||' - '||strip(var)||';';
               put _a_line_ ;
            end; else do;
               _a_line_ = 'diff_var'||strip(i)||' = ('||strip(new_var)||' = '||strip(var)||');';
               put _a_line_ /;
            end;
         end; 
      end;
  end;
  fid = close(fid);
run;
/****/
data check_ren;
  infile code_ren truncover;
  input a_line $80. ;
run;
data check_dif;
  infile code_dif truncover;
  input a_line $80. ;
run;     
/****/

options source2;
data class_trans;
 set sashelp.class;
     %include code_ren;
run;

data class_new;
 update sashelp.class(in=in_master)
        class_trans(in=in_trans);
 by name;
   if in_master;
   %include code_dif;
run;
        
        


    
Ask a Question
Discussion stats
  • 5 replies
  • 110 views
  • 0 likes
  • 4 in conversation