SAS, array code, two indices, dropping records

user112a2 · Posted 10-16-2018 04:51 PM

I am looking through some code and wondering what this does. Below are the code comments. I'm still not sure what this code does even with the code comments. I have used arrays but not familiar with this code. It looks like this code dedupes by using two indices. Is that correct? So if there is a combination of CCS_DR_IDX and TXN_IDX, it will delete those records?

Now handle cases where the dollar matches. If ccs_dr_idx has already been used then delete the record. Dropped txns here will be added back in with the claim data called missing.

Removed

ballardw · Posted 10-16-2018 05:07 PM

@user112a2 wrote:

I am looking through some code and wondering what this does. Below are the code comments. I'm still not sure what this code does even with the code comments. I have used arrays but not familiar with this code. It looks like this code dedupes by using two indices. Is that correct? So if there is a combination of CCS_DR_IDX and TXN_IDX, it will delete those records?

Now handle cases where the dollar matches. If ccs_dr_idx has already been used then delete the record. Dropped txns here will be added back in with the claim data called missing.
PROC SORT DATA=OUT.REQ_1_9_F_AMT_MATCH; BY CCS_DR_IDX DATEDIF; RUN;

    DATA OUT.REQ_1_9_F_AMT_MATCH_V2;
        SET OUT.REQ_1_9_F_AMT_MATCH;

        ARRAY id_one{40000} id_one1-id_one40000;
        ARRAY id_two{40000} id_two1-id_two40000;

        RETAIN id_one1-id_one40000 id_two1-id_two40000;

        IF _n_=1 then i=1;
        else i+1;

        do j=1 to i;
            if CCS_DR_IDX=id_one{j} then delete;
        end;

        do k = 1 to i;
                if TXN_IDX = id_two{k} then delete;
        end;

        id_one{i}=CCS_DR_IDX;
        id_two{i}=TXN_IDX;

        drop i j k  id_one1-id_one40000 id_two1-id_two40000;

    run;

Do you have data? If so, did you run the code against that data?

Tom · Posted 10-16-2018 05:28 PM

It looks like it is an attempt to eliminate records where either of two variables' value is already included in the output data.

Notice how it waits to save the values until after the possible DELETE statements.

The SORT will impact which of the possible duplicates is kept since the first a value is seen the observations is kept (output) and later ones are deleted.

You could simplify the data step code.

DATA OUT.REQ_1_9_F_AMT_MATCH_V2;
   SET OUT.REQ_1_9_F_AMT_MATCH;

   ARRAY id_one{40000} _temporary_;
   ARRAY id_two{40000} _temporary_;

   if CCS_DR_IDX in id_one then delete;
   if TXN_IDX in id_two then delete;

   id_one{_n_}=CCS_DR_IDX;
   id_two{_n_}=TXN_IDX;

run;

If there are a lot of re-used codes then you could reduce the size of the arrays by keeping your own counter as the index to the array. That way the arrays just need to be large enough to match the size of the output dataset, and not the size of the input dataset.

DATA OUT.REQ_1_9_F_AMT_MATCH_V2;
   SET OUT.REQ_1_9_F_AMT_MATCH;

   ARRAY id_one{40000} _temporary_;
   ARRAY id_two{40000} _temporary_;

   if CCS_DR_IDX in id_one then delete;
   if TXN_IDX in id_two then delete;

   i+1;
   id_one{i}=CCS_DR_IDX;
   id_two{i}=TXN_IDX;
   drop i;

run;

But you could also just use hashes instead of arrays and possible increase the speed and the size of datasets it could handle.

SAS, array code, two indices, dropping records

Re: SAS, array code, two indices, dropping records

Re: SAS, array code, two indices, dropping records

SAS, array code, two indices, dropping records

Re: SAS, array code, two indices, dropping records

Re: SAS, array code, two indices, dropping records

SAS Innovate 2025: Save the Date