BookmarkSubscribeRSS Feed
user112a2
Obsidian | Level 7

I am looking through some code and wondering what this does. Below are the code comments. I'm still not sure what this code does even with the code comments. I have used arrays but not familiar with this code. It looks like this code dedupes by using two indices. Is that correct? So if there is a combination of CCS_DR_IDX and TXN_IDX, it will delete those records?

 

Now handle cases where the dollar matches. If ccs_dr_idx has already been used then delete the record. Dropped txns here will be added back in with the claim data called missing.

 

 

Removed

 

2 REPLIES 2
ballardw
Super User

@user112a2 wrote:

I am looking through some code and wondering what this does. Below are the code comments. I'm still not sure what this code does even with the code comments. I have used arrays but not familiar with this code. It looks like this code dedupes by using two indices. Is that correct? So if there is a combination of CCS_DR_IDX and TXN_IDX, it will delete those records?

 

Now handle cases where the dollar matches. If ccs_dr_idx has already been used then delete the record. Dropped txns here will be added back in with the claim data called missing.

 

 

PROC SORT DATA=OUT.REQ_1_9_F_AMT_MATCH; BY CCS_DR_IDX DATEDIF; RUN;

    DATA OUT.REQ_1_9_F_AMT_MATCH_V2;
        SET OUT.REQ_1_9_F_AMT_MATCH;

        ARRAY id_one{40000} id_one1-id_one40000;
        ARRAY id_two{40000} id_two1-id_two40000;

        RETAIN id_one1-id_one40000 id_two1-id_two40000;

        IF _n_=1 then i=1;
        else i+1;

        do j=1 to i;
            if CCS_DR_IDX=id_one{j} then delete;
        end;

        do k = 1 to i;
                if TXN_IDX = id_two{k} then delete;
        end;

        id_one{i}=CCS_DR_IDX;
        id_two{i}=TXN_IDX;

        drop i j k  id_one1-id_one40000 id_two1-id_two40000;

    run;

 


Do you have data? If so, did you run the code against that data?

 

 

Tom
Super User Tom
Super User

It looks like it is an attempt to eliminate records where either of two variables' value is already included in the output data.

Notice how it waits to save the values until after the possible DELETE statements.

 

The SORT will impact which of the possible duplicates is kept since the first a value is seen the observations is kept (output) and later ones are deleted.

 

You could simplify the data step code.

DATA OUT.REQ_1_9_F_AMT_MATCH_V2;
   SET OUT.REQ_1_9_F_AMT_MATCH;

   ARRAY id_one{40000} _temporary_;
   ARRAY id_two{40000} _temporary_;

   if CCS_DR_IDX in id_one then delete;
   if TXN_IDX in id_two then delete;

   id_one{_n_}=CCS_DR_IDX;
   id_two{_n_}=TXN_IDX;

run;

If there are a lot of re-used codes then you could  reduce the size of the arrays by keeping your own counter as the index to the array. That way the arrays just need to be large enough to match the size of the output dataset, and not the size of the input dataset.

DATA OUT.REQ_1_9_F_AMT_MATCH_V2;
   SET OUT.REQ_1_9_F_AMT_MATCH;

   ARRAY id_one{40000} _temporary_;
   ARRAY id_two{40000} _temporary_;

   if CCS_DR_IDX in id_one then delete;
   if TXN_IDX in id_two then delete;

   i+1;
   id_one{i}=CCS_DR_IDX;
   id_two{i}=TXN_IDX;
   drop i;

run;

But you could also just use hashes instead of arrays and possible increase the speed and the size of datasets it could handle. 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 713 views
  • 1 like
  • 3 in conversation