Re: Dedup Code Convert to SQL

DavidPhillips2 · Posted 06-02-2020 01:38 PM

I'm converting SAS code to proc sql temporarily write to an Oracle schema rather than the work library. I need to replicate the exact logic of the below. I can't use distinct without additional logic because I need to prioritize rows that meet a criteria. For example a row would look like

col1 clol2 priority col3

apple orange 1 cherry

apple orange 2 strawberry

Is there an Oracle Equivalent of:

data master_academic_hierarchy; set wip.master_academic_hierarchy1;
  if sorcmjr_adm_ind='Y' or sorcmjr_stu_ind='Y' then priority=1; else priority=2;
run;

/*dedup based on program*/
proc sort data=master_academic_hierarchy; by sobcurr_program priority descending sorcmjr_term_code_eff;
proc sort data=master_academic_hierarchy nodupkey; by sobcurr_program;

RichardDeVen · Posted 06-02-2020 05:15 PM

Presuming you are using pass through SQL. You can use a CASE statement in your ORDER BY.

...
  ORDER BY 
    sobcurr_program 
    , case 
        when sorcmjr_adm_ind='Y' or sorcmjr_stu_ind='Y' then 1 else 2
      end descending
    , sorcmjr_term_code_eff

You might also need a select, hard to know what you fully want from the limited information

...
  SELECT 
...
    , case 
        when sorcmjr_adm_ind='Y' or sorcmjr_stu_ind='Y' then 1 else 2
      end as priority
...

SASKiwi · Posted 06-02-2020 05:22 PM

Where is wip.master_academic_hierarchy1 coming from? Oracle or SAS?

Patrick · Posted 06-02-2020 07:01 PM

Assuming your source table is in Oracle using the row_number() analytics functions should get you very close to what the SAS code does (code not tested).

create table master_academic_hierarchy as
SELECT * 
FROM
  (
    SELECT 
      *,
      row_number() OVER(
          PARTITION BY 
            sobcurr_program
          ORDER BY 
            case when sorcmjr_adm_ind='Y' or sorcmjr_stu_ind='Y' then 1 else 2 end DESC, 
            sorcmjr_term_code_eff NULLS FIRST
      ) row_num
    FROM 
        master_academic_hierarchy1  
  )
WHERE row_num = 1;

DavidPhillips2 · Posted 06-03-2020 08:54 AM

It is coming from Oracle. I'm trying to avoid using the work library. Currently testing the posted code.

DavidPhillips2 · Posted 06-03-2020 09:00 AM

Unfortunately, this wouldn't prioritize selected rows first. This would only select filtered rows first. Hence it is not depuping logic.

smantha · Posted 06-03-2020 10:40 AM

Once you get the rownum can you not filter the rownum on the minimum for each group?

Patrick · Posted 06-03-2020 11:16 AM

@DavidPhillips2 wrote:

Unfortunately, this wouldn't prioritize selected rows first. This would only select filtered rows first. Hence it is not depuping logic.

I believe you're missing something. This SQL should return the same result than your combination of two Proc Sorts.

Your first Proc Sort orders the rows by multiple variables.

proc sort data=master_academic_hierarchy; by sobcurr_program priority descending sorcmjr_term_code_eff;

The SQL creates a new column Row_Num that re-starts with 1 for every new value of sobcurr_program.

Within a group (a partition) the row count is along the sort order of priority descending sorcmjr_term_code_eff - the first row after the sort will have row_num=1

Your 2nd Proc Sort selects the first record for every new value of sobcurr_program - and because the SAS table doesn't get re-sorted it's the first row from your first more granular sort.

The SQL select the rows WHERE row_num = 1 - that's pretty much the same than what the 2nd Proc Sort with NODUPKEY does.

The result of the two Proc Sorts and the SQL will only differ if sorting by sobcurr_program priority descending sorcmjr_term_code_eff can still have multiple rows within the same value for sorcmjr_term_code_eff . Due to the sequential nature of SAS data processing the result for the same source table will always be the same but with SQL the row order within a tuple is not deterministic so it can vary.

Registration is open

SAS Training: Just a Click Away