I'm converting SAS code to proc sql temporarily write to an Oracle schema rather than the work library. I need to replicate the exact logic of the below. I can't use distinct without additional logic because I need to prioritize rows that meet a criteria. For example a row would look like
col1 clol2 priority col3
apple orange 1 cherry
apple orange 2 strawberry
Is there an Oracle Equivalent of:
data master_academic_hierarchy; set wip.master_academic_hierarchy1;
if sorcmjr_adm_ind='Y' or sorcmjr_stu_ind='Y' then priority=1; else priority=2;
run;
/*dedup based on program*/
proc sort data=master_academic_hierarchy; by sobcurr_program priority descending sorcmjr_term_code_eff;
proc sort data=master_academic_hierarchy nodupkey; by sobcurr_program;
Presuming you are using pass through SQL. You can use a CASE statement in your ORDER BY.
...
ORDER BY
sobcurr_program
, case
when sorcmjr_adm_ind='Y' or sorcmjr_stu_ind='Y' then 1 else 2
end descending
, sorcmjr_term_code_eff
You might also need a select, hard to know what you fully want from the limited information
...
SELECT
...
, case
when sorcmjr_adm_ind='Y' or sorcmjr_stu_ind='Y' then 1 else 2
end as priority
...
Where is wip.master_academic_hierarchy1 coming from? Oracle or SAS?
Assuming your source table is in Oracle using the row_number() analytics functions should get you very close to what the SAS code does (code not tested).
create table master_academic_hierarchy as
SELECT *
FROM
(
SELECT
*,
row_number() OVER(
PARTITION BY
sobcurr_program
ORDER BY
case when sorcmjr_adm_ind='Y' or sorcmjr_stu_ind='Y' then 1 else 2 end DESC,
sorcmjr_term_code_eff NULLS FIRST
) row_num
FROM
master_academic_hierarchy1
)
WHERE row_num = 1;
It is coming from Oracle. I'm trying to avoid using the work library. Currently testing the posted code.
Unfortunately, this wouldn't prioritize selected rows first. This would only select filtered rows first. Hence it is not depuping logic.
Once you get the rownum can you not filter the rownum on the minimum for each group?
@DavidPhillips2 wrote:
Unfortunately, this wouldn't prioritize selected rows first. This would only select filtered rows first. Hence it is not depuping logic.
I believe you're missing something. This SQL should return the same result than your combination of two Proc Sorts.
Your first Proc Sort orders the rows by multiple variables.
proc sort data=master_academic_hierarchy; by sobcurr_program priority descending sorcmjr_term_code_eff;
The SQL creates a new column Row_Num that re-starts with 1 for every new value of sobcurr_program.
Within a group (a partition) the row count is along the sort order of priority descending sorcmjr_term_code_eff - the first row after the sort will have row_num=1
Your 2nd Proc Sort selects the first record for every new value of sobcurr_program - and because the SAS table doesn't get re-sorted it's the first row from your first more granular sort.
The SQL select the rows WHERE row_num = 1 - that's pretty much the same than what the 2nd Proc Sort with NODUPKEY does.
The result of the two Proc Sorts and the SQL will only differ if sorting by sobcurr_program priority descending sorcmjr_term_code_eff can still have multiple rows within the same value for sorcmjr_term_code_eff . Due to the sequential nature of SAS data processing the result for the same source table will always be the same but with SQL the row order within a tuple is not deterministic so it can vary.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.