topic Re: Picking unique observations in SAS Programming

Picking unique observations

Ranjeeta — Fri, 03 Jan 2020 19:18:39 GMT

proc sort data=deno_2 out=deno_2sorted;
by HCNE;
run;

data deno_3v1;
set deno_2sorted;
by HCNE;
if first.HCNE and last.HCNE;
run;
/*72,122*/

proc sql; create table deno_3 as
select distinct HCNE
from deno_2
;
quit;*78,368 , make sure unique patient only;

Hello Can someone advise why the 2 codes above would return different results

Thanks

Re: Picking unique observations

novinosrin — Fri, 03 Jan 2020 19:30:20 GMT

Hi @Ranjeeta

1. Datastep picks only unique occurrences of the values, i.e the value that occurs only once the dataset

2. Proc SQL,- sorts, eliminates the dup occurrences from all , limits to one from all occurrences and outputs . So you would indeed have the difference. HTH

Also, Select distinct can be considered an equivalent of proc sort nodupkey or if first.key in datastep

Re: Picking unique observations

Patrick — Fri, 03 Jan 2020 20:28:18 GMT

In the data step you're selecting only rows which are already unique in the source table (first AND last).

The SQL is deduping the rows from the source table so it also returns a unique row where you've got duplicates in source.

data deno_2;
  hcne=1; output;
  hcne=2; output;output;
  stop;
run;

proc sort data=deno_2 out=deno_2sorted;
  by HCNE;
run;

/* only pick rows already unique in source */
data ds1_1;
  set deno_2sorted;
  by HCNE;
  if first.HCNE and last.HCNE;
run;

proc sql;
  create table ds1_2 as
    select HCNE
    from deno_2
    group by hcne
    having count(*)=1
  ;
quit;

/* dedup rows from source */
data ds2_1;
  set deno_2sorted;
  by HCNE;
  if first.HCNE;
run;

proc sql;
  create table ds2_2 as
    select distinct HCNE
    from deno_2
  ;
quit;

proc sort data=deno_2 out=ds2_3 nodupkey;
  by hcne;
run;