Solved: Picking unique observations

Ranjeeta · Posted 01-03-2020 02:18 PM

proc sort data=deno_2 out=deno_2sorted;
by HCNE;
run;

data deno_3v1;
set deno_2sorted;
by HCNE;
if first.HCNE and last.HCNE;
run;
/*72,122*/

proc sql;
  create table deno_3 as
    select distinct HCNE
    from deno_2
  ;
quit;
/* 78,368; make sure unique patient only */

Hello, can someone advise why the two code snippets above return different results?

Thanks

Patrick · Posted 01-03-2020 02:24 PM

In the data step you're selecting only rows which are already unique in the source table (first AND last).

The SQL is deduping the rows from the source table so it also returns a unique row where you've got duplicates in source.

data deno_2;
  hcne=1; output;
  hcne=2; output;output;
  stop;
run;

proc sort data=deno_2 out=deno_2sorted;
  by HCNE;
run;

/* only pick rows already unique in source */
data ds1_1;
  set deno_2sorted;
  by HCNE;
  if first.HCNE and last.HCNE;
run;

proc sql;
  create table ds1_2 as
    select HCNE
    from deno_2
    group by hcne
    having count(*)=1
  ;
quit;

/* dedup rows from source */
data ds2_1;
  set deno_2sorted;
  by HCNE;
  if first.HCNE;
run;

proc sql;
  create table ds2_2 as
    select distinct HCNE
    from deno_2
  ;
quit;

proc sort data=deno_2 out=ds2_3 nodupkey;
  by hcne;
run;
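
To see where the gap between the two counts comes from, one option is to count the rows per HCNE and then summarise. This is only a sketch: the table name hcne_counts and the column names n_rows, distinct_hcne, occurs_once and occurs_multiple are made up for illustration. It relies on SAS evaluating a comparison to 1 or 0, so sum() over a comparison counts the rows where it is true.

```sas
/* Sketch only: hcne_counts and the column aliases are hypothetical names */
proc sql;
  /* one row per HCNE with the number of source rows it has */
  create table hcne_counts as
    select HCNE, count(*) as n_rows
    from deno_2
    group by HCNE
  ;

  /* a comparison evaluates to 1 or 0, so sum() counts the matches */
  select count(*)        as distinct_hcne,
         sum(n_rows = 1) as occurs_once,
         sum(n_rows > 1) as occurs_multiple
  from hcne_counts
  ;
quit;
```

With the sample deno_2 above this reports 2 distinct values, 1 occurring once and 1 occurring more than once. If the counts in the question are right, the real data should report 78,368 distinct values and 72,122 occurring once, which leaves 78,368 - 72,122 = 6,246 HCNE values with more than one row.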


novinosrin · Posted 01-03-2020 02:24 PM

Hi @Ranjeeta

1. The data step picks only the values that occur exactly once in the dataset. first.HCNE and last.HCNE are both true only when the BY group has a single row.

2. Proc SQL with select distinct eliminates the duplicate occurrences and outputs one row per value, including the values that occur more than once. So you would indeed see a difference. HTH

Also, select distinct can be considered equivalent to proc sort nodupkey or if first.key in a data step, as long as only the key column is selected.
