proc sort data=deno_2 out=deno_2sorted;
by HCNE;
run;
data deno_3v1;
set deno_2sorted;
by HCNE;
if first.HCNE and last.HCNE;
run;
/*72,122*/
proc sql; create table deno_3 as
select distinct HCNE
from deno_2
;
quit;*78,368 , make sure unique patient only;
Hello Can someone advise why the 2 codes above would return different results
Thanks
In the data step you're selecting only rows which are already unique in the source table (first AND last).
The SQL is deduping the rows from the source table so it also returns a unique row where you've got duplicates in source.
data deno_2;
hcne=1; output;
hcne=2; output;output;
stop;
run;
proc sort data=deno_2 out=deno_2sorted;
by HCNE;
run;
/* only pick rows already unique in source */
data ds1_1;
set deno_2sorted;
by HCNE;
if first.HCNE and last.HCNE;
run;
proc sql;
create table ds1_2 as
select HCNE
from deno_2
group by hcne
having count(*)=1
;
quit;
/* dedup rows from source */
data ds2_1;
set deno_2sorted;
by HCNE;
if first.HCNE;
run;
proc sql;
create table ds2_2 as
select distinct HCNE
from deno_2
;
quit;
proc sort data=deno_2 out=ds2_3 nodupkey;
by hcne;
run;
Hi @Ranjeeta
1. Datastep picks only unique occurrences of the values, i.e the value that occurs only once the dataset
2. Proc SQL,- sorts, eliminates the dup occurrences from all , limits to one from all occurrences and outputs . So you would indeed have the difference. HTH
Also, Select distinct can be considered an equivalent of proc sort nodupkey or if first.key in datastep
In the data step you're selecting only rows which are already unique in the source table (first AND last).
The SQL is deduping the rows from the source table so it also returns a unique row where you've got duplicates in source.
data deno_2;
hcne=1; output;
hcne=2; output;output;
stop;
run;
proc sort data=deno_2 out=deno_2sorted;
by HCNE;
run;
/* only pick rows already unique in source */
data ds1_1;
set deno_2sorted;
by HCNE;
if first.HCNE and last.HCNE;
run;
proc sql;
create table ds1_2 as
select HCNE
from deno_2
group by hcne
having count(*)=1
;
quit;
/* dedup rows from source */
data ds2_1;
set deno_2sorted;
by HCNE;
if first.HCNE;
run;
proc sql;
create table ds2_2 as
select distinct HCNE
from deno_2
;
quit;
proc sort data=deno_2 out=ds2_3 nodupkey;
by hcne;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.