Let us try to understand by an example similar to yours
data a;
input profile_number sas_id date:date9.;
format date date9.;
datalines;
123 345 23SEP1974
123 228 24SEP1974
123 607 25SEP1974
224 601 22SEP1974
224 601 30SEP1974
;
After creating dataset you run the below query
proc sql;
create table abcs as
select profile_number, sas_id, min(date) as mindate format=Date9.
from a
group by profile_number;
quit;
what this does is create min date with each group and remerges within that group as you see this note
NOTE: The query requires remerging summary statistics back with the original data.
resultant dataset is below (min date for 123 profile_id is 23sep1974 and is repeated within that group)
profile_number sas_id mindate 123 607 23SEP1974 123 345 23SEP1974 123 228 23SEP1974 224 601 22SEP1974 224 601 22SEP1974
if you see above last 2 row are same and when you apply distinct only one will be left out. In this step distinct is applied after group by and gives result as shown below.
profile_number sas_id mindate 123 228 23SEP1974 123 345 23SEP1974 123 607 23SEP1974 224 601 22SEP1974
query for distinct is below
proc sql;
create table abc as
select distinct profile_number, sas_id, min(date) as mindate format=Date9.
from a
group by profile_number;
quit;
hope this example makes it clear.
... View more