SAS Programming

cleokatt · Posted 01-05-2025 07:11 AM

Hi,
If I have this dataset (psedo, it is really boig instead)

data data;
input DATE date9. A $ B;
format Date date9.;
datalines;
19NOV2008 100000011 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
;
run;

You see that they have the same status (10=deafult let's say) and ChangedDate is differ.. If I want to remove all the "duplicates" they're not duplicates beacuse ChangeDate differs for every row. But I want to have the minimnum date of ChangeDate beacuse it sounds more logial, since that company was already in the status deafult from the begging of my dataset but how can I only keep on row of the company = '100000022 ' without nodup/nodupkey/distinct ?

Thanks!

PaigeMiller · Posted 01-05-2025 07:31 AM

Why the restriction about "without nodup nodupkey distinct"? Why throw out simple approaches that would work?

For anyone else reading along who doesn't have these restrictions:

proc sql;
    create table want as select distinct date,a,b
    from data;
quit;

--
Paige Miller

cleokatt · Posted 01-05-2025 08:55 AM

Beacuse I don't know what I'm deleing with nodupkey, I think it is risky. I think more of aggregate or such..

PaigeMiller · Posted 01-05-2025 12:36 PM

@cleokatt wrote:
Beacuse I don't know what I'm deleing with nodupkey, I think it is risky. I think more of aggregate or such..

That's what the SAS documentation is for! Not risky, many people use it.

--
Paige Miller

quickbluefish · Posted 01-05-2025 07:59 AM

Your sample data doesn't really match your description because the date IS the same for all rows. If the date is not actually the same in the real data, then you can achieve what you're describing by sorting on company, then date and taking the 'first.company`' row:

proc sort data=have; by company date; run;

data want;
set have;
by company;
IF first.company;
run;

cleokatt · Posted 01-05-2025 08:57 AM

Ooh! No I copy pasterd the wrong sample data, but the ChgangeDate should be in differents time stamp. Does your code do that?

Tom · Posted 01-05-2025 09:11 AM

@cleokatt wrote:
Ooh! No I copy pasterd the wrong sample data, but the ChgangeDate should be in differents time stamp. Does your code do that?

That is confusing. DATE values do not have a TIME component. Is your CHANGEDATE variable a DATE value or a DATETIME value (sometimes called a TIMESTAMP value)? If it is a DATE value, like in your example, then you need a separate variable that has the time of day component.

Either way the suggested method will work.

If you have the date and time stored in a single variable, lets call it CHANGEDATETIME, then use:

proc sort data=have;
 by company changedatetime; 
run;

If you have the date and time stored in separate variables then you just need to use both variables in the initial PROC SORT step.

proc sort data=have; 
  by company changedate changetime; 
run;

quickbluefish · Posted 01-05-2025 09:20 AM

Yes, this will return, for each company, the row with the earliest date. But it will only be guaranteed to do this if the date variable is included in the PROC SORT as shown.

cleokatt · Posted 01-05-2025 09:57 AM

proc sort data=have; by company date; run;

data want;
set have;
by company;
IF first.company;
run;

this one??

quickbluefish · Posted 01-05-2025 10:26 AM

yes.

A_Kh · Posted 01-06-2025 11:20 AM

Alternatively, you can use aggregate functions (MIN, MAX etc..) in proc sql to keep the required records BY required groups. Let's say, like in your example, we need a unique record by A (id) and B (status) with the earliest date.

data have;
input DATE date9. A $ B;
format date date9.;
datalines;
19NOV2008 1000011 10
20NOV2008 1000011 10
21NOV2008 1000011 10
22NOV2008 1000011 10
23NOV2008 1000011 10
14NOV2008 1000022 10
15NOV2008 1000022 10
16NOV2008 1000022 10
17NOV2008 1000022 10
18NOV2008 1000022 10
21NOV2008 1000021 10
21NOV2008 1000021 10
23NOV2008 1000021 10
24NOV2008 1000021 10
27NOV2008 1000012 10
28NOV2008 1000012 10
29NOV2008 1000012 10
24NOV2008 1000012 10
20NOV2008 1000012 10
25NOV2008 1000012 10
;
run;

proc sql;
	create table want as
		select min(date) as DATE format=date9., a, b from have
	group by a, b;
quit;

This code selects the earliest date by A and B.

mkeintz · Posted 01-06-2025 02:18 PM

If the data are already sorted by company (as per your sample data), but not by changedate within company, then you can avoid the costs of running a PROC SORT:

data want (drop=_:);
  set have  (in=firstpass)
      have  (in=secondpass);
  by company;

  retain _mindate . ;
  if first.company then call missing(_mindate);

  if firstpass then _mindate=min(changedate,_mindate);
  else if secondpass and changedate=_mindate then output;
run;

Of course, this assumes that each company has only a single instance of minimum changedate. If that's not the case, then you need to tell us whether to keep all such records, or how to choose among them.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Programming

Remove rows without nodup, nodupkey, distinct

Re: Remove rows without nodup, nodupkey, distinct

Re: Remove rows without nodup, nodupkey, distinct

Re: Remove rows without nodup, nodupkey, distinct

Re: Remove rows without nodup, nodupkey, distinct

Re: Remove rows without nodup, nodupkey, distinct

Re: Remove rows without nodup, nodupkey, distinct

Re: Remove rows without nodup, nodupkey, distinct

Re: Remove rows without nodup, nodupkey, distinct

Re: Remove rows without nodup, nodupkey, distinct

Re: Remove rows without nodup, nodupkey, distinct

Re: Remove rows without nodup, nodupkey, distinct

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...