BookmarkSubscribeRSS Feed
cleokatt
Calcite | Level 5

Hi, 
If I have this dataset (psedo, it is really boig instead)

data data;
input DATE date9. A $ B;
format Date date9.;
datalines;
19NOV2008 100000011 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
24NOV2008 100000022 10
;
run;

 You see that they have the same status (10=deafult let's say) and ChangedDate is differ.. If I want to remove all the "duplicates"  they're not duplicates beacuse ChangeDate differs for every row. But I want to have the minimnum date of ChangeDate beacuse it sounds more logial, since that company was already in the status deafult from the begging of my dataset but how can I only keep on row of the company = '100000022 ' without nodup/nodupkey/distinct ? 

Thanks!

 

11 REPLIES 11
PaigeMiller
Diamond | Level 26

Why the restriction about "without nodup nodupkey distinct"? Why throw out simple approaches that would work?

 

For anyone else reading along who doesn't have these restrictions:

 

proc sql;
    create table want as select distinct date,a,b
    from data;
quit;
--
Paige Miller
cleokatt
Calcite | Level 5
Beacuse I don't know what I'm deleing with nodupkey, I think it is risky. I think more of aggregate or such..
PaigeMiller
Diamond | Level 26

@cleokatt wrote:
Beacuse I don't know what I'm deleing with nodupkey, I think it is risky. I think more of aggregate or such..

That's what the SAS documentation is for! Not risky, many people use it.

 

--
Paige Miller
quickbluefish
Barite | Level 11

Your sample data doesn't really match your description because the date IS the same for all rows.  If the date is not actually the same in the real data, then you can achieve what you're describing by sorting on company, then date and taking the 'first.company`' row:

 

proc sort data=have; by company date; run;

data want;
set have;
by company;
IF first.company;
run;

 

cleokatt
Calcite | Level 5
Ooh! No I copy pasterd the wrong sample data, but the ChgangeDate should be in differents time stamp. Does your code do that?
Tom
Super User Tom
Super User

@cleokatt wrote:
Ooh! No I copy pasterd the wrong sample data, but the ChgangeDate should be in differents time stamp. Does your code do that?

That is confusing.  DATE values do not have a TIME component.   Is your CHANGEDATE variable a DATE value or a DATETIME value (sometimes called a TIMESTAMP value)?  If it is a DATE value, like in your example, then you need a separate variable that has the time of day component.

 

Either way the suggested method will work. 

 

If you have the date and time stored in a single variable, lets call it CHANGEDATETIME, then use:

proc sort data=have;
 by company changedatetime; 
run;

If you have the date and time stored in separate variables then you just need to use both variables in the initial PROC SORT step.

proc sort data=have; 
  by company changedate changetime; 
run;
quickbluefish
Barite | Level 11
Yes, this will return, for each company, the row with the earliest date. But it will only be guaranteed to do this if the date variable is included in the PROC SORT as shown.
cleokatt
Calcite | Level 5
proc sort data=have; by company date; run;

data want;
set have;
by company;
IF first.company;
run;

 

 

 

this one??

A_Kh
Barite | Level 11

Alternatively, you can use aggregate functions (MIN, MAX etc..) in proc sql to keep the required records BY required groups. Let's say, like in your example, we need a unique record  by A (id) and B (status) with the earliest date. 

data have;
input DATE date9. A $ B;
format date date9.;
datalines;
19NOV2008 1000011 10
20NOV2008 1000011 10
21NOV2008 1000011 10
22NOV2008 1000011 10
23NOV2008 1000011 10
14NOV2008 1000022 10
15NOV2008 1000022 10
16NOV2008 1000022 10
17NOV2008 1000022 10
18NOV2008 1000022 10
21NOV2008 1000021 10
21NOV2008 1000021 10
23NOV2008 1000021 10
24NOV2008 1000021 10
27NOV2008 1000012 10
28NOV2008 1000012 10
29NOV2008 1000012 10
24NOV2008 1000012 10
20NOV2008 1000012 10
25NOV2008 1000012 10
;
run;

proc sql;
	create table want as
		select min(date) as DATE format=date9., a, b from have
	group by a, b;
quit; 

This code selects the earliest date by A and B. 

mkeintz
PROC Star

If the data are already sorted by company (as per your sample data), but not by changedate within company, then you can avoid the costs of running a PROC SORT:

 

data want (drop=_:);
  set have  (in=firstpass)
      have  (in=secondpass);
  by company;

  retain _mindate . ;
  if first.company then call missing(_mindate);

  if firstpass then _mindate=min(changedate,_mindate);
  else if secondpass and changedate=_mindate then output;
run;

Of course, this assumes that each company has only a single instance of minimum changedate.  If that's not the case, then you need to tell us whether to keep all such records, or how to choose among them.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1843 views
  • 0 likes
  • 6 in conversation