BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
angeliquec
Quartz | Level 8

Hi! I have this dataset with old and new birth dates for each name. I'd like to get the birth date from the rows with Source = "NEW". But for names with no Source = "NEW" (as in the case below SARAH), I'd like to retain the old record. 

 

I'd like to do it with only one SQL statement without need of joining tables, I'm thinking of a group by and min/ max statement.. but I can't get it right. Or is it not possible? Thank you.

 

 

data have;

infile datalines missover;
input NAME :$5. BIRTHDATE :$8. SOURCE :$3. ADDRESS :$8.;
datalines;
ANGEL 09122018 OLD USA
ANGEL 09112018 NEW
FELIX 07252017 OLD UK
FELIX 07252017 NEW
SARAH 09092018 OLD CANADA
;
run;

 

 

 

data want;

infile datalines missover;
input NAME :$5. BIRTHDATE :$8. SOURCE :$3. ADDRESS :$8.;
datalines;
ANGEL 09112018 NEW USA
FELIX 07252017 NEW UK
SARAH 09092018 OLD CANADA
;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
infile datalines missover;
input NAME :$5. BIRTHDATE :$8. SOURCE :$3. ADDRESS :$8.;
datalines;
ANGEL 09122018 OLD USA
ANGEL 09112018 NEW 
FELIX 07252017 OLD UK
FELIX 07252017 NEW
SARAH 09092018 OLD CANADA
;
run;


proc sql;
create table want as
select *,max(ADDRESS) as new_ADDRESS
from have
group by name
having SOURCE=min(SOURCE) ;
quit;

View solution in original post

14 REPLIES 14
ChrisNZ
Tourmaline | Level 20

If NEW is always the last record for the person, keep that last record.

data WANT;
 set HAVE;
 by NAME ;
 if last.NAME;
run;

If not, sort and apply that same logic.

 

Reeza
Super User

SAS BY grouping is really powerful in these situations. You can easily do it in SAS, in SQL it can be done but it's not as easy. I would do, two passes, one with the new only and then for records that have a single record, take that. 

 

proc sql;
create table want as 
/*select new records*/
select *
from have
where source='NEW'
union
/*add in records with a single entry and source='OLD', single NEW would already be included if possible*/
select *
from have
group by name
having count(name)=1 and source='OLD';
quit;
novinosrin
Tourmaline | Level 20
data have;

infile datalines missover;
input NAME :$5. BIRTHDATE :$8. SOURCE :$3. ADDRESS :$8.;
datalines;
ANGEL 09122018 OLD USA
ANGEL 09112018 NEW 
FELIX 07252017 OLD UK
FELIX 07252017 NEW
SARAH 09092018 OLD CANADA
;
run;


proc sql;
create table want(drop=s) as
select *, SOURCE='NEW' as s
from have
group by name
having max(S)=s ;
quit;

PGStats
Opal | Level 21

Clever! Smiley Wink

PG
novinosrin
Tourmaline | Level 20

Thank you sir @PGStats One day I want to code like you. That's exactly the same I told my parents. i was advised by them to copy you shamelessly. Have a great night!

PGStats
Opal | Level 21

Note that @novinosrin's query can be extended to keep the address when there is one, like this:

 

proc sql;
create table want(drop=s) as
select 
    NAME,
    Birthdate, 
    Source,
    coalesce(address, max(address)) as address,
    SOURCE='NEW' as s
from have
group by name
having max(S)=s ;
quit;
PG
bobpep212
Quartz | Level 8
Ah YES! COALESCE!
ChrisNZ
Tourmaline | Level 20

This works too ?


proc sql;
create table want(drop=s) as
select NAME,
    Birthdate, 
    Source,
     max(address) as address,
    SOURCE='NEW' as s
from have
group by name
having max(S)=s ;
quit;
novinosrin
Tourmaline | Level 20

Sir, no need for coalesce coz implicit remerge will take care of it anyway. I was a little tipsy to not pay attention to it in the 1st place. Thanks as always and again

 

proc sql;
create table want(drop=s) as
select 
    NAME,
    Birthdate, 
    Source,
    max(address) as address,
    SOURCE='NEW' as s
from have
group by name
having max(S)=s ;
quit;
PGStats
Opal | Level 21

@novinosrin, @ChrisNZ, well, I thought that if there was an address in a NEW record, I would want that one, even if the address in the OLD record was greater (lexicographically).

PG
bobpep212
Quartz | Level 8

I did it slightly different, since the other solutions didn't seem to resolve keeping the address. I still think a solution using a data step with a by statement is probably better.

 

proc sql;
create table want as 
select distinct 
	t1.name
	,case when t2.name ne '' then t2.birthdate else t1.birthdate end as birthdate
	,case when t2.name ne '' then t2.source else t1.source end as source
	,t1.address
from have as t1
left join (select * from have where source = 'NEW') as t2
on t1.name=t2.name
where t1.address ne ''
;
quit;
Ksharp
Super User
data have;
infile datalines missover;
input NAME :$5. BIRTHDATE :$8. SOURCE :$3. ADDRESS :$8.;
datalines;
ANGEL 09122018 OLD USA
ANGEL 09112018 NEW 
FELIX 07252017 OLD UK
FELIX 07252017 NEW
SARAH 09092018 OLD CANADA
;
run;


proc sql;
create table want as
select *,max(ADDRESS) as new_ADDRESS
from have
group by name
having SOURCE=min(SOURCE) ;
quit;
angeliquec
Quartz | Level 8

Thank you everyone here for presenting various solutions 🙂

 

I like the Coalesce idea. I forgot about using it!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 3368 views
  • 7 likes
  • 7 in conversation