Solved: Re: Getting new record with SQL

angeliquec · Posted 09-11-2018 09:06 PM

Hi! I have this dataset with old and new birth dates for each name. I'd like to get the birth date from the rows with Source = "NEW". But for names with no Source = "NEW" (as in the case below SARAH), I'd like to retain the old record.

I'd like to do it with only one SQL statement without need of joining tables, I'm thinking of a group by and min/ max statement.. but I can't get it right. Or is it not possible? Thank you.

data have;

infile datalines missover;
input NAME :$5. BIRTHDATE :$8. SOURCE :$3. ADDRESS :$8.;
datalines;
ANGEL 09122018 OLD USA
ANGEL 09112018 NEW
FELIX 07252017 OLD UK
FELIX 07252017 NEW
SARAH 09092018 OLD CANADA
;
run;

data want;

infile datalines missover;
input NAME :$5. BIRTHDATE :$8. SOURCE :$3. ADDRESS :$8.;
datalines;
ANGEL 09112018 NEW USA
FELIX 07252017 NEW UK
SARAH 09092018 OLD CANADA
;
run;

Ksharp · Posted 09-12-2018 08:59 AM

data have;
infile datalines missover;
input NAME :$5. BIRTHDATE :$8. SOURCE :$3. ADDRESS :$8.;
datalines;
ANGEL 09122018 OLD USA
ANGEL 09112018 NEW 
FELIX 07252017 OLD UK
FELIX 07252017 NEW
SARAH 09092018 OLD CANADA
;
run;


proc sql;
create table want as
select *,max(ADDRESS) as new_ADDRESS
from have
group by name
having SOURCE=min(SOURCE) ;
quit;

View solution in original post

ChrisNZ · Posted 09-11-2018 09:34 PM

If NEW is always the last record for the person, keep that last record.

data WANT;
 set HAVE;
 by NAME ;
 if last.NAME;
run;

If not, sort and apply that same logic.

High-Performance SAS Coding - Third Edition

Reeza · Posted 09-11-2018 10:18 PM

SAS BY grouping is really powerful in these situations. You can easily do it in SAS, in SQL it can be done but it's not as easy. I would do, two passes, one with the new only and then for records that have a single record, take that.

proc sql;
create table want as 
/*select new records*/
select *
from have
where source='NEW'
union
/*add in records with a single entry and source='OLD', single NEW would already be included if possible*/
select *
from have
group by name
having count(name)=1 and source='OLD';
quit;

novinosrin · Posted 09-11-2018 10:31 PM

data have;

infile datalines missover;
input NAME :$5. BIRTHDATE :$8. SOURCE :$3. ADDRESS :$8.;
datalines;
ANGEL 09122018 OLD USA
ANGEL 09112018 NEW 
FELIX 07252017 OLD UK
FELIX 07252017 NEW
SARAH 09092018 OLD CANADA
;
run;


proc sql;
create table want(drop=s) as
select *, SOURCE='NEW' as s
from have
group by name
having max(S)=s ;
quit;

PGStats · Posted 09-11-2018 11:37 PM

Clever!

PG

novinosrin · Posted 09-11-2018 11:49 PM

Thank you sir @PGStats One day I want to code like you. That's exactly the same I told my parents. i was advised by them to copy you shamelessly. Have a great night!

PGStats · Posted 09-11-2018 11:55 PM

Note that @novinosrin's query can be extended to keep the address when there is one, like this:

proc sql;
create table want(drop=s) as
select 
    NAME,
    Birthdate, 
    Source,
    coalesce(address, max(address)) as address,
    SOURCE='NEW' as s
from have
group by name
having max(S)=s ;
quit;

PG

bobpep212 · Posted 09-11-2018 11:57 PM

Ah YES! COALESCE!

ChrisNZ · Posted 09-12-2018 12:00 AM

This works too ?


proc sql;
create table want(drop=s) as
select NAME,
    Birthdate, 
    Source,
     max(address) as address,
    SOURCE='NEW' as s
from have
group by name
having max(S)=s ;
quit;

High-Performance SAS Coding - Third Edition

novinosrin · Posted 09-12-2018 12:02 AM

Sir, no need for coalesce coz implicit remerge will take care of it anyway. I was a little tipsy to not pay attention to it in the 1st place. Thanks as always and again

proc sql;
create table want(drop=s) as
select 
    NAME,
    Birthdate, 
    Source,
    max(address) as address,
    SOURCE='NEW' as s
from have
group by name
having max(S)=s ;
quit;

PGStats · Posted 09-12-2018 12:08 AM

@novinosrin, @ChrisNZ, well, I thought that if there was an address in a NEW record, I would want that one, even if the address in the OLD record was greater (lexicographically).

PG

ChrisNZ · Posted 09-12-2018 12:12 AM

@PGStats Good thinking!

High-Performance SAS Coding - Third Edition

bobpep212 · Posted 09-11-2018 11:43 PM

I did it slightly different, since the other solutions didn't seem to resolve keeping the address. I still think a solution using a data step with a by statement is probably better.

proc sql;
create table want as 
select distinct 
	t1.name
	,case when t2.name ne '' then t2.birthdate else t1.birthdate end as birthdate
	,case when t2.name ne '' then t2.source else t1.source end as source
	,t1.address
from have as t1
left join (select * from have where source = 'NEW') as t2
on t1.name=t2.name
where t1.address ne ''
;
quit;

Ksharp · Posted 09-12-2018 08:59 AM

data have;
infile datalines missover;
input NAME :$5. BIRTHDATE :$8. SOURCE :$3. ADDRESS :$8.;
datalines;
ANGEL 09122018 OLD USA
ANGEL 09112018 NEW 
FELIX 07252017 OLD UK
FELIX 07252017 NEW
SARAH 09092018 OLD CANADA
;
run;


proc sql;
create table want as
select *,max(ADDRESS) as new_ADDRESS
from have
group by name
having SOURCE=min(SOURCE) ;
quit;

angeliquec · Posted 09-12-2018 08:11 PM

Thank you everyone here for presenting various solutions 🙂

I like the Coalesce idea. I forgot about using it!

Classroom Training Available!