Why using "case" create duplication (proc sql)

Reply
Occasional Contributor
Posts: 5

Why using "case" create duplication (proc sql)

My code is as following:

proc sql;

create table mydata.dsf01 as

select distinct a.date as ddate, a.permco, a.PRC, a.ret, a.shrout, a.cap,

       case

       when a.date=b.date then 1

       else 0

       end as n0

from temp1  a right join mydata.sdc2  b

on a.permco=b.permco

order by permco, ddate;

quit;

the problem is when a.date=b.date, two rows are created in the new table--one with n0=0 and the other with n0=1. I can't find out why. Thank you so much for advise!!

untitled.bmp

Super Contributor
Posts: 1,636

Re: Why using "case" create duplication (proc sql)

try changing

when a.date=b.date

to

when int(a.date)=int(b.date)

Occasional Contributor
Posts: 5

Re: Why using "case" create duplication (proc sql)

Thanks! May I know why you suggested using int()? And why duplication will occur if not?

Thank you very much!

Super Contributor
Posts: 1,636

Re: Why using "case" create duplication (proc sql)

I suggested using int()(or fuzz()) because SAS or excel adds decimals to date values. I had the same problem before.

Linlin

Occasional Contributor
Posts: 5

Re: Why using "case" create duplication (proc sql)

Thanks a lot!

Respected Advisor
Posts: 3,156

Re: Why using "case" create duplication (proc sql)

You actually don't need to use 'case when':

proc sql;

create table mydata.dsf01 as

select distinct a.date as ddate, a.permco, a.PRC, a.ret, a.shrout, a.cap,

  (a.date=b.date) as n0

from temp1 a right join mydata.sdc2 b

on a.permco=b.permco

order by permco, ddate;

quit;

Occasional Contributor
Posts: 5

Re: Why using "case" create duplication (proc sql)

Thanks a lot! I'll try tomorrow. By the way, do you know why my code will produce duplicated record on event day? I'm trying to avoid this kind of flaw in the future.

Thanks again.

Occasional Contributor tlt
Occasional Contributor
Posts: 11

Re: Why using "case" create duplication (proc sql)

The problem is not necessarily when a.date=b.date, since the join is based on a.permco=b.permco

You should look for duplicate permco in b with different dates mydata.sdc2 alias b.

Try select permco, date, count(*) from mydata.sdc2 where permco=80 group by permco, date

hth

Occasional Contributor
Posts: 5

Re: Why using "case" create duplication (proc sql)

Thanks but the purpose of the merger is to identify event date for each permco. Only permcos in table 2 are kept, and then use n0 to identify for each permco, on which trading day the event occurs. I've cleaned all duplication in table 1. In another word, I selected on permco, and created new variable n0 based on date.

I've found another way to do it, but still very curious how my initial code does not work.

Thanks a lot!

Ask a Question
Discussion stats
  • 8 replies
  • 390 views
  • 7 likes
  • 4 in conversation