Help using Base SAS procedures

Merging Issue

Reply
Occasional Contributor
Posts: 18

Merging Issue

Hi,

I'm trying to merge two datasets. The reference column lists the dates.

I've tried formatting the dates in both the columns i.e. to ensure that formats in both the data sets are the same. I've tried date9., mmddyy10., mmddyy.

When I look at the distinct dates, to cross check if the merging was correct, I see the following situation:

Date,PLC

06/27/2014,1

06/27/2014,    1

06/28/2014,1

06/28/2014,    1


This is just a sample of the output. I see a number of duplicate rows.

My code:

data data.hm1;

set data.hm1;

format date mmddyy10.;

run;

proc sort data=data.hm1;

by date;

run;

data data.S5_LTE;

set data.S5_LTE;

format date mmddyy10.;

run;


data data.hm1;

set data.hm1;

plc1 = put(plc,6.);

drop plc;

rename plc1 = plc;

run;

proc sort data=data.S5_LTE;

by date;

run;

data data.S5_LTE;

merge data.S5_LTE ( in=a)  data.hm1(in=b) ;

by date ;

if a and b;

run;

Can you please suggest a way out?

Thanks,

Nimish

Attachment
Attachment
Valued Guide
Posts: 854

Re: Merging Issue

Can you provide a small sample of the data you have and what you want it to look like?  I'm not sure what you are asking.

Occasional Contributor
Posts: 18

Re: Merging Issue

Hi,

I've attached two csv files for your reference. I'm trying to merge these files by using the date column for reference.

Regards,

Nimish

Valued Guide
Posts: 854

Re: Merging Issue

I did a little guess work here.  Because I'm informatting the dates I don't know what your issue is.  I'm assuming you have one date as a number and one as a character.  If this is the case take a look at the new_date and this should help.  Also I'm not sure if you want a full join or left join the way I have it.

Hope this helps:

data have1;

infile cards dsd;

informat date mmddyy10.;

format date mmddyy10.;

input date plc;

cards;

7/14/2014,1

7/21/2014,1

7/24/2014,1

7/28/2014,1

7/11/2014,1

7/14/2014,1

7/21/2014,1

7/28/2014,1

7/30/2014,1

7/8/2014,1

7/18/2014,1

7/28/2014,1

7/29/2014,1

7/1/2014,1

7/2/2014,1

7/3/2014,1

7/8/2014,1

7/11/2014,1

7/14/2014,1

;

run;

data have2;

infile cards dsd;

length modelname $30. date $10.;

input modelname $ date $ sales lsales;

cards;

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/14/2014,2,0.693147181

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/21/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/24/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/28/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/11/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/14/2014,2,0.693147181

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/21/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/28/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/30/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/8/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/18/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/28/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/29/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/1/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/2/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/3/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/8/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/11/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/14/2014,2,0.693147181

;

run;

data want_date;

format new_date mmddyy10.;

set have2;

new_date = input(put(date,$10.),mmddyy10.);

run;

proc sql;

create table want as

select a.*,b.plc

from want_date a full join

     have1 b on

a.new_date = b.date;

Occasional Contributor
Posts: 18

Re: Merging Issue

hi,

so, i used the input suggestion. i get the following Warning 'Variable Date has already been defined as numeric'.

I also additionally see this statement in the log:

NOTE: Invalid argument to function INPUT at line 24 column 11.

Date=02/20/2013 N=1 Match=0 Holiday=0 Monday=0 Dum_Var=0 launch=0 plc=0 new_date=. _ERROR_=1 _N_=1

Do you know whats causing the issue?

Thanks,

Nimish

Occasional Contributor
Posts: 18

Re: Merging Issue

PS - you've understood the problem correctly.

Valued Guide
Posts: 854

Re: Merging Issue

If both of your date variables are numbers then you should be able to merge them, remove the piece of code with the input(put()) part and merge on date.  The reason for the 'want_date' datastep would be to turn a character variable into a number field with a date format.

proc sql;

create table want as

select a.*,b.plc

from have2 a full join

     have1 b on

a.date = b.date;

Occasional Contributor
Posts: 18

Re: Merging Issue

Hi,

That's original problem I was referring to.

When I didn't format the dates and tried merging them, it led to duplicate values (please refer to the example in my original post).

I tried formatting the dates etc. but it hasn't helped.

Can you think of any other solution?

Cheers,

Nimish

Valued Guide
Posts: 854

Re: Merging Issue

In your original post the attachment shows dates formatted the same so I can't see what the issue is,

if the only issue is the duplicate values add distinct to the join

proc sql;

create table want as

select distinct a.*,b.plc

from have2 a full join

     have1 b on

a.date = b.date;

Occasional Contributor
Posts: 18

Re: Merging Issue

Okay. Let me spell out the issue again.

1) I need to merge two data sets by using the date as a reference

2) When I did merge them, I noticed duplicate entries (selected distinct date & plc using proc sql).

3) Ideally for each date, there should only be one PLC value. But when i looked into dataset, there were two PLC values assigned for each date. e.g:

Date,PLC

06/27/2014,1

06/27/2014,    1

06/28/2014,1

06/28/2014,    1

5) I've tried converting all the PLC values to text (&all numeric as well), but it hasn't helped.

6) I then tried formatting the dates, but it hasn't helped either

My merged dataset needs to have one PLC value for each date.

Do let me know if its still not clear.

Thanks,

Nimish

Valued Guide
Posts: 854

Re: Merging Issue

Try this:

proc sql;

create table want as

select distinct a.*,strip(b.plc) as plc

from have2 a full join

     have1 b on

a.date = b.date;

My current guess is that it's the leading spaces before plc, strip() will remove the spaces and if that's the case it will remove the perceived duplicates.

Esteemed Advisor
Posts: 6,309

Re: Merging Issue

When reading the raw files, make sure that date values are really stored as SAS dates to avoid confusion.

Also read the plc as numeric, and sort the DS2 by date and plc with the nodupkey option before merging.

Getting a SAS NOTE in the log about multiple by values in more than one input dataset when doing a merge is usually a signal that something in the logic has gone astray.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 11 replies
  • 309 views
  • 0 likes
  • 3 in conversation