BookmarkSubscribeRSS Feed
Nimish_Vaddiparti
Calcite | Level 5

Hi,

I'm trying to merge two datasets. The reference column lists the dates.

I've tried formatting the dates in both the columns i.e. to ensure that formats in both the data sets are the same. I've tried date9., mmddyy10., mmddyy.

When I look at the distinct dates, to cross check if the merging was correct, I see the following situation:

Date,PLC

06/27/2014,1

06/27/2014,    1

06/28/2014,1

06/28/2014,    1


This is just a sample of the output. I see a number of duplicate rows.

My code:

data data.hm1;

set data.hm1;

format date mmddyy10.;

run;

proc sort data=data.hm1;

by date;

run;

data data.S5_LTE;

set data.S5_LTE;

format date mmddyy10.;

run;


data data.hm1;

set data.hm1;

plc1 = put(plc,6.);

drop plc;

rename plc1 = plc;

run;

proc sort data=data.S5_LTE;

by date;

run;

data data.S5_LTE;

merge data.S5_LTE ( in=a)  data.hm1(in=b) ;

by date ;

if a and b;

run;

Can you please suggest a way out?

Thanks,

Nimish

11 REPLIES 11
Steelers_In_DC
Barite | Level 11

Can you provide a small sample of the data you have and what you want it to look like?  I'm not sure what you are asking.

Nimish_Vaddiparti
Calcite | Level 5

Hi,

I've attached two csv files for your reference. I'm trying to merge these files by using the date column for reference.

Regards,

Nimish

Steelers_In_DC
Barite | Level 11

I did a little guess work here.  Because I'm informatting the dates I don't know what your issue is.  I'm assuming you have one date as a number and one as a character.  If this is the case take a look at the new_date and this should help.  Also I'm not sure if you want a full join or left join the way I have it.

Hope this helps:

data have1;

infile cards dsd;

informat date mmddyy10.;

format date mmddyy10.;

input date plc;

cards;

7/14/2014,1

7/21/2014,1

7/24/2014,1

7/28/2014,1

7/11/2014,1

7/14/2014,1

7/21/2014,1

7/28/2014,1

7/30/2014,1

7/8/2014,1

7/18/2014,1

7/28/2014,1

7/29/2014,1

7/1/2014,1

7/2/2014,1

7/3/2014,1

7/8/2014,1

7/11/2014,1

7/14/2014,1

;

run;

data have2;

infile cards dsd;

length modelname $30. date $10.;

input modelname $ date $ sales lsales;

cards;

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/14/2014,2,0.693147181

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/21/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/24/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/28/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/11/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/14/2014,2,0.693147181

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/21/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/28/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/30/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/8/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/18/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/28/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/29/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/1/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/2/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/3/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/8/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/11/2014,1,0

GALAXY_S5_BB_LTE-A*OFF*HM*010,7/14/2014,2,0.693147181

;

run;

data want_date;

format new_date mmddyy10.;

set have2;

new_date = input(put(date,$10.),mmddyy10.);

run;

proc sql;

create table want as

select a.*,b.plc

from want_date a full join

     have1 b on

a.new_date = b.date;

Nimish_Vaddiparti
Calcite | Level 5

hi,

so, i used the input suggestion. i get the following Warning 'Variable Date has already been defined as numeric'.

I also additionally see this statement in the log:

NOTE: Invalid argument to function INPUT at line 24 column 11.

Date=02/20/2013 N=1 Match=0 Holiday=0 Monday=0 Dum_Var=0 launch=0 plc=0 new_date=. _ERROR_=1 _N_=1

Do you know whats causing the issue?

Thanks,

Nimish

Nimish_Vaddiparti
Calcite | Level 5

PS - you've understood the problem correctly.

Steelers_In_DC
Barite | Level 11

If both of your date variables are numbers then you should be able to merge them, remove the piece of code with the input(put()) part and merge on date.  The reason for the 'want_date' datastep would be to turn a character variable into a number field with a date format.

proc sql;

create table want as

select a.*,b.plc

from have2 a full join

     have1 b on

a.date = b.date;

Nimish_Vaddiparti
Calcite | Level 5

Hi,

That's original problem I was referring to.

When I didn't format the dates and tried merging them, it led to duplicate values (please refer to the example in my original post).

I tried formatting the dates etc. but it hasn't helped.

Can you think of any other solution?

Cheers,

Nimish

Steelers_In_DC
Barite | Level 11

In your original post the attachment shows dates formatted the same so I can't see what the issue is,

if the only issue is the duplicate values add distinct to the join

proc sql;

create table want as

select distinct a.*,b.plc

from have2 a full join

     have1 b on

a.date = b.date;

Nimish_Vaddiparti
Calcite | Level 5

Okay. Let me spell out the issue again.

1) I need to merge two data sets by using the date as a reference

2) When I did merge them, I noticed duplicate entries (selected distinct date & plc using proc sql).

3) Ideally for each date, there should only be one PLC value. But when i looked into dataset, there were two PLC values assigned for each date. e.g:

Date,PLC

06/27/2014,1

06/27/2014,    1

06/28/2014,1

06/28/2014,    1

5) I've tried converting all the PLC values to text (&all numeric as well), but it hasn't helped.

6) I then tried formatting the dates, but it hasn't helped either

My merged dataset needs to have one PLC value for each date.

Do let me know if its still not clear.

Thanks,

Nimish

Steelers_In_DC
Barite | Level 11

Try this:

proc sql;

create table want as

select distinct a.*,strip(b.plc) as plc

from have2 a full join

     have1 b on

a.date = b.date;

My current guess is that it's the leading spaces before plc, strip() will remove the spaces and if that's the case it will remove the perceived duplicates.

Kurt_Bremser
Super User

When reading the raw files, make sure that date values are really stored as SAS dates to avoid confusion.

Also read the plc as numeric, and sort the DS2 by date and plc with the nodupkey option before merging.

Getting a SAS NOTE in the log about multiple by values in more than one input dataset when doing a merge is usually a signal that something in the logic has gone astray.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1290 views
  • 0 likes
  • 3 in conversation