Hi,
I'm trying to merge two datasets. The reference column lists the dates.
I've tried formatting the dates in both the columns i.e. to ensure that formats in both the data sets are the same. I've tried date9., mmddyy10., mmddyy.
When I look at the distinct dates, to cross check if the merging was correct, I see the following situation:
Date,PLC
06/27/2014,1
06/27/2014, 1
06/28/2014,1
06/28/2014, 1
This is just a sample of the output. I see a number of duplicate rows.
My code:
data data.hm1;
set data.hm1;
format date mmddyy10.;
run;
proc sort data=data.hm1;
by date;
run;
data data.S5_LTE;
set data.S5_LTE;
format date mmddyy10.;
run;
data data.hm1;
set data.hm1;
plc1 = put(plc,6.);
drop plc;
rename plc1 = plc;
run;
proc sort data=data.S5_LTE;
by date;
run;
data data.S5_LTE;
merge data.S5_LTE ( in=a) data.hm1(in=b) ;
by date ;
if a and b;
run;
Can you please suggest a way out?
Thanks,
Nimish
Can you provide a small sample of the data you have and what you want it to look like? I'm not sure what you are asking.
Hi,
I've attached two csv files for your reference. I'm trying to merge these files by using the date column for reference.
Regards,
Nimish
I did a little guess work here. Because I'm informatting the dates I don't know what your issue is. I'm assuming you have one date as a number and one as a character. If this is the case take a look at the new_date and this should help. Also I'm not sure if you want a full join or left join the way I have it.
Hope this helps:
data have1;
infile cards dsd;
informat date mmddyy10.;
format date mmddyy10.;
input date plc;
cards;
7/14/2014,1
7/21/2014,1
7/24/2014,1
7/28/2014,1
7/11/2014,1
7/14/2014,1
7/21/2014,1
7/28/2014,1
7/30/2014,1
7/8/2014,1
7/18/2014,1
7/28/2014,1
7/29/2014,1
7/1/2014,1
7/2/2014,1
7/3/2014,1
7/8/2014,1
7/11/2014,1
7/14/2014,1
;
run;
data have2;
infile cards dsd;
length modelname $30. date $10.;
input modelname $ date $ sales lsales;
cards;
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/14/2014,2,0.693147181
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/21/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/24/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/28/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/11/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/14/2014,2,0.693147181
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/21/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/28/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/30/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/8/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/18/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/28/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/29/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/1/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/2/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/3/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/8/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/11/2014,1,0
GALAXY_S5_BB_LTE-A*OFF*HM*010,7/14/2014,2,0.693147181
;
run;
data want_date;
format new_date mmddyy10.;
set have2;
new_date = input(put(date,$10.),mmddyy10.);
run;
proc sql;
create table want as
select a.*,b.plc
from want_date a full join
have1 b on
a.new_date = b.date;
hi,
so, i used the input suggestion. i get the following Warning 'Variable Date has already been defined as numeric'.
I also additionally see this statement in the log:
NOTE: Invalid argument to function INPUT at line 24 column 11.
Date=02/20/2013 N=1 Match=0 Holiday=0 Monday=0 Dum_Var=0 launch=0 plc=0 new_date=. _ERROR_=1 _N_=1
Do you know whats causing the issue?
Thanks,
Nimish
PS - you've understood the problem correctly.
If both of your date variables are numbers then you should be able to merge them, remove the piece of code with the input(put()) part and merge on date. The reason for the 'want_date' datastep would be to turn a character variable into a number field with a date format.
proc sql;
create table want as
select a.*,b.plc
from have2 a full join
have1 b on
a.date = b.date;
Hi,
That's original problem I was referring to.
When I didn't format the dates and tried merging them, it led to duplicate values (please refer to the example in my original post).
I tried formatting the dates etc. but it hasn't helped.
Can you think of any other solution?
Cheers,
Nimish
In your original post the attachment shows dates formatted the same so I can't see what the issue is,
if the only issue is the duplicate values add distinct to the join
proc sql;
create table want as
select distinct a.*,b.plc
from have2 a full join
have1 b on
a.date = b.date;
Okay. Let me spell out the issue again.
1) I need to merge two data sets by using the date as a reference
2) When I did merge them, I noticed duplicate entries (selected distinct date & plc using proc sql).
3) Ideally for each date, there should only be one PLC value. But when i looked into dataset, there were two PLC values assigned for each date. e.g:
Date,PLC
06/27/2014,1
06/27/2014, 1
06/28/2014,1
06/28/2014, 1
5) I've tried converting all the PLC values to text (&all numeric as well), but it hasn't helped.
6) I then tried formatting the dates, but it hasn't helped either
My merged dataset needs to have one PLC value for each date.
Do let me know if its still not clear.
Thanks,
Nimish
Try this:
proc sql;
create table want as
select distinct a.*,strip(b.plc) as plc
from have2 a full join
have1 b on
a.date = b.date;
My current guess is that it's the leading spaces before plc, strip() will remove the spaces and if that's the case it will remove the perceived duplicates.
When reading the raw files, make sure that date values are really stored as SAS dates to avoid confusion.
Also read the plc as numeric, and sort the DS2 by date and plc with the nodupkey option before merging.
Getting a SAS NOTE in the log about multiple by values in more than one input dataset when doing a merge is usually a signal that something in the logic has gone astray.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.