I normally don't have trouble with merges, but I am usually doing it with simpler datasets.
I need to merge two datasets (DAILY_PAR_MRG and IPARF_MRG) to create 5,640 unique entries based on the PAR_periods. The DAILY_PAR_MRG file is data from a meteorlogical station. Depending on which measurement period (PAR_PERIOD) the day falls in, a different factor (IPARf) needs to be applied to the variables DAILY_PAR and DAILY_MJ.
I thought this would be staight-forward merge, but the successful dataset will have 5,640 lines; each CLONE (N=15) and REP (N=4) will have 94 values of DAILY_PAR and DAILY_MJ for the dates from June-29 to Sept-30 with the IPARf that corresponds to the PAR_PERIOD for that date.
When I do a merge, I only get 600 lines. I'm not sure what I'm missing. Is this something for SQL? I've never used that before.
proc sort data=IPARf_mrg;
by PAR_Period;
run;
proc sort data=Daily_PAR_mrg;
by PAR_Period;
run;
data cPAR;
mergeDaily_PAR_mrg IPARf_mrg;
by PAR_Period;
proc print;
run;
Thank you.
I don't know why the above is different than the simple case below.
data one;
input id $ fruit $12.;
datalines;
a apple
a apricot
b banana
c coconut
c cherry
c crabapple
;
data two;
input id $ color $;
datalines;
a amber
b brown
b black
c cocoa
c cream
;
data both;
merge one two;
by id;
run;
proc print data=both;
run;
I think I've done it in SQL. Found a good tutorial at the UCLA website.
Thanks for the looks.
proc sort data=work.IPARf_mrg;
by PAR_Period;
run;
proc sort data=work.Daily_PAR_mrg;
by PAR_Period;
run;
proc sql;
create table work.cPAR_sql as
select A.PAR_Period, A.Clone, A.Rep, A.IPARf,
B.LUE_Period, B.Date, B.Daily_PAR, B.Daily_MJ
From IPARf_mrg as A, Daily_PAR_mrg as B
Where A.PAR_Period=B.PAR_Period;
quit;
proc sort data=work.cPAR_sql;
by rep clone date;
run;
proc print data=work.cPAR_sql;
run;
This part here:
data cPAR;
mergeDaily_PAR_mrg IPARf_mrg;
by PAR_Period;
proc print;
run;
is just a typo and is really:
data cPAR;
merge Daily_PAR_mrg (in=a) IPARf_mrg (in=b);
by PAR_Period;
if a;
run;
proc print;
run;
right?
Sorry, that just generates the same 600 line dataset I got with my statement.
Thanks
I think I've done it in SQL. Found a good tutorial at the UCLA website.
Thanks for the looks.
proc sort data=work.IPARf_mrg;
by PAR_Period;
run;
proc sort data=work.Daily_PAR_mrg;
by PAR_Period;
run;
proc sql;
create table work.cPAR_sql as
select A.PAR_Period, A.Clone, A.Rep, A.IPARf,
B.LUE_Period, B.Date, B.Daily_PAR, B.Daily_MJ
From IPARf_mrg as A, Daily_PAR_mrg as B
Where A.PAR_Period=B.PAR_Period;
quit;
proc sort data=work.cPAR_sql;
by rep clone date;
run;
proc print data=work.cPAR_sql;
run;
The IF-statement is NOT quite correct. Se below:!
data cPAR;
merge Daily_PAR_mrg (in=a) IPARf_mrg (in=b);
by PAR_Period;
if a or b;
run;
------------------------------
data both;
merge one two;
by id;
run;
This is the same as:data both;
merge one(in=in_one) two (in= in_two);
by id;
if in_one=1 or in_two=1 then output;
/* if in_one or in_two; */
run;
The nature of the difference is due to the many-to-many matching . MERGE handles that differently than SQL. Try running your final program with ONE, TWO, and BOTH and you'll see that the results are not necessarily what you expected especially when ID=C.
Good luck.
It would be good if you could explain more what you want to do: e.g. when PAR_Period= 2 you have a lot of values in DAILY_PAR_MRG and a lot of values in PARF_MRG.
You can combine these by using SQL - then for each PAR_Period value you get #(DAILY_PAR_MRG) times #(PARF_MRG).
Or - you can use merge. Then for each PAR_Period value you get MAX(#(DAILY_PAR_MRG), #(PARF_MRG)) - which is smaller than the value above.
Your text "Dependingon which measurement period (PAR_PERIOD) the day falls in, a different factor(IPARf) needs to be applied to the variables " indicates that some information is missing in your description.
I think you need more by-variables.
Br Anders
MERGE is not suited for your situation. but SQL cartesian product .
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.