BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
iiibbb
Quartz | Level 8

I normally don't have trouble with merges, but I am usually doing it with simpler datasets.

I need to merge two datasets (DAILY_PAR_MRG and IPARF_MRG) to create 5,640 unique entries based on the PAR_periods.  The DAILY_PAR_MRG file is data from a meteorlogical station.  Depending on which measurement period (PAR_PERIOD) the day falls in, a different factor (IPARf) needs to be applied to the variables DAILY_PAR and DAILY_MJ.

I thought this would be staight-forward merge, but the successful dataset will have 5,640 lines; each CLONE (N=15) and REP (N=4) will have 94 values of DAILY_PAR and DAILY_MJ for the dates from June-29 to Sept-30 with the IPARf that corresponds to the PAR_PERIOD for that date. 

When I do a merge, I only get 600 lines.  I'm not sure what I'm missing.  Is this something for SQL?  I've never used that before.

proc sort data=IPARf_mrg;

    by PAR_Period;

run;

proc sort data=Daily_PAR_mrg;

    by PAR_Period;

run;

data cPAR;

    mergeDaily_PAR_mrg IPARf_mrg;

    by PAR_Period;

proc print;

run;

Thank you.

I don't know why the above is different than the simple case below.

data one;
  input id $ fruit $12.;
datalines;
a apple
a apricot
b banana
c coconut
c cherry
c crabapple
;

data two;
  input id $ color $;
datalines;
a amber
b brown
b black
c cocoa
c cream
;

data both;
  merge one two;
  by id;
run;

proc print data=both;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
iiibbb
Quartz | Level 8

I think I've done it in SQL.  Found a good tutorial at the UCLA website. 

Thanks for the looks.

proc sort data=work.IPARf_mrg;

    by PAR_Period;

run;

proc sort data=work.Daily_PAR_mrg;

    by PAR_Period;

run;

proc sql;

    create table work.cPAR_sql as

    select A.PAR_Period, A.Clone, A.Rep, A.IPARf,

            B.LUE_Period, B.Date, B.Daily_PAR, B.Daily_MJ

    From IPARf_mrg as A, Daily_PAR_mrg as B

    Where A.PAR_Period=B.PAR_Period;

quit;

proc sort data=work.cPAR_sql;

    by rep clone date;

run;

proc print data=work.cPAR_sql;

run;

View solution in original post

7 REPLIES 7
HB
Barite | Level 11 HB
Barite | Level 11

This part here:

data cPAR;

    mergeDaily_PAR_mrg IPARf_mrg;

    by PAR_Period;

proc print;

run;

is just a typo and is really:

data cPAR;

    merge Daily_PAR_mrg (in=a) IPARf_mrg (in=b);

    by PAR_Period;

    if a;

run;

proc print;

run;

right?

iiibbb
Quartz | Level 8

Sorry, that just generates the same 600 line dataset I got with my statement.

Thanks

iiibbb
Quartz | Level 8

I think I've done it in SQL.  Found a good tutorial at the UCLA website. 

Thanks for the looks.

proc sort data=work.IPARf_mrg;

    by PAR_Period;

run;

proc sort data=work.Daily_PAR_mrg;

    by PAR_Period;

run;

proc sql;

    create table work.cPAR_sql as

    select A.PAR_Period, A.Clone, A.Rep, A.IPARf,

            B.LUE_Period, B.Date, B.Daily_PAR, B.Daily_MJ

    From IPARf_mrg as A, Daily_PAR_mrg as B

    Where A.PAR_Period=B.PAR_Period;

quit;

proc sort data=work.cPAR_sql;

    by rep clone date;

run;

proc print data=work.cPAR_sql;

run;

AndersS
Lapis Lazuli | Level 10

The IF-statement is NOT quite correct. Se below:!

data cPAR;

    merge  Daily_PAR_mrg (in=a)    IPARf_mrg (in=b);

    by PAR_Period;

    if a or b;

run;

------------------------------

data both;
  merge one two;
  by id;
run;

This is the same as:
data both;
  merge one(in=in_one) two (in= in_two);
  by id;

  if in_one=1  or  in_two=1  then output;
   /*   if in_one or in_two;   */
run;

Anders Sköllermo (Skollermo in English)
Astounding
PROC Star

The nature of the difference is due to the many-to-many matching .  MERGE handles that differently than SQL.  Try running your final program with ONE, TWO, and BOTH and you'll see that the results are not necessarily what you expected especially when ID=C.

Good luck.

AndersS
Lapis Lazuli | Level 10

It would be good if you could explain more what you want to do: e.g. when PAR_Period= 2 you have a lot of values in DAILY_PAR_MRG  and a lot of values in PARF_MRG.

You can combine these by using SQL  - then for each PAR_Period value you get  #(DAILY_PAR_MRG) times #(PARF_MRG).

Or - you can use merge. Then for each PAR_Period value you get  MAX(#(DAILY_PAR_MRG), #(PARF_MRG))  - which is smaller than the value above.


Your text "Dependingon which measurement period (PAR_PERIOD) the day falls in, a different factor(IPARf) needs to be applied to the variables  " indicates that some information is missing in your description.

I think you need more  by-variables.
Br Anders

Anders Sköllermo (Skollermo in English)
Ksharp
Super User

MERGE is not suited for your situation. but SQL cartesian product .

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1549 views
  • 3 likes
  • 5 in conversation