BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
iiibbb
Quartz | Level 8

I normally don't have trouble with merges, but I am usually doing it with simpler datasets.

I need to merge two datasets (DAILY_PAR_MRG and IPARF_MRG) to create 5,640 unique entries based on the PAR_periods.  The DAILY_PAR_MRG file is data from a meteorlogical station.  Depending on which measurement period (PAR_PERIOD) the day falls in, a different factor (IPARf) needs to be applied to the variables DAILY_PAR and DAILY_MJ.

I thought this would be staight-forward merge, but the successful dataset will have 5,640 lines; each CLONE (N=15) and REP (N=4) will have 94 values of DAILY_PAR and DAILY_MJ for the dates from June-29 to Sept-30 with the IPARf that corresponds to the PAR_PERIOD for that date. 

When I do a merge, I only get 600 lines.  I'm not sure what I'm missing.  Is this something for SQL?  I've never used that before.

proc sort data=IPARf_mrg;

    by PAR_Period;

run;

proc sort data=Daily_PAR_mrg;

    by PAR_Period;

run;

data cPAR;

    mergeDaily_PAR_mrg IPARf_mrg;

    by PAR_Period;

proc print;

run;

Thank you.

I don't know why the above is different than the simple case below.

data one;
  input id $ fruit $12.;
datalines;
a apple
a apricot
b banana
c coconut
c cherry
c crabapple
;

data two;
  input id $ color $;
datalines;
a amber
b brown
b black
c cocoa
c cream
;

data both;
  merge one two;
  by id;
run;

proc print data=both;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
iiibbb
Quartz | Level 8

I think I've done it in SQL.  Found a good tutorial at the UCLA website. 

Thanks for the looks.

proc sort data=work.IPARf_mrg;

    by PAR_Period;

run;

proc sort data=work.Daily_PAR_mrg;

    by PAR_Period;

run;

proc sql;

    create table work.cPAR_sql as

    select A.PAR_Period, A.Clone, A.Rep, A.IPARf,

            B.LUE_Period, B.Date, B.Daily_PAR, B.Daily_MJ

    From IPARf_mrg as A, Daily_PAR_mrg as B

    Where A.PAR_Period=B.PAR_Period;

quit;

proc sort data=work.cPAR_sql;

    by rep clone date;

run;

proc print data=work.cPAR_sql;

run;

View solution in original post

7 REPLIES 7
HB
Barite | Level 11 HB
Barite | Level 11

This part here:

data cPAR;

    mergeDaily_PAR_mrg IPARf_mrg;

    by PAR_Period;

proc print;

run;

is just a typo and is really:

data cPAR;

    merge Daily_PAR_mrg (in=a) IPARf_mrg (in=b);

    by PAR_Period;

    if a;

run;

proc print;

run;

right?

iiibbb
Quartz | Level 8

Sorry, that just generates the same 600 line dataset I got with my statement.

Thanks

iiibbb
Quartz | Level 8

I think I've done it in SQL.  Found a good tutorial at the UCLA website. 

Thanks for the looks.

proc sort data=work.IPARf_mrg;

    by PAR_Period;

run;

proc sort data=work.Daily_PAR_mrg;

    by PAR_Period;

run;

proc sql;

    create table work.cPAR_sql as

    select A.PAR_Period, A.Clone, A.Rep, A.IPARf,

            B.LUE_Period, B.Date, B.Daily_PAR, B.Daily_MJ

    From IPARf_mrg as A, Daily_PAR_mrg as B

    Where A.PAR_Period=B.PAR_Period;

quit;

proc sort data=work.cPAR_sql;

    by rep clone date;

run;

proc print data=work.cPAR_sql;

run;

AndersS
Pyrite | Level 9

The IF-statement is NOT quite correct. Se below:!

data cPAR;

    merge  Daily_PAR_mrg (in=a)    IPARf_mrg (in=b);

    by PAR_Period;

    if a or b;

run;

------------------------------

data both;
  merge one two;
  by id;
run;

This is the same as:
data both;
  merge one(in=in_one) two (in= in_two);
  by id;

  if in_one=1  or  in_two=1  then output;
   /*   if in_one or in_two;   */
run;

Anders Sköllermo (Skollermo in English)
Astounding
PROC Star

The nature of the difference is due to the many-to-many matching .  MERGE handles that differently than SQL.  Try running your final program with ONE, TWO, and BOTH and you'll see that the results are not necessarily what you expected especially when ID=C.

Good luck.

AndersS
Pyrite | Level 9

It would be good if you could explain more what you want to do: e.g. when PAR_Period= 2 you have a lot of values in DAILY_PAR_MRG  and a lot of values in PARF_MRG.

You can combine these by using SQL  - then for each PAR_Period value you get  #(DAILY_PAR_MRG) times #(PARF_MRG).

Or - you can use merge. Then for each PAR_Period value you get  MAX(#(DAILY_PAR_MRG), #(PARF_MRG))  - which is smaller than the value above.


Your text "Dependingon which measurement period (PAR_PERIOD) the day falls in, a different factor(IPARf) needs to be applied to the variables  " indicates that some information is missing in your description.

I think you need more  by-variables.
Br Anders

Anders Sköllermo (Skollermo in English)
Ksharp
Super User

MERGE is not suited for your situation. but SQL cartesian product .

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1035 views
  • 3 likes
  • 5 in conversation