Help using Base SAS procedures

Big Merge Help

Accepted Solution Solved
Reply
Contributor
Posts: 41
Accepted Solution

Big Merge Help

I normally don't have trouble with merges, but I am usually doing it with simpler datasets.

I need to merge two datasets (DAILY_PAR_MRG and IPARF_MRG) to create 5,640 unique entries based on the PAR_periods.  The DAILY_PAR_MRG file is data from a meteorlogical station.  Depending on which measurement period (PAR_PERIOD) the day falls in, a different factor (IPARf) needs to be applied to the variables DAILY_PAR and DAILY_MJ.

I thought this would be staight-forward merge, but the successful dataset will have 5,640 lines; each CLONE (N=15) and REP (N=4) will have 94 values of DAILY_PAR and DAILY_MJ for the dates from June-29 to Sept-30 with the IPARf that corresponds to the PAR_PERIOD for that date. 

When I do a merge, I only get 600 lines.  I'm not sure what I'm missing.  Is this something for SQL?  I've never used that before.

proc sort data=IPARf_mrg;

    by PAR_Period;

run;

proc sort data=Daily_PAR_mrg;

    by PAR_Period;

run;

data cPAR;

    mergeDaily_PAR_mrg IPARf_mrg;

    by PAR_Period;

proc print;

run;

Thank you.

I don't know why the above is different than the simple case below.

data one;
  input id $ fruit $12.;
datalines;
a apple
a apricot
b banana
c coconut
c cherry
c crabapple
;

data two;
  input id $ color $;
datalines;
a amber
b brown
b black
c cocoa
c cream
;

data both;
  merge one two;
  by id;
run;

proc print data=both;
run;

Attachment
Attachment
Attachment

Accepted Solutions
Solution
‎03-13-2012 01:17 PM
Contributor
Posts: 41

Big Merge Help

I think I've done it in SQL.  Found a good tutorial at the UCLA website. 

Thanks for the looks.

proc sort data=work.IPARf_mrg;

    by PAR_Period;

run;

proc sort data=work.Daily_PAR_mrg;

    by PAR_Period;

run;

proc sql;

    create table work.cPAR_sql as

    select A.PAR_Period, A.Clone, A.Rep, A.IPARf,

            B.LUE_Period, B.Date, B.Daily_PAR, B.Daily_MJ

    From IPARf_mrg as A, Daily_PAR_mrg as B

    Where A.PAR_Period=B.PAR_Period;

quit;

proc sort data=work.cPAR_sql;

    by rep clone date;

run;

proc print data=work.cPAR_sql;

run;

View solution in original post


All Replies
Frequent Contributor
Frequent Contributor
Posts: 89

Big Merge Help

This part here:

data cPAR;

    mergeDaily_PAR_mrg IPARf_mrg;

    by PAR_Period;

proc print;

run;

is just a typo and is really:

data cPAR;

    merge Daily_PAR_mrg (in=a) IPARf_mrg (in=b);

    by PAR_Period;

    if a;

run;

proc print;

run;

right?

Contributor
Posts: 41

Big Merge Help

Sorry, that just generates the same 600 line dataset I got with my statement.

Thanks

Solution
‎03-13-2012 01:17 PM
Contributor
Posts: 41

Big Merge Help

I think I've done it in SQL.  Found a good tutorial at the UCLA website. 

Thanks for the looks.

proc sort data=work.IPARf_mrg;

    by PAR_Period;

run;

proc sort data=work.Daily_PAR_mrg;

    by PAR_Period;

run;

proc sql;

    create table work.cPAR_sql as

    select A.PAR_Period, A.Clone, A.Rep, A.IPARf,

            B.LUE_Period, B.Date, B.Daily_PAR, B.Daily_MJ

    From IPARf_mrg as A, Daily_PAR_mrg as B

    Where A.PAR_Period=B.PAR_Period;

quit;

proc sort data=work.cPAR_sql;

    by rep clone date;

run;

proc print data=work.cPAR_sql;

run;

Occasional Contributor
Posts: 19

Big Merge Help

The IF-statement is NOT quite correct. Se below:!

data cPAR;

    merge  Daily_PAR_mrg (in=a)    IPARf_mrg (in=b);

    by PAR_Period;

    if a or b;

run;

------------------------------

data both;
  merge one two;
  by id;
run;

This is the same as:
data both;
  merge one(in=in_one) two (in= in_two);
  by id;

  if in_one=1  or  in_two=1  then output;
   /*   if in_one or in_two;   */
run;

Anders Sköllermo
Ph.D., Swedish Actuary

"Both Age and IQ are 69+"
"Retired, But Not Tired"

Sandgränd 13, S-178 40 Ekerö
email: anders.skollermo@one.se
tele: 00468968419 mobil: 0046735077373
Super User
Posts: 5,081

Big Merge Help

The nature of the difference is due to the many-to-many matching .  MERGE handles that differently than SQL.  Try running your final program with ONE, TWO, and BOTH and you'll see that the results are not necessarily what you expected especially when ID=C.

Good luck.

Occasional Contributor
Posts: 19

Big Merge Help

It would be good if you could explain more what you want to do: e.g. when PAR_Period= 2 you have a lot of values in DAILY_PAR_MRG  and a lot of values in PARF_MRG.

You can combine these by using SQL  - then for each PAR_Period value you get  #(DAILY_PAR_MRG) times #(PARF_MRG).

Or - you can use merge. Then for each PAR_Period value you get  MAX(#(DAILY_PAR_MRG), #(PARF_MRG))  - which is smaller than the value above.


Your text "Dependingon which measurement period (PAR_PERIOD) the day falls in, a different factor(IPARf) needs to be applied to the variables  " indicates that some information is missing in your description.

I think you need more  by-variables.
Br Anders

Anders Sköllermo
Ph.D., Swedish Actuary

"Both Age and IQ are 69+"
"Retired, But Not Tired"

Sandgränd 13, S-178 40 Ekerö
email: anders.skollermo@one.se
tele: 00468968419 mobil: 0046735077373
Super User
Posts: 9,676

Big Merge Help

MERGE is not suited for your situation. but SQL cartesian product .

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 237 views
  • 3 likes
  • 5 in conversation