Merging & Manipulating datasets

Reply
New Contributor
Posts: 3

Merging & Manipulating datasets

Hi all! I'm new to SAS (using UE) and am trying to take two datasets, merge them, and then create a new column of calculations. I have one data set (dist) that has 1 row/observation of data (the mean, percentile, and cutprobs) and another data set (brr_runs) that holds 16 rows/observations with nearly identical variables (mean, percentiles, and cutprobs but with "b" in the title to separate them from the initial run). I'd like to merge them and then calculate the squared difference between each estimate in the brr_runs set and the baseline parameters (located in dist). This is what I'm attempting:

 

data distall;
merge dist brr_runs;
array varo (*) mean_mc_t tpercentile1-tpercentile99 cutprob1-cutprob4;
array bvar (*) bmean_mc_t btpercentile1-btpercentile99 bcutprob1-bcutprob4;
array dsqr (*) dbmean_mc_t dbtpercentile1-dbtpercentile99 dbcutprob1-dbcutprob4;
do i=1 to dim(bvar);
dsqr[i]=(bvar[i]-varo[i])**2;
end;
run;

 

However, when I run this it creates a data set "distall" that has mostly missing values. I've tried manipulating it, but have run into the same issue. It also has created "distall" with 17 rows of identical data (as in, it just took one observation from the original datasets and recreated it 16 times).

 

Can anyone give me some guidance on this? Thanks!

Super User
Posts: 10,508

Re: Merging & Manipulating datasets

Example data similar to both starting sets and of the desired output for those sets helps provide better examples.

 

MERGE has several behaviors depending on whether you expect to match rows based on a common variable (or group of variables). Is that the case in your data? That would require the use of a BY statement with the variable to match. The data would also have to be sorted prior to the merge by the same variables.

 

Without BY merge tells SAS to put the first row of one set with the first of the other, 2nd to 2nd rows, 3rd to 3rd. If one set has fewer rows then there is no contributed data and hence missing.

 

MERGE seldom does what you want in the case of a one-to-many or many-to-many combination.

If you want to match ALL rows in set 1 with ALL rows in set 2 then what you are seeking is join (another word for combine but with a variety of types) and would be accomplished with Proc SQL. If that is what you want you might consider:

proc sql;
   create table distall as
   select dist.*, brr_runs.*
   from dist, brr_runs;
quit;
/* since arrays do not work in SQL it may be easier to
add the manipulation as*/
data distall;
   set distall;
   array varo (*) mean_mc_t tpercentile1-tpercentile99 cutprob1-cutprob4;
   array bvar (*) bmean_mc_t btpercentile1-btpercentile99 bcutprob1-bcutprob4;
   array dsqr (*) dbmean_mc_t dbtpercentile1-dbtpercentile99 dbcutprob1-dbcutprob4;
   do i=1 to dim(bvar);
      dsqr[i]=(bvar[i]-varo[i])**2;
   end;
run;

If that doesn't do what you need, see my comment about example data.

 

PROC Star
Posts: 7,363

Re: Merging & Manipulating datasets

You can do it all in one data step:

data distall;
  array varo (*) mean_mc_t tpercentile1-tpercentile99 cutprob1-cutprob4;
  array bvar (*) bmean_mc_t btpercentile1-btpercentile99 bcutprob1-bcutprob4;
  array dsqr (*) dbmean_mc_t dbtpercentile1-dbtpercentile99 dbcutprob1-dbcutprob4;
  set brr_runs;
  if _n_ eq 1 then set dist;
  do i=1 to dim(bvar);
    dsqr[i]=(bvar[i]-varo[i])**2;
  end;
run;

Art, CEO, AnalystFinder.com

Ask a Question
Discussion stats
  • 2 replies
  • 93 views
  • 0 likes
  • 3 in conversation