SAS Programming

deleted_user · Posted 10-06-2008 02:52 PM

Hi, I am trying to combine two data sets with the same variables conditionally. I only want to combine observations from the second data set that have the same value as a variablex (famtrac below) in the primary data set. Variablex has the same name & same format in both data sets. I tried using the set statement with in= and an if statement, but that didn't work:
data all65plus08;
set selena.cps65plus08a (in=eldfam) selena.cps65under08;
by famtrac;
if eldfam;
run;

There are duplicate observations in the secondary data set that meet the conditions I specified and I *do* want these duplicates to be added to the new combined data set.

Any help would be appreciated!

sbb · Posted 10-06-2008 03:16 PM

You are using SET (interleave observations) with a BY. Do you mean to use a MERGE with a BY, in order to "combine" your two files? Do consider the impact of the MERGE operation with same-named variables in the two (or more) files on the MERGE statement. Have a look at the SAS support website http://support.sas.com/ and the Product Documentation for details about the Data step's MERGE statement processing.

Scott Barry
SBBWorks, Inc.

deleted_user · Posted 10-06-2008 03:34 PM

I don't want to merge the data sets, I need to interleave them (the end result would be a data set with the observations from the primary data set plus all observations from the secondary data set that have a value for famtrac that occurs for some observation in the primary data set).

I want to interleave observations, but I want to specify which observations from the secondary data set will be included in the interleaved data set...

sbb · Posted 10-06-2008 05:01 PM

Then you need to add an IN= for the second dataset on the SET (not the first one), and use IF/THEN logic to conditionally output obs contributed from the second dataset.

Scott Barry
SBBWorks, Inc.

sbb · Posted 10-06-2008 05:03 PM

And it appears that you will also need to keep track of the prior first DSN contribution, using a RETAIN and assigning a temporary (different named) variable.

Scott Barry
SBBWorks, Inc.

Cynthia_sas · Posted 10-06-2008 05:36 PM

Hi:
You are on the right track for interleaving, however, your IN= variable and your subsetting IF are limiting the new dataset to only those observations that come from selena.cps65plus08a -- which doesn't sound like what you want.

You might need to put an IN= on the secondary dataset as well, and then tailor your IF statement accordingly. If I understand you correctly, you want EVERY obs from the first dataset, but only SOME obs from the second dataset??

Consider this data (Name, Present_type, Amtspend are my variables. I'm also making a variable called FROMFILE so that the final output shows which file contributed the interleaved observation):
[pre]
data one;
length name present_type $12;
infile datalines;
input name $ present_type $ amtspend;
fromfile = 'one';
return;
datalines;
alan anniversary 15
alan birthday 10
carl anniversary 15
edna birthday 20
;
run;

data two;
length name present_type $12;
infile datalines;
input name $ present_type $ amtspend;
fromfile = 'two';
return;
datalines;
alan christmas 25
bob anniversary 10
bob birthday 15
carl birthday 10
edna christmas 15
;
run;
[/pre]

For this data, I want ALL the obs from dataset ONE and everybody from dataset TWO except poor Carl -- he only gets an anniversary present. So this is the output I want:
[pre]
present_
Obs name type amtspend fromfile

1 alan anniversary 15 one
2 alan birthday 10 one
3 alan christmas 25 two
4 bob anniversary 10 two
5 bob birthday 15 two
6 carl anniversary 15 one
7 edna birthday 20 one
8 edna christmas 15 two

[/pre]

Created by THIS program:
[pre]
data three;
set one(in=inone)
two(in=intwo);
by name;
if inone or
(intwo and name ne 'carl') then output;
run;

ods listing;
proc print data=three;
title 'three';
run;
title;
[/pre]

If you have more complex conditions, then you'll have to code for those as well. This was just a simple test. But it should give you an idea of how interleaving works. Remember that the BY in interleaving performs differently than the BY in merging.

To learn more about the interleaving process, I recommend this site,
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/a001318366.htm
which has a section entitled: Understanding the Interleaving Process.

cynthia

deleted_user · Posted 10-08-2008 02:38 PM

Thanks for the help Cynthia. Yes, I want only some obs from the second dataset, but the obs that I want do not have a consistent value for the variable I am selecting on. I only want the obs from the second dataset where variablex (famtrac) in the second dataset matches any value of variablex (famtrac) in the first dataset...

In case anyone is also trying to figure out how to do this, here's what I finally came up with:

proc sql;
create table sql.fam6508
like sql.cps65plus08a;
proc sql;
insert into sql.fam6508
select * from sql.cps65plus08a **the primary data set**
union all
select * from sql.cps65under08 **secondary data set**
where famtrac in
(select famtrac from sql.cps65plus08a);

deleted_user · Posted 11-18-2008 05:07 PM

An (untested) alternative is the following, assuming the input data sets are sorted on famtrac:

data all65plus08 (drop=v1);
set selena.cps65plus08a (in=eldfam) selena.cps65under08 (in=b);
by famtrac;
retain v1;
if eldfam then v1=famtrac;
if eldfam or (b and v1=famtrac);
run;

which should give the same result, except for the sort order in the result data set.

SAS Programming

combining two data sets conditionally

Re: combining two data sets conditionally

Re: combining two data sets conditionally

Re: combining two data sets conditionally

Re: combining two data sets conditionally

Re: combining two data sets conditionally

Re: combining two data sets conditionally

Re: combining two data sets conditionally

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...