how to combine 2 dataset by common variable with out replacing missing...

mrug12 · Posted 10-25-2011 01:09 PM

Here for example,

one dataset is

serial no start date

101 jan2008

101 feb2008

101 march 2008

.

102 feb 2007

102 march2007

102 april2007

.

secon dataset is

serial no end date

101 july2007

101 aug2007

101 sept2007

.

102 may 2008

102 june 2008

102 july2008

.

How can I get dataset like bellow from above 2 datasets by using common variable serial no

serial no start date end date

101 . july2007

101 . aug2007

101 . sept2007

.

101 jan2008 .

101 feb 2008 .

101 march .

.

102 feb2007 .

102 mar2007 .

102 april2007 .

.

102 . may2008

102 . june2008

102 . july2008

.

art297 · Posted 10-25-2011 01:24 PM

You didn't explain what missing data you don't want to modify, but I think you either want to use the UPDATE statement in a datastep or write your own rules using a MERGE of the files in a datastep. The update statement is described at: http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001329151.htm

mrug12 · Posted 10-25-2011 01:33 PM

As in result dataset i do not want to modify missing data for start date and end date values as i want to keep both as per timeline and if value do not exit for that perticular month for any of them I want to keep it same way

Reeza · Posted 10-25-2011 01:35 PM

From the want data set it looks like you want a set statement rather than a merge.

Does the following do what you want?

data want;

set table1 table2;

run;

If you need to keep the missing in their position you may need to create another variable within serial no that allows you to sort by that.

HTH,

Reeza

mrug12 · Posted 10-25-2011 01:50 PM

As i mention i am creating dataset from 2 dataset using serial no as common variable and only values common in both dataset based on serial no will be kept .

so i can not use set satement .

art297 · Posted 10-25-2011 01:57 PM

It sounds like you simply want to do a merge of the two datasets. You do that either in a data step or, in proc sql, with a full join.

mrug12 · Posted 10-25-2011 02:58 PM

here i want to merge 2 dataset but with common varibale with matching values record only,,,,,and during this combination ,,I do not want to apply values to missing values for compared columns and want to retain values and apply missing value where value is not available.

Reeza · Posted 10-25-2011 02:16 PM

Do the periods under the serial no indicate missing serial no or a continuation of the previous serial no. Because otherwise all missing serial no's would be included according to your critieria.

Given that you're not overwriting or actually adding in a column it really sounds like you need an append or a set statement with a condition that the dataset contains only records in both files. This is different than a merge in my opinion.

You can try sql something like the following (untested code)

proc sql;

create table want as

select * from (select * from table1 where serial_no in (select serial_no from table2)

union all

select * from table2 where serial_no in (select serial_no from table1)) a;

quit;

If you actually have serial no missing that you need they may cause problems as I mentioned above and you'll have to identify them first somehow.

mrug12 · Posted 10-25-2011 02:53 PM

here period indicates continuation,,,,not missing serial no,,,,

and i think concept is right but ,,,,union all can not be use as columns in both data set is not same ,,,

and not able to get result through it,,

art297 · Posted 10-25-2011 02:57 PM

Do you have duplicate serial number, enddate combinations in either dataset?

mrug12 · Posted 10-25-2011 03:00 PM

yes serial no value is repeated sevral time as per ex. tables but enddate value is different for each one

Reeza · Posted 10-25-2011 03:11 PM

What about two steps then.

data want1;

set table1 (in=a) table2(in=b);

if a then source='a';

else if b then source='b';

run;

proc sql;

createt able want2

select * from want1

group by serial_no

where max(source) ne min(source); *check for both ids in there not just 1;

quit;

Again untested, but closer I think.

mrug12 · Posted 10-25-2011 03:53 PM

REEZA,

it did not work as it was using where and it not valid data step,,,,as per log error

Reeza · Posted 10-25-2011 04:32 PM

Show the code you used and the log error...

art297 · Posted 10-25-2011 03:12 PM

Does the following example match what you want to accomplish?:

data one;

informat start_date date9.;

format start_date date9.;

input serial_no start_date;

cards;

101 1jan2008

101 1feb2008

101 1mar2008

102 1feb2007

102 1mar2007

102 1apr2007

;

data two;

informat end_date date9.;

format end_date date9.;

input serial_no end_date;

cards;

101 1jul2007

101 1aug2007

101 1sep2007

102 1feb2007

102 1may2008

102 1jun2008

102 1jul2008

;

data twoprep;

set two;

start_date=end_date;

run;

data want;

merge one (in=a) twoprep (in=b);

by serial_no start_date;

if b and not(a) then

call missing(start_date);

run;

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

how to combine 2 dataset by common variable with out replacing missing values for other variables

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!