BookmarkSubscribeRSS Feed
mrug12
Calcite | Level 5

Here for example,

one dataset is

serial no  start date

101           jan2008

101          feb2008

101           march 2008

.

.

.

.

102            feb 2007

102            march2007

102            april2007

.

.

.

secon dataset is

serial no     end date

101            july2007

101            aug2007

101            sept2007

.

.

.

102           may 2008

102           june 2008

102           july2008

.

.

.

How can I get dataset like bellow from above 2 datasets by using common variable serial no

serial no    start date   end date

101               .             july2007

101               .             aug2007

101               .             sept2007

.

.

.

101             jan2008         .

101           feb 2008         .

101           march              .

.

.

.

102         feb2007             .

102         mar2007            .

102         april2007           .

.

.

.

102                .                may2008

102                .                june2008

102                .                july2008

.

.

.

20 REPLIES 20
art297
Opal | Level 21

You didn't explain what missing data you don't want to modify, but I think you either want to use the UPDATE statement in a datastep or write your own rules using a MERGE of the files in a datastep.  The update statement is described at: http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001329151.htm

mrug12
Calcite | Level 5

As in result dataset i do not want to modify missing data for start date and end date values as i want to keep both as per timeline and if value do not exit for that perticular month for any of them I want to keep it same way

Reeza
Super User

From the want data set it looks like you want a set statement rather than a merge.

Does the following do what you want?

data want;

set table1 table2;

run;

If you need to keep the missing in their position you may need to create another variable within serial no that allows you to sort by that.

HTH,

Reeza

mrug12
Calcite | Level 5

As i mention i am creating dataset from 2 dataset  using serial no as common variable and only  values common in both dataset based on serial no will be kept .

so i can not use set satement .

art297
Opal | Level 21

It sounds like you simply want to do a merge of the two datasets.  You do that either in a data step or, in proc sql, with a full join.

mrug12
Calcite | Level 5

here i want to merge 2 dataset but with common varibale with matching values record only,,,,,and during this combination ,,I do not want to apply values to missing values for compared columns and want to retain values and apply missing value where value is not available.

Reeza
Super User

Do the periods under the serial no indicate missing serial no or a continuation of the previous serial no. Because otherwise all missing serial no's would be included according to your critieria.

Given that you're not overwriting or actually adding in a column it really sounds like you need an append or a set statement with a condition that the dataset contains only records in both files. This is different than a merge in my opinion.

You can try sql something like the following (untested code)

proc sql;

create table want as

select * from (select * from table1 where serial_no in (select serial_no from table2)

               union all

                    select * from table2 where serial_no in (select serial_no from table1)) a;

quit;

If you actually have serial no missing that you need they may cause problems as I mentioned above and you'll have to identify them first somehow.

mrug12
Calcite | Level 5

here period indicates continuation,,,,not missing serial no,,,,

and i think concept is right but ,,,,union all can not be use as columns in both data set is not same ,,,

and not able to get result through it,,

art297
Opal | Level 21

Do you have duplicate serial number, enddate combinations in either dataset?

mrug12
Calcite | Level 5

yes serial no value is repeated sevral time as per ex. tables but enddate value is different for each one

Reeza
Super User

What about two steps then.

data want1;

set table1 (in=a) table2(in=b);

if a then source='a';

else if b then source='b';

run;

proc sql;

createt able want2

select * from want1

group by serial_no

where max(source) ne min(source); *check for both ids in there not just 1;

quit;

Again untested, but closer I think.

mrug12
Calcite | Level 5

REEZA,

it did not work as it was using where and it not valid data step,,,,as per log error

art297
Opal | Level 21

Does the following example match what you want to accomplish?:

data one;

  informat start_date date9.;

  format start_date date9.;

  input serial_no  start_date;

  cards;

101 1jan2008

101 1feb2008

101 1mar2008

102 1feb2007

102 1mar2007

102 1apr2007

;

data two;

  informat end_date date9.;

  format end_date date9.;

  input serial_no  end_date;

  cards;

101 1jul2007

101 1aug2007

101 1sep2007

102 1feb2007

102 1may2008

102 1jun2008

102 1jul2008

;

data twoprep;

  set two;

  start_date=end_date;

run;

data want;

  merge one (in=a) twoprep (in=b);

  by serial_no start_date;

  if b and not(a) then

   call missing(start_date);

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 1258 views
  • 6 likes
  • 5 in conversation