Re: Merge of two dataset with keeping the orders

Hoz · Posted 08-04-2015 05:53 AM

Hi,

I have following two datasets.

dataset1:

[neme_value]

name1

name2

name3

...

dataset2:

[name_value]

name1

namea

nameb

name3

namee

...

I'd like to require following dataset.

resultdataset:

[name_value]

name1

name2

namea

nameb

name3

namee

...

I can't get it by this code.

data resultdataset;

merge dataset1 dataset2;

by notsorted name_value;

run;

Please tell me the way to get that result.

Thank you,

Hide

RW9 · Posted 08-04-2015 06:25 AM

My first question would be why? Logically speaking there is no validity in a a random order like that. You would be better off working out a good and reproducible output rather than something like this, if the first dataset is changed, then you may find the output dataset is no longer like a previous version. If you still want to go ahead, then assign a variable to _n_ in both instances:

data inter1;

set dataset1;

nvar=_n_;

run;

data inter2;

set dataset2;

nvar=_n_;

run;

data want;

merge inter1 inter2;

by name_value;

run;

proc sort data=want;

by nvar name_value;

run;

But then again, why. To use the data in any meaningful way later on you will have to sort it anyways.

Kurt_Bremser · Posted 08-04-2015 07:10 AM

What is the logic for where to place name2 in the output dataset?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Loko · Posted 08-04-2015 08:29 AM

Hello,

data have1;
input name_value $5.;
datalines;
name1
name2
name3
;

data have2;
input name_value $5.;
datalines;
name1
namea
nameb
name3
namee
;

proc sort data=have1 presorted;
by name_value;
run;

proc sort data=have2 presorted;
by name_value;
run;

data want;
update have2 have1;
by name_value;
run;

Hoz · Posted 08-04-2015 09:21 PM

Dear all,

Thank you for reply and sorry to take you to ambiguous question.

Specifically speaking,I want to know the difference of valiables of two datasets,
these datasets have almost same valiables.Additionally, valiable's parts are grouping semantically,

so result dataset needs to keep orders.

First I got these detasets included valiable name of each dataset by proc contents. And then,want to compare
these datasets...

ex)
dataset1:
[name]
id
fname
lname
maintest1
maintest2
subtest1
subtest2
date1

dataset2:
[name]
id
id_sub
fname
lname
maintest1
maintest2
maintest2_2
subtest1
date1
date1_sub

want dataset:
[inDS1 inDS2 valname]
1 1 id
0 1 id_sub
1 1 fname
1 1 lname
1 1 maintest1
1 1 maintest2
0 1 maintest2_2
1 1 subtest1
1 0 subtest2
1 1 date1
0 1 date1_sub

Reeza · Posted 08-04-2015 09:34 PM

I don't see how order matters.

And you can use SASHELP.VCOLUMNS to simplify your query.

Here's a macro I wrote to solve the problem a while back but you can easily remove the macro portion if you don't need it:

SAS_compare_dataset_variables · GitHub

%macro compare_data(base=, compare=);

proc sql noprint;

create table base_in as

select name

from sashelp.vcolumn

where libname=upper(scan("&base", 1, "."))

and memname=upper(scan("&base", 2, "."))

order by varnum;

quit;

proc sql noprint;

create table compare_in as

select name

from sashelp.vcolumn

where libname=upper(scan("&compare", 1, "."))

and memname=upper(scan("&compare", 2, "."))

order by varnum;

quit;

proc sql;

create table comparison as

select a.name as base_var, b.name as compare_var,

case when missing(a.name) then catx("-", "Comparison dataset has extra variable", b.name)

when missing(b.name) then catx("-", "Comparison dataset is missing variable", a.name)

when a.name=b.name then "Variable in both datasets"

else "CHECKME"

end as comparison

from base_in as a

full join compare_in as b

on a.name=b.name;

quit;

/* data _null_; */

/* set comparison; */

/* if comparison not eq "Variable in both datasets" */

/* then put "ERROR:" comparison; */

/* run; */

%mend;

/*example of calling macro*/

data class;

set sashelp.class;

bmi=(weight/(height**2))*703;

run;

data class2;

set sashelp.class;

weight_kgs=weight*0.453592;

height_cm=height*2.54;

run;

%compare_data(base=work.class2, compare=work.class);

ChrisNZ · Posted 08-04-2015 11:11 PM

If you really need this, start the data step by reading the dataset structures in the wanted order.

data WANT;

if 0 then set DS2(keep= id id_sub fname lname maintest1 maintest2 maintest2_2 subtest1);

if 0 then set DS1(keep= subtest2);

if 0 then set DS2(keep= date1 date1_sub);

merge DS1 DS2;

...

run;

High-Performance SAS Coding - Third Edition

RW9 · Posted 08-05-2015 04:31 AM

Oh, then that is very simple:

proc sql;

create table WANT as

select A.VARNUM as ORDER_IN_BASE,

B.VARNUM as ORDER_IN_COMPARE,

COALESCE(A.NAME,B.NAME) as NAME

from (select * from SASHELP.VCOLUMN where LIBNAME="WORK" and MEMNAME="DATASET1") A

full join (select * from SASHELP.VCOLUMN where LIBNAME="WORK" and MEMNAME="DATASET2") B

on A.NAME=B.NAME

order by COALESCE(A.NAME,B.NAME) /* note I have order by name here, you could do any combination of name and orders */;

quit;

Hoz · Posted 08-05-2015 05:04 AM

Thank you for answer.

But i posted this because it's difficult to do with keeping varnum order.

If it exists the same valiable name in both dataset, output this valiable name by varnum order.

If it exists the valiable name in one side dataset, output the valiable name between neighborhood same valiable name with keeping varnum order.

RW9 · Posted 08-05-2015 05:18 AM

Yes, but that is because the order you are thinking of is not a *logical* order. Say you have:

1 abc

2 def

3 efg

and

1 efg

2 abc

What order should efg appear at? What about abc, it is order 2 in the second example, but 1 in the first, the two merge but don't have the same order. What I would suggest is that you think of the logical order for display, something you can put in a proc sort. Alternatively have a look at the specifications, that should be your base, i.e. if the specs state: 1=abc, 2=def... then you can code that in and then merge the two sets of variables to that as you then have a predefined list. If you haven't got specs, well...

SAS Innovate 2025: Call for Content

Classroom Training Available!