BookmarkSubscribeRSS Feed
Hoz
Calcite | Level 5 Hoz
Calcite | Level 5

Hi,

I have following two datasets.

dataset1:

[neme_value]

name1

name2

name3

...

dataset2:

[name_value]

name1

namea

nameb

name3

namee

...

I'd like to require following dataset.

resultdataset:

[name_value]

name1

name2

namea

nameb

name3

namee

...

I can't get it by this code.

data resultdataset;

   merge dataset1 dataset2;

   by notsorted name_value;

run;

Please tell me the way to get that result.

Thank you,

Hide

9 REPLIES 9
RW9
Diamond | Level 26 RW9
Diamond | Level 26

My first question would be why?  Logically speaking there is no validity in a a random order like that.  You would be better off working out a good and reproducible output rather than something like this, if the first dataset is changed, then you may find the output dataset is no longer like a previous version.  If you still want to go ahead, then assign a variable to _n_ in both instances:

data inter1;

     set dataset1;

     nvar=_n_;

run;

data inter2;

     set dataset2;

     nvar=_n_;

run;

data want;

     merge inter1 inter2;

     by name_value;

run;

proc sort data=want;

     by nvar name_value;

run;


But then again, why.  To use the data in any meaningful way later on you will have to sort it anyways.

Loko
Barite | Level 11

Hello,

data have1;
input name_value $5.;
datalines;
name1
name2
name3
;


data have2;
input name_value $5.;
datalines;
name1
namea
nameb
name3
namee
;

proc sort data=have1 presorted;
by name_value;
run;

proc sort data=have2 presorted;
by name_value;
run;


data want;
update have2 have1;
by name_value;
run;

Hoz
Calcite | Level 5 Hoz
Calcite | Level 5

Dear all,

Thank you for reply and sorry to take you to ambiguous question.

Specifically speaking,I want to know the difference of valiables of two datasets,
these datasets have almost same valiables.Additionally, valiable's parts are grouping semantically,

so result dataset needs to keep orders.

First I got these detasets included valiable name of each dataset by proc contents. And then,want to compare
these datasets...

ex)
dataset1:
[name]
id
fname
lname
maintest1
maintest2
subtest1
subtest2
date1

dataset2:
[name]
id
id_sub
fname
lname
maintest1
maintest2
maintest2_2
subtest1
date1
date1_sub


want dataset:
[inDS1 inDS2 valname]
1  1  id
0  1  id_sub
1  1  fname
1  1  lname
1  1  maintest1
1  1  maintest2
0  1  maintest2_2
1  1  subtest1
1  0  subtest2
1  1  date1
0  1  date1_sub

Reeza
Super User

I don't see how order matters.

And you can use SASHELP.VCOLUMNS to simplify your query.

Here's a macro I wrote to solve the problem a while back but  you can easily remove the macro portion if you don't need it:

SAS_compare_dataset_variables · GitHub

%macro compare_data(base=, compare=);

proc sql noprint;

  create table base_in as

  select name

  from sashelp.vcolumn

  where libname=upper(scan("&base", 1, "."))

  and memname=upper(scan("&base", 2, "."))

  order by varnum;

quit;

proc sql noprint;

  create table compare_in as

  select name

  from sashelp.vcolumn

  where libname=upper(scan("&compare", 1, "."))

  and memname=upper(scan("&compare", 2, "."))

  order by varnum;

quit;

proc sql;

  create table comparison as

  select a.name as base_var, b.name as compare_var,

  case when missing(a.name) then catx("-", "Comparison dataset has extra variable", b.name)

  when missing(b.name) then catx("-", "Comparison dataset is missing variable", a.name)

  when a.name=b.name then "Variable in both datasets"

  else "CHECKME"

  end as comparison

  from base_in as a

  full join compare_in as b

  on a.name=b.name;

quit;

/* data _null_; */

/* set comparison; */

/* if comparison not eq "Variable in both datasets" */

/*   then put "ERROR:" comparison; */

/* run; */

%mend;

/*example of calling macro*/

data class;

set sashelp.class;

bmi=(weight/(height**2))*703;

run;

data class2;

set sashelp.class;

weight_kgs=weight*0.453592;

height_cm=height*2.54;

run;

%compare_data(base=work.class2, compare=work.class);

ChrisNZ
Tourmaline | Level 20

If you really need this, start the data step by reading the dataset structures in the wanted order.

data WANT;

  if 0 then set DS2(keep= id  id_sub fname  lname  maintest1  maintest2  maintest2_2  subtest1);

  if 0 then set DS1(keep= subtest2);

  if 0 then set DS2(keep= date1  date1_sub);

  merge DS1 DS2;

...

run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Oh, then that is very simple:

proc sql;

     create table WANT as

     select     A.VARNUM as ORDER_IN_BASE,

                   B.VARNUM as ORDER_IN_COMPARE,

                   COALESCE(A.NAME,B.NAME) as NAME

     from       (select * from SASHELP.VCOLUMN where LIBNAME="WORK" and MEMNAME="DATASET1") A

     full join   (select * from SASHELP.VCOLUMN where LIBNAME="WORK" and MEMNAME="DATASET2") B

     on          A.NAME=B.NAME

     order by  COALESCE(A.NAME,B.NAME) /* note I have order by name here, you could do any combination of name and orders */;

quit;

Hoz
Calcite | Level 5 Hoz
Calcite | Level 5

Thank you for answer.

But i posted this because it's difficult to do with keeping varnum order.

If it exists the same valiable name in both dataset, output  this valiable name  by varnum order.

If it exists the valiable name in one side dataset, output the valiable name between neighborhood same valiable name with keeping varnum order.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, but that is because the order you are thinking of is not a *logical* order.  Say you have:

1     abc

2     def

3      efg

and

1     efg

2     abc

What order should efg appear at?  What about abc, it is order 2 in the second example, but 1 in the first, the two merge but don't have the same order.  What I would suggest is that you think of the logical order for display, something you can put in a proc sort.  Alternatively have a look at the specifications, that should be your base, i.e. if the specs state: 1=abc, 2=def... then you can code that in and then merge the two sets of variables to that as you then have a predefined list.  If you haven't got specs, well...

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1761 views
  • 3 likes
  • 6 in conversation