Hi,
I have following two datasets.
dataset1:
[neme_value]
name1
name2
name3
...
dataset2:
[name_value]
name1
namea
nameb
name3
namee
...
I'd like to require following dataset.
resultdataset:
[name_value]
name1
name2
namea
nameb
name3
namee
...
I can't get it by this code.
data resultdataset;
merge dataset1 dataset2;
by notsorted name_value;
run;
Please tell me the way to get that result.
Thank you,
Hide
My first question would be why? Logically speaking there is no validity in a a random order like that. You would be better off working out a good and reproducible output rather than something like this, if the first dataset is changed, then you may find the output dataset is no longer like a previous version. If you still want to go ahead, then assign a variable to _n_ in both instances:
data inter1;
set dataset1;
nvar=_n_;
run;
data inter2;
set dataset2;
nvar=_n_;
run;
data want;
merge inter1 inter2;
by name_value;
run;
proc sort data=want;
by nvar name_value;
run;
But then again, why. To use the data in any meaningful way later on you will have to sort it anyways.
What is the logic for where to place name2 in the output dataset?
Hello,
data have1;
input name_value $5.;
datalines;
name1
name2
name3
;
data have2;
input name_value $5.;
datalines;
name1
namea
nameb
name3
namee
;
proc sort data=have1 presorted;
by name_value;
run;
proc sort data=have2 presorted;
by name_value;
run;
data want;
update have2 have1;
by name_value;
run;
Dear all,
Thank you for reply and sorry to take you to ambiguous question.
Specifically speaking,I want to know the difference of valiables of two datasets,
these datasets have almost same valiables.Additionally, valiable's parts are grouping semantically,
so result dataset needs to keep orders.
First I got these detasets included valiable name of each dataset by proc contents. And then,want to compare
these datasets...
ex)
dataset1:
[name]
id
fname
lname
maintest1
maintest2
subtest1
subtest2
date1
dataset2:
[name]
id
id_sub
fname
lname
maintest1
maintest2
maintest2_2
subtest1
date1
date1_sub
want dataset:
[inDS1 inDS2 valname]
1 1 id
0 1 id_sub
1 1 fname
1 1 lname
1 1 maintest1
1 1 maintest2
0 1 maintest2_2
1 1 subtest1
1 0 subtest2
1 1 date1
0 1 date1_sub
I don't see how order matters.
And you can use SASHELP.VCOLUMNS to simplify your query.
Here's a macro I wrote to solve the problem a while back but you can easily remove the macro portion if you don't need it:
SAS_compare_dataset_variables · GitHub
%macro compare_data(base=, compare=);
proc sql noprint;
create table base_in as
select name
from sashelp.vcolumn
where libname=upper(scan("&base", 1, "."))
and memname=upper(scan("&base", 2, "."))
order by varnum;
quit;
proc sql noprint;
create table compare_in as
select name
from sashelp.vcolumn
where libname=upper(scan("&compare", 1, "."))
and memname=upper(scan("&compare", 2, "."))
order by varnum;
quit;
proc sql;
create table comparison as
select a.name as base_var, b.name as compare_var,
case when missing(a.name) then catx("-", "Comparison dataset has extra variable", b.name)
when missing(b.name) then catx("-", "Comparison dataset is missing variable", a.name)
when a.name=b.name then "Variable in both datasets"
else "CHECKME"
end as comparison
from base_in as a
full join compare_in as b
on a.name=b.name;
quit;
/* data _null_; */
/* set comparison; */
/* if comparison not eq "Variable in both datasets" */
/* then put "ERROR:" comparison; */
/* run; */
%mend;
/*example of calling macro*/
data class;
set sashelp.class;
bmi=(weight/(height**2))*703;
run;
data class2;
set sashelp.class;
weight_kgs=weight*0.453592;
height_cm=height*2.54;
run;
%compare_data(base=work.class2, compare=work.class);
If you really need this, start the data step by reading the dataset structures in the wanted order.
data WANT;
if 0 then set DS2(keep= id id_sub fname lname maintest1 maintest2 maintest2_2 subtest1);
if 0 then set DS1(keep= subtest2);
if 0 then set DS2(keep= date1 date1_sub);
merge DS1 DS2;
...
run;
Oh, then that is very simple:
proc sql;
create table WANT as
select A.VARNUM as ORDER_IN_BASE,
B.VARNUM as ORDER_IN_COMPARE,
COALESCE(A.NAME,B.NAME) as NAME
from (select * from SASHELP.VCOLUMN where LIBNAME="WORK" and MEMNAME="DATASET1") A
full join (select * from SASHELP.VCOLUMN where LIBNAME="WORK" and MEMNAME="DATASET2") B
on A.NAME=B.NAME
order by COALESCE(A.NAME,B.NAME) /* note I have order by name here, you could do any combination of name and orders */;
quit;
Thank you for answer.
But i posted this because it's difficult to do with keeping varnum order.
If it exists the same valiable name in both dataset, output this valiable name by varnum order.
If it exists the valiable name in one side dataset, output the valiable name between neighborhood same valiable name with keeping varnum order.
Yes, but that is because the order you are thinking of is not a *logical* order. Say you have:
1 abc
2 def
3 efg
and
1 efg
2 abc
What order should efg appear at? What about abc, it is order 2 in the second example, but 1 in the first, the two merge but don't have the same order. What I would suggest is that you think of the logical order for display, something you can put in a proc sort. Alternatively have a look at the specifications, that should be your base, i.e. if the specs state: 1=abc, 2=def... then you can code that in and then merge the two sets of variables to that as you then have a predefined list. If you haven't got specs, well...
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.