I am trying to join the following two data sets:
data testA;
input categorical $3. value;
*order = _n_;
datalines;
Dog.
M 7
F 5
Cat.
M 4
F 2
;
run;
data testA;
set testA;
order=_n_;
run;
data testB;
input categorical $2. value;
datalines;
Dog.
F 3
Cat.
M 1
F 2
;
run;
proc sql;
create table final as
select a.*,b.* from testA a left join testB b on a.categorical=b.categorical
order by order;
quit;
My desire output is the following:
data testA;
input categorical $ value value2;
datalines;
Dog . .
M 7 .
F 5 3
Cat . .
M 4 1
F 2 2
;
run;
The problems I've encountered is that 1) the 'categorical' id is not sorted alphabetically and I do not want to change its order 2) Since there are two Ms and Fs I don't know how to join without renaming the M F so that it's unique 3) It could be an inner join since what may be in value may not be in value2
/* Modify the way you read your data: */
data testA;
retain cat1;
input cat2 $3. value;
if missing(value) then cat1 = cat2;
else output;
datalines;
Dog.
M 7
F 5
Cat.
M 4
F 2
;
data testB;
retain cat1;
input cat2 $3. value;
if missing(value) then cat1 = cat2;
else output;
datalines;
Dog.
F 3
Cat.
M 1
F 2
;
/* Then you can use: */
proc sql;
create table final as
select
a.*,
b.value as value2
from
testA a left join
testB b on a.cat1=b.cat1 and a.cat2=b.cat2
order by cat1 desc, cat2 desc;
quit;
Must it be using proc sql? Coz any proc sql join would be product of keys in both tables and that means will do many to many based on key value and not on a position. The ideal is a merge.
data testA;
input categorical $3. value;
order = _n_;
datalines;
Dog .
M 7
F 5
Cat .
M 4
F 2
;
run;
data testB;
input categorical $3. value;
datalines;
Dog .
F 3
Cat .
M 1
F 2
;
run;
proc sort data=testa out=_testa;
by categorical;
run;
proc sort data=testb out=_testb;
by categorical;
run;
data temp;
merge _testa _testb(rename=(value=value2));
by categorical;
run;
proc sort data=temp out=want(drop=order);
by order;
run;
Also look at modify statement, your datasets appear to be a good case for applying modify statement as master dataset does not have to sorted.
If my understanding is right, modify statement with key categorical variable indexed look up would be a sound approach.
Thanks so much for your reply! After running, everything looks except there is a 1 under value2 Dog (M). Can I ask how I can remove that in the case of a blank?
Try this
data testA;
input categorical: $3. value: 8.;
*order = _n_;
datalines;
Dog .
M 7
F 5
Cat .
M 4
F 2
;
run;
data testA;
set testA;
order=_n_;
run;
proc sort data=testa;
by categorical order;
run;
data testa;
set testa;
by categorical;
if first.categorical then
id=0;
id+1;
run;
data testB;
input categorical: $3. value: 8.;
datalines;
Dog .
F 3
Cat .
M 1
F 2
;
run;
data testb;
set testb;
order=_n_;
run;
proc sort data=testb;
by categorical order;
run;
data testb;
set testb;
by categorical;
if first.categorical then id=0;
id+1;
run;
proc sql;
create table final as
select
coalescec(a.categorical,b.categorical) as categorical,
a.value as valuea,
b.value as valueb,
a.order
from
testA a left join testB b on a.categorical=b.categorical
and a.id=b.id
order by a.order;
quit;
Double-check your desired output. I think there are errors in it.
Try restructuring your original datasets by adding a species(dog/cat) and gender (m/f) columns . It will make your life easier .
/* Modify the way you read your data: */
data testA;
retain cat1;
input cat2 $3. value;
if missing(value) then cat1 = cat2;
else output;
datalines;
Dog.
M 7
F 5
Cat.
M 4
F 2
;
data testB;
retain cat1;
input cat2 $3. value;
if missing(value) then cat1 = cat2;
else output;
datalines;
Dog.
F 3
Cat.
M 1
F 2
;
/* Then you can use: */
proc sql;
create table final as
select
a.*,
b.value as value2
from
testA a left join
testB b on a.cat1=b.cat1 and a.cat2=b.cat2
order by cat1 desc, cat2 desc;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.