Hello,
It should be pretty simple but I can't find a way to do this quickly.
I have table 1 that looks like this :
a | b | c | |
plumber | 21a1 | 21a1 | 21a1 |
fireman | 47b1 | 47b2 | 47b3 |
policeman | 47b1 | 47b2 | 47b3 |
and table 2 that looks like this :
occupation | level |
plumber | a |
plumber | a |
fireman | c |
policeman | b |
policeman | a |
And I would like to add a column to table 2, like this :
occupation | level | code |
plumber | a | 21a1 |
plumber | a | 21a1 |
fireman | c | 47b3 |
policeman | b | 47b2 |
policeman | a | 47b1 |
I was thinking of a complicated solution with a merge but I'm sure there is some more efficient solution. Also, in my table 1, I have way more columns than just a, b, c. So i'm looking for some function that can read columns names and put it in relation with the modalities of the "level" variable in table 2.
First step: transpose table1 to a long format:
proc sort data=table1;
by occupation;
run;
proc transpose
data=table1
out=table1_trans (
rename=(col1=code _name_=level)
where=(level ne 'occupation')
)
;
by occupation notsorted;
var _all_;
run;
From this, you can either
A hash solution will look like this:
data table1;
input occupation :$15. a $ b $ c $;
cards;
plumber 21a1 21a1 21a1
fireman 47b1 47b2 47b3
policeman 47b1 47b2 47b3
;
run;
data table2;
input occupation :$15. level $;
cards;
plumber a
plumber a
fireman c
policeman b
policeman a
;
run;
proc sort data=table1;
by occupation;
run;
proc transpose
data=table1
out=table1_trans (
rename=(col1=code _name_=level)
where=(level ne 'occupation')
)
;
by occupation;
var _all_;
run;
data want;
set table2;
length code $4;
if _n_ = 1
then do;
declare hash codes (dataset:'table1_trans');
rc = codes.definekey('occupation','level');
rc = codes.definedata('code');
rc = codes.definedone();
end;
rc = codes.find();
drop rc;
run;
Note the datasteps for the example datasets. They make it much easier for the helpers to recreate data.
Do something like this
data table1;
length occupation $20;
input occupation$ a$ b$ c$;
datalines;
plumber 21a1 21a1 21a1
fireman 47b1 47b2 47b3
policeman 47b1 47b2 47b3
;
data table2;
length occupation $20;
input occupation $ level $;
datalines;
plumber a
plumber a
fireman c
policeman b
policeman a
;
data table1_long(keep=occupation level code);
set table1;
array levels{*} a b c;
do i=1 to dim(levels);
level=vname(levels[i]);
code=levels[i];
output;
end;
run;
proc sql;
create table want as
select a.*
,b.code
from table2 as a, table1_long as b
where a.occupation=b.occupation and a.level=b.level;
quit;
Post test data as a datastep! I am not typing in test data. As such this is untested:
data want; merge a b; by occupation; array v{3} a b c; do i=1 to 3; if level=vname(v{i}) then code=v{i}; end; run;
Thats using an array. A simpler, more effective solution overall and for all your data storage needs is to store data in normalised form, i.e. more rows, less columns. So change table 1 to look like:
plumber a 21a1
plumber b 21a1
...
then it is simply merging by occupation level. You can simply transpose table 1 to get this format. E.g
data table1; input occupation $ a $ b $ c $; datalines; plumber 21a1 21a1 21a1 fireman 47b1 47b2 47b3 policeman 47b1 47b2 47b3 ; run; proc sort data=table1; by occupation; run; proc transpose data=table1 out=table1; by occupation; var a b c; run;
First step: transpose table1 to a long format:
proc sort data=table1;
by occupation;
run;
proc transpose
data=table1
out=table1_trans (
rename=(col1=code _name_=level)
where=(level ne 'occupation')
)
;
by occupation notsorted;
var _all_;
run;
From this, you can either
A hash solution will look like this:
data table1;
input occupation :$15. a $ b $ c $;
cards;
plumber 21a1 21a1 21a1
fireman 47b1 47b2 47b3
policeman 47b1 47b2 47b3
;
run;
data table2;
input occupation :$15. level $;
cards;
plumber a
plumber a
fireman c
policeman b
policeman a
;
run;
proc sort data=table1;
by occupation;
run;
proc transpose
data=table1
out=table1_trans (
rename=(col1=code _name_=level)
where=(level ne 'occupation')
)
;
by occupation;
var _all_;
run;
data want;
set table2;
length code $4;
if _n_ = 1
then do;
declare hash codes (dataset:'table1_trans');
rc = codes.definekey('occupation','level');
rc = codes.definedata('code');
rc = codes.definedone();
end;
rc = codes.find();
drop rc;
run;
Note the datasteps for the example datasets. They make it much easier for the helpers to recreate data.
Thank you for the explanations, I think I get the idea. Your program great because I don't have to type all the names of the columns in table 1.
proc transpose is one of the most useful tools to beat data into easily usable form. It allows you to have code that needs no explicit variable names where those are undetermined.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.