Hello,
Say that I have dataset1 with ID and variable1, like this:
| ID | Var1 |
| A | 1 |
| B | 2 |
| C | 3 |
| D | 4 |
| E | 5 |
I also have dataset2, which contains a reference variable, like this:
| RefVar |
| 1 |
| 2 |
| 3 |
How do I modify dataset1 such that any Var1's values not found in dataset2's Refvar will be converted to missing? The resulting dataset1 should look like this:
| ID | Var1 |
| A | 1 |
| B | 2 |
| C | 3 |
| D | . |
| E | . |
The actual dataset has a lot of values for "Var1" and "RefVar" so a PROC SQL solution would be ideal, I believe.
data one;
input ID $ Var1;
datalines;
A 1
B 2
C 3
D 4
E 5
;
data ref;
input RefVar;
datalines;
1
2
3
;
proc sql;
create table want as
select id ,var1+refvar-refvar as var1
from one left join ref on one.var1=ref.refvar
order by id,var1;
quit;
I would use dataset2 to create a format and then apply the format as desired.
You can use the hlo='O' to create the other record to assign it to missing if desired.
data one;
input ID $ Var1;
datalines;
A 1
B 2
C 3
D 4
E 5
;
data ref;
input RefVar;
datalines;
1
2
3
;
proc sql;
create table want as
select id ,var1+refvar-refvar as var1
from one left join ref on one.var1=ref.refvar
order by id,var1;
quit;
Thank you very much.
If Var1 and Refvar variables happen to be string would creating a format using the reference dataset, like what Reeza said, be the only option or is there a PROC SQL code for that? Would I have to create a new thread for this?
in that case, take the refvar as var 🙂
data one;
input ID $ Var1 $;
datalines;
A a
B b
C c
D d
E e
;
data ref;
input RefVar $;
datalines;
a
b
c
;
proc sql;
create table want as
select id ,refvar as var1
from one left join ref on one.var1=ref.refvar
order by id,2;
quit;This works for both string and numerics as you can tell from the code by replacing the var1 with ref from table2
Works as well. Thank you!
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.