BookmarkSubscribeRSS Feed
littlestone
Fluorite | Level 6
suppose I have two data sets which are "Main" and "Support":

Main Dataset:

ID Variable_1
1..........0
2..........0
3..........0
4..........0
5..........0
6..........0
7..........0
8..........0
9..........0

Support Dataset:

ID Variable_1
2..........False
4..........True
5..........False
6..........True
9..........True

I want to update the Main data set in a way that the Variable_1 will be re-valued based on information from Support, For example, for ID 4, because the corresponding Variable_1 in the Support data set is "True", the Variable_1 in the Main data set will be re-valued to 1; similarly, for ID 2, because the corresponding Variable_1 in the Support data set is "False", the Variable_1 in the Main data set will be kept 0.

So the new updated Main should be:

ID Variable_1
1..........0
2..........0
3..........0
4..........1
5..........0
6..........1
7..........0
8..........0
9..........1

What could be an efficient Data step or Proc SQL to accomplish such task?

Thanks.
6 REPLIES 6
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello Littlestone,

This is a possible solution:
[pre]
data m;
input ID Variable_1;
datalines;
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
run;
data s;
input ID Variable_1 $;
datalines;
2 False
4 True
5 False
6 True
9 True
run;
proc SQL;
create table r as
select a.ID,
case
when UPCASE(b.Variable_1) = "TRUE" then 1
when UPCASE(b.Variable_1) = "FALSE" or UPCASE(b.Variable_1) = "" then 0
else -1
end as Variable_1
from m as a left join s as b
on a.ID=b.ID
;quit;
[/pre]
Sincerely,
SPR
littlestone
Fluorite | Level 6
Thank you very much. It is just what I need!
littlestone
Fluorite | Level 6
Hello, SPR

Sorry to bother you again. I noticed that in your code:


.......proc SQL;
create table r as
select a.ID,
case........



You used "select a.ID", what if there are many variables in dataset a? Can I use "select a.* ?
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello Littlestone,

In general you can use a.* but in your request it was necessary to update variable_1.
If you say a.* it means that variable_1 (without any changes) is also included, so it will be necessary to create some variable_2 according to requirements in your initial post.

Sincerely,
SPR
littlestone
Fluorite | Level 6
thank you all for great help
Ksharp
Super User
[pre]




data m;
input ID Variable_1;
datalines;
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
;
run;
data s;
input ID Variable_1 $;
datalines;
2 False
4 True
5 False
6 True
9 True
;
run;
[/pre]


proc format;



 invalue        varr



        
'False'=0



        
'True'=1;



run;



 



proc sql;



 create table result as



  select coalesce(s.id,m.id) as id ,coalesce(input(s.variable_1,varr.)
,m.variable_1)
as var



   from s right join m
on s.id = m.id



   ;



quit;



 



 



 



 



 



Ksharp



Message was edited by: Ksharp

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1634 views
  • 0 likes
  • 3 in conversation