BookmarkSubscribeRSS Feed
littlestone
Fluorite | Level 6
suppose I have two data sets which are "Main" and "Support":

Main Dataset:

ID Variable_1
1..........0
2..........0
3..........0
4..........0
5..........0
6..........0
7..........0
8..........0
9..........0

Support Dataset:

ID Variable_1
2..........False
4..........True
5..........False
6..........True
9..........True

I want to update the Main data set in a way that the Variable_1 will be re-valued based on information from Support, For example, for ID 4, because the corresponding Variable_1 in the Support data set is "True", the Variable_1 in the Main data set will be re-valued to 1; similarly, for ID 2, because the corresponding Variable_1 in the Support data set is "False", the Variable_1 in the Main data set will be kept 0.

So the new updated Main should be:

ID Variable_1
1..........0
2..........0
3..........0
4..........1
5..........0
6..........1
7..........0
8..........0
9..........1

What could be an efficient Data step or Proc SQL to accomplish such task?

Thanks.
6 REPLIES 6
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello Littlestone,

This is a possible solution:
[pre]
data m;
input ID Variable_1;
datalines;
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
run;
data s;
input ID Variable_1 $;
datalines;
2 False
4 True
5 False
6 True
9 True
run;
proc SQL;
create table r as
select a.ID,
case
when UPCASE(b.Variable_1) = "TRUE" then 1
when UPCASE(b.Variable_1) = "FALSE" or UPCASE(b.Variable_1) = "" then 0
else -1
end as Variable_1
from m as a left join s as b
on a.ID=b.ID
;quit;
[/pre]
Sincerely,
SPR
littlestone
Fluorite | Level 6
Thank you very much. It is just what I need!
littlestone
Fluorite | Level 6
Hello, SPR

Sorry to bother you again. I noticed that in your code:


.......proc SQL;
create table r as
select a.ID,
case........



You used "select a.ID", what if there are many variables in dataset a? Can I use "select a.* ?
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello Littlestone,

In general you can use a.* but in your request it was necessary to update variable_1.
If you say a.* it means that variable_1 (without any changes) is also included, so it will be necessary to create some variable_2 according to requirements in your initial post.

Sincerely,
SPR
littlestone
Fluorite | Level 6
thank you all for great help
Ksharp
Super User
[pre]




data m;
input ID Variable_1;
datalines;
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
;
run;
data s;
input ID Variable_1 $;
datalines;
2 False
4 True
5 False
6 True
9 True
;
run;
[/pre]


proc format;



 invalue        varr



        
'False'=0



        
'True'=1;



run;



 



proc sql;



 create table result as



  select coalesce(s.id,m.id) as id ,coalesce(input(s.variable_1,varr.)
,m.variable_1)
as var



   from s right join m
on s.id = m.id



   ;



quit;



 



 



 



 



 



Ksharp



Message was edited by: Ksharp
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1925 views
  • 0 likes
  • 3 in conversation