Re: efficient way to uodate dataset

littlestone · Posted 03-22-2011 04:18 PM

suppose I have two data sets which are "Main" and "Support":

Main Dataset:

ID Variable_1
1..........0
2..........0
3..........0
4..........0
5..........0
6..........0
7..........0
8..........0
9..........0

Support Dataset:

ID Variable_1
2..........False
4..........True
5..........False
6..........True
9..........True

I want to update the Main data set in a way that the Variable_1 will be re-valued based on information from Support, For example, for ID 4, because the corresponding Variable_1 in the Support data set is "True", the Variable_1 in the Main data set will be re-valued to 1; similarly, for ID 2, because the corresponding Variable_1 in the Support data set is "False", the Variable_1 in the Main data set will be kept 0.

So the new updated Main should be:

ID Variable_1
1..........0
2..........0
3..........0
4..........1
5..........0
6..........1
7..........0
8..........0
9..........1

What could be an efficient Data step or Proc SQL to accomplish such task?

Thanks.

SPR · Posted 03-22-2011 04:58 PM

Hello Littlestone,

This is a possible solution:
[pre]
data m;
input ID Variable_1;
datalines;
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
run;
data s;
input ID Variable_1 $;
datalines;
2 False
4 True
5 False
6 True
9 True
run;
proc SQL;
create table r as
select a.ID,
case
when UPCASE(b.Variable_1) = "TRUE" then 1
when UPCASE(b.Variable_1) = "FALSE" or UPCASE(b.Variable_1) = "" then 0
else -1
end as Variable_1
from m as a left join s as b
on a.ID=b.ID
;quit;
[/pre]
Sincerely,
SPR

littlestone · Posted 03-22-2011 09:54 PM

Thank you very much. It is just what I need!

littlestone · Posted 03-22-2011 11:18 PM

Hello, SPR

Sorry to bother you again. I noticed that in your code:

.......proc SQL;
create table r as
select a.ID,
case........

You used "select a.ID", what if there are many variables in dataset a? Can I use "select a.* ?

SPR · Posted 03-23-2011 09:52 AM

Hello Littlestone,

In general you can use a.* but in your request it was necessary to update variable_1.
If you say a.* it means that variable_1 (without any changes) is also included, so it will be necessary to create some variable_2 according to requirements in your initial post.

Sincerely,
SPR

littlestone · Posted 03-23-2011 01:16 PM

thank you all for great help

Ksharp · Posted 03-23-2011 12:53 AM

[pre]

data m;
input ID Variable_1;
datalines;
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
;
run;
data s;
input ID Variable_1 $;
datalines;
2 False
4 True
5 False
6 True
9 True
;
run;
[/pre]

proc format;

invalue varr

'False'=0

'True'=1;

run;

proc sql;

create table result as

select coalesce(s.id,m.id) as id ,coalesce(input(s.variable_1,varr.)
,m.variable_1) as var

from s right join m
on s.id = m.id

;

quit;

Ksharp

Message was edited by: Ksharp

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!