Hi All,
I have a sas dataset which has got missing values for two variables (CD_B,CD_A). I would like to update the values from another sas dataset if the values are missing for CD_B,CD_A.
Since the master dataset (one here) will be huge in actual scenario . Is there any way I can avoid sorting and update the missing values?
data one;
input file_type $ cardtl $ acctdtl $ CD_B CD_A ;
datalines;
S IS123 F 123 111
A IS023 H 224 .
B IS223 P . 333
;
run;
data three;
input cardtl $ acctdtl $ CD_B CD_A;
datalines;
IS023 H 224 222
IS223 P 524 333
;
run;
Thanks in advance,
Regards,
Sheeba
If dataset three is small enough, you could create a format and do it in one sequential step.
Hi ,
Thanks a lot for the reply.
The datasets are huge and the data in the update dataset is bound to change. so I added the default values in a spreadsheet , imported into a dataset. joining the two datasets seems to take up a lot of space and is indefinitely running.
Regards,
Sheeba
Define "huge" in terms of number of observations.
Hi,
master dataset has over 1 million records and update has less around 200 .
Regards,
Sheeba
Depending on the number of updates, you could use the second data set holding the updates to write out a set of IF / THEN statements to a file. Just use FILE and PUT statements. Based on your example data, the file should contain (you can correct this if it's wrong):
if cardtl = 'IS123' and acctdtl='F' then do;
if CD_B=. then CD_B = 224;
if CD_A=. then CD_A = 222;
end;
else if cardtl = 'IS223' and acctdtl='P' then do;
if CD_B=. then CD_B = 524;
if CD_A=. then CD_A = 333;
end;
From that point it's easy:
data one_revised;
set one;
%include "path to file holding IF THEN statements";
run;
Hi Astounding,
Thanks a lot for the response,
Sure I will try this out.
Regards,
Sheeba Swaminathan
While a format can work for random access, as suggested by @Kurt_Bremser, using SET with KEY= is probably faster if you have a lot of data:
proc sql;
create index idx on three(cardtl,acctdtl);
run;
data want;
set one;
if missing(CD_B) then do;
if missing(CD_A) then
set three key=idx/unique;
else
set three(drop=CD_A) key=idx/unique;
end;
else if missing(CD_A) then
set three(drop=CD_B) key=idx/unique;
run;
I made the code a bit more complicated than absolutely necessary (by checking if both variables were missing, and then just using one SET statement), as that will probably give better performance with large tables.
Hi s_lassen,
Thanks a lot for the suggestion,
I will try this out with modified if conditions
Regards,
Sheeba Swaminathan
you can do this by update :
data one;
update one three;
by cardt1;
run;
@ruchi11dec wrote:
you can do this by update :
data one;
update one three;
by cardt1;
run;
Update done this way would require 1) both data sets to be sorted (which the OP explicitly stated he didn't want to do) and 2) that the value of the by variable cardt1 uniquely identify each record, which we don't know.
The NOTSORTED option cannot be used with Update.
The MODIFY option might work but still relies on having BY variables to uniquely identify the records to modify.
data work.class; set sashelp.class; run; data work.update; name='Henry'; height=99; run; data work.class; modify work.class work.update ; by notsorted name; run;
Hi ballardw,
Thanks a lot for the suggestion.
there was no unique by variable key to identify the records. So I created a unique key and then used modified statement and it worked for me.
Thank you.
Regards,
Sheeba
Hi ruchi11dec ,
Thanks a lot for the response,
I tried update statement but I don't have a unique key in both the datasets..
Regards,
Sheeba Swaminathan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.