DATA Step, Macro, Functions and more

How to update a sas dataset using another dataset if there are missing values in some columns ?

Reply
Regular Contributor
Posts: 181

How to update a sas dataset using another dataset if there are missing values in some columns ?

Hi All,

 

I have a sas dataset which has got missing values for two variables (CD_B,CD_A). I would like to update the values from another sas dataset if the values are missing for CD_B,CD_A.

 

 

Since the master dataset (one here) will be huge in actual scenario . Is there any way I can avoid sorting and update the missing values?

 

data one;

input file_type $ cardtl $ acctdtl $ CD_B CD_A ;

datalines;

S IS123 F 123 111

A IS023 H 224 . 

B IS223 P .   333

;

 run;

 

data three;

input cardtl $ acctdtl $ CD_B CD_A;

datalines;

IS023 H 224 222

IS223 P 524 333

;

run;

 

Thanks in advance,

Regards,

Sheeba

Super User
Posts: 9,867

Re: How to update a sas dataset using another dataset if there are missing values in some columns ?

If dataset three is small enough, you could create a format and do it in one sequential step.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Regular Contributor
Posts: 181

Re: How to update a sas dataset using another dataset if there are missing values in some columns ?

Posted in reply to KurtBremser

Hi ,

 

Thanks a lot for the reply.

 

The datasets are  huge and the data in the update dataset is bound to change. so I added the default values in a spreadsheet , imported into a dataset. joining the two datasets seems to take up a lot of space and is indefinitely running.

 

Regards,

Sheeba

Super User
Posts: 9,867

Re: How to update a sas dataset using another dataset if there are missing values in some columns ?

Define "huge" in terms of number of observations.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Regular Contributor
Posts: 181

Re: How to update a sas dataset using another dataset if there are missing values in some columns ?

Posted in reply to KurtBremser

Hi,

 

master dataset has over 1 million records and update has less around 200 .

 

Regards,

Sheeba

Super User
Posts: 6,622

Re: How to update a sas dataset using another dataset if there are missing values in some columns ?

Depending on the number of updates, you could use the second data set holding the updates to write out a set of IF / THEN statements to a file.  Just use FILE and PUT statements.  Based on your example data, the file should contain (you can correct this if it's wrong):

 

if cardtl = 'IS123' and acctdtl='F' then do;

   if CD_B=. then CD_B = 224;

   if CD_A=. then CD_A = 222;

end;

else if cardtl = 'IS223' and acctdtl='P' then do;

   if CD_B=. then CD_B = 524;

   if CD_A=. then CD_A = 333;

end;

 

From that point it's easy:

 

data one_revised;

set one;

%include "path to file holding IF THEN statements";

run;

Regular Contributor
Posts: 181

Re: How to update a sas dataset using another dataset if there are missing values in some columns ?

Posted in reply to Astounding

Hi Astounding,

 

Thanks a lot for the response,

 

Sure I will try this out.

 

Regards,

Sheeba Swaminathan

PROC Star
Posts: 253

Re: How to update a sas dataset using another dataset if there are missing values in some columns ?

While a format can work for random access, as suggested by @KurtBremser, using SET with KEY= is probably faster if you have a lot of data:

proc sql;

  create index idx on three(cardtl,acctdtl);

run;

data want;

  set one;

  if missing(CD_B) then do;

    if missing(CD_A) then

      set three key=idx/unique;

    else

      set three(drop=CD_A) key=idx/unique;

    end;

  else if missing(CD_A) then

    set three(drop=CD_B) key=idx/unique;

run;

I made the code a bit more complicated than absolutely necessary (by checking if both variables were missing, and then just using one SET statement), as that will probably give better performance with large tables.

Regular Contributor
Posts: 181

Re: How to update a sas dataset using another dataset if there are missing values in some columns ?

Hi s_lassen,

 

Thanks a lot for the suggestion,

 

I will try this out with modified if conditions

 

Regards,

Sheeba Swaminathan

Occasional Contributor
Posts: 17

Re: How to update a sas dataset using another dataset if there are missing values in some columns ?

you can do this by update :

 

data one;

update one three;

by cardt1;

run;

Super User
Posts: 13,283

Re: How to update a sas dataset using another dataset if there are missing values in some columns ?

Posted in reply to ruchi11dec

@ruchi11dec wrote:

you can do this by update :

 

data one;

update one three;

by cardt1;

run;


Update done this way would require 1) both data sets to be sorted (which the OP explicitly stated he didn't want to do) and 2) that the value of the by variable cardt1 uniquely identify each record, which we don't know.

 

The NOTSORTED option cannot be used with Update.

The MODIFY option might work but still relies on having BY variables to uniquely identify the records to modify.

data work.class;
   set sashelp.class;
run;

data work.update;
   name='Henry';
   height=99;
run;

data work.class;
   modify work.class
          work.update
   ;
   by notsorted name;
run;
Regular Contributor
Posts: 181

Re: How to update a sas dataset using another dataset if there are missing values in some columns ?

Hi ballardw,

 

Thanks a lot for the suggestion.

 

there was no unique by variable key to identify the records. So I created a unique key and then used modified statement and it worked for me.

 

Thank you.

 

Regards,

Sheeba

Regular Contributor
Posts: 181

Re: How to update a sas dataset using another dataset if there are missing values in some columns ?

Posted in reply to ruchi11dec

Hi ruchi11dec ,

 

Thanks a lot for the response,

 

I tried update statement but I don't have a unique key in both the datasets..

 

Regards,

Sheeba Swaminathan

Ask a Question
Discussion stats
  • 12 replies
  • 133 views
  • 4 likes
  • 6 in conversation