BookmarkSubscribeRSS Feed
Sheeba
Lapis Lazuli | Level 10

Hi All,

 

I have a sas dataset which has got missing values for two variables (CD_B,CD_A). I would like to update the values from another sas dataset if the values are missing for CD_B,CD_A.

 

 

Since the master dataset (one here) will be huge in actual scenario . Is there any way I can avoid sorting and update the missing values?

 

data one;

input file_type $ cardtl $ acctdtl $ CD_B CD_A ;

datalines;

S IS123 F 123 111

A IS023 H 224 . 

B IS223 P .   333

;

 run;

 

data three;

input cardtl $ acctdtl $ CD_B CD_A;

datalines;

IS023 H 224 222

IS223 P 524 333

;

run;

 

Thanks in advance,

Regards,

Sheeba

12 REPLIES 12
Sheeba
Lapis Lazuli | Level 10

Hi ,

 

Thanks a lot for the reply.

 

The datasets are  huge and the data in the update dataset is bound to change. so I added the default values in a spreadsheet , imported into a dataset. joining the two datasets seems to take up a lot of space and is indefinitely running.

 

Regards,

Sheeba

Sheeba
Lapis Lazuli | Level 10

Hi,

 

master dataset has over 1 million records and update has less around 200 .

 

Regards,

Sheeba

Astounding
PROC Star

Depending on the number of updates, you could use the second data set holding the updates to write out a set of IF / THEN statements to a file.  Just use FILE and PUT statements.  Based on your example data, the file should contain (you can correct this if it's wrong):

 

if cardtl = 'IS123' and acctdtl='F' then do;

   if CD_B=. then CD_B = 224;

   if CD_A=. then CD_A = 222;

end;

else if cardtl = 'IS223' and acctdtl='P' then do;

   if CD_B=. then CD_B = 524;

   if CD_A=. then CD_A = 333;

end;

 

From that point it's easy:

 

data one_revised;

set one;

%include "path to file holding IF THEN statements";

run;

Sheeba
Lapis Lazuli | Level 10

Hi Astounding,

 

Thanks a lot for the response,

 

Sure I will try this out.

 

Regards,

Sheeba Swaminathan

s_lassen
Meteorite | Level 14

While a format can work for random access, as suggested by @Kurt_Bremser, using SET with KEY= is probably faster if you have a lot of data:

proc sql;

  create index idx on three(cardtl,acctdtl);

run;

data want;

  set one;

  if missing(CD_B) then do;

    if missing(CD_A) then

      set three key=idx/unique;

    else

      set three(drop=CD_A) key=idx/unique;

    end;

  else if missing(CD_A) then

    set three(drop=CD_B) key=idx/unique;

run;

I made the code a bit more complicated than absolutely necessary (by checking if both variables were missing, and then just using one SET statement), as that will probably give better performance with large tables.

Sheeba
Lapis Lazuli | Level 10

Hi s_lassen,

 

Thanks a lot for the suggestion,

 

I will try this out with modified if conditions

 

Regards,

Sheeba Swaminathan

ruchi11dec
Obsidian | Level 7

you can do this by update :

 

data one;

update one three;

by cardt1;

run;

ballardw
Super User

@ruchi11dec wrote:

you can do this by update :

 

data one;

update one three;

by cardt1;

run;


Update done this way would require 1) both data sets to be sorted (which the OP explicitly stated he didn't want to do) and 2) that the value of the by variable cardt1 uniquely identify each record, which we don't know.

 

The NOTSORTED option cannot be used with Update.

The MODIFY option might work but still relies on having BY variables to uniquely identify the records to modify.

data work.class;
   set sashelp.class;
run;

data work.update;
   name='Henry';
   height=99;
run;

data work.class;
   modify work.class
          work.update
   ;
   by notsorted name;
run;
Sheeba
Lapis Lazuli | Level 10

Hi ballardw,

 

Thanks a lot for the suggestion.

 

there was no unique by variable key to identify the records. So I created a unique key and then used modified statement and it worked for me.

 

Thank you.

 

Regards,

Sheeba

Sheeba
Lapis Lazuli | Level 10

Hi ruchi11dec ,

 

Thanks a lot for the response,

 

I tried update statement but I don't have a unique key in both the datasets..

 

Regards,

Sheeba Swaminathan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 4625 views
  • 4 likes
  • 6 in conversation