BookmarkSubscribeRSS Feed
AP718
Obsidian | Level 7

I want to create a do loop to update Table 1 with the Code value from Table 2 (future years of data). I want the loop to check by Id and Year. At the first instance of the Code value being found then update Table 1 and End. Else continue to loop through all years until found and end after all years have been looped. The years to search or loop will be 2013-2015.

 

Table 3 is what I want as the final result. I want the year variable to be updated so I know which year the Code value was found. Thank you!

 

 

TABLE 1

 

 

YEAR

ID 

CODE

2012

1

 

2012

2

 

 

TABLE 2

 

 

YEAR

ID 

CODE

2013

1

 

2013

2

B

2014

1

 

2014

2

 

2015

1

A

2015

2

 

 

TABLE 3

 

 

 

YEAR

ID 

CODE

IMPUTATION YEAR 

2012

1

A

2015

2012

2

B

2013

11 REPLIES 11
novinosrin
Tourmaline | Level 20


data table1;
input year id code $;
cards;
2012	1  .
2012	2  .
;

data table2;
input year id code $;
cards;
2013	1	.
2013	2	B
2014	1	.
2014	2	.
2015	1	A
2015	2	.
;

data want;
 set table1;
 if _n_=1 then do;
  if 0 then set table2(rename=(year=imputation_year));
  dcl hash H (dataset:'table2(where=(not missing(code)) rename=(year=imputation_year) )') ;
  h.definekey  ("id") ;
  h.definedata('code','imputation_year');
  h.definedone () ;
 end;
 if h.find()=0;
run;


AP718
Obsidian | Level 7

Thank you. 

 

How can this work if there is more than one variable in the table with missing data and to be updated?:

 

TABLE 1

 

 

 

 

YEAR

ID 

CODE

CODE1

CODE2

2012

1

 

 

 

2012

2

 

 

 

 

TABLE 2

 

 

 

 

YEAR

ID 

CODE

CODE1

CODE2

2013

1

 

 

 

2013

2

B

B1

B2

2014

1

 

 

 

2014

2

 

 

 
20151AA1

A2

2015

2

 

 

 

 

 

TABLE 3

 

 

 

 

 

YEAR

ID 

CODE

CODE1

CODE2

IMPUTATION YEAR 

2012

1

A

A1

A2

2015

2012

2

B

B1

B2

2013

novinosrin
Tourmaline | Level 20



data table1;
infile cards truncover;
input year id (code code1 code2) ($);
cards;
2012	1			
2012	2			
;



data table2;
infile cards truncover;
input year id (code code1 code2) ($);
cards;
2013	1			
2013	2	B	B1	B2
2014	1			
2014	2			
2015	1	A	A1	A2
2015	2			
;

data want;
 if 0 then set table1;
 if _n_=1 then do;
  dcl hash h( );
  h.definekey  ("id") ;
  h.definedata('code','code1','code2','imputation_year');
  h.definedone () ;
  do until(z);
   set table2(rename=(year=imputation_year)) end=z;
   if cmiss(code,code1,code2)=0 then h.ref();
  end;
 end;
 set table1;
 if h.find()=0;
run;
AP718
Obsidian | Level 7
I tried with this code by only changing the table1, table2 and want names but the fields I needed to update are now missing or deleted from my want table. Is there code I need to fill in for the hash statements?
novinosrin
Tourmaline | Level 20

Hi @AP718  Please post the best representative mock sample of your real. That would help you and me to progress quickly towards the final solution. If it is a good representative of your real, you can just copy/paste and be over it. Also, post the expected result for the samples.

AP718
Obsidian | Level 7
Hi, what I've shared is the best mock sample. There is missing values in the Table 1 - Code, Code1 and Code2 fields that I want to update with the values from the Table 2 - Code, Code1 and Code2 fields. I want to update or link the tables by the ID field. I want to create a loop or iterative process that will search Table 2 by the ID. At the first instance of the Code, Code1, Code2 values being found in Table 2 then update the Code, Code1 , Code2 fields in Table 1 and End. If not found then leave blank. My expected end result is what I shared in Table 3. The values for Code have no significance. Thanks again!
Tom
Super User Tom
Super User

Can you describe what you are trying to do in words.  Don't try to code it yet, don't talk about loops, just explain the rule in words. 

 

Some of what you are saying makes it sound like you want to find the first value of CODE and YEAR from TABLE2 per ID where the value of CODE is not missing.

 

If the data is sorted by ID that is easy.  If it is not sorted the use PROC SORT to sort it.

data first_code ;
  set table2;
   by id year ;
   where not missing(code);
   if first.id;
   keep id code year;
   rename code=first_code year=imputation_year ;
run;

Now just merge it with your existing TABLE1.

data want ;
  merge table1 first_code ;
  by id;
run;

 

AP718
Obsidian | Level 7
I see. I may have over complicated it by stating do loop. This is essentially what I am looking for. Is there a way to do this using HASH? As it's a large table I am working with. Thanks
novinosrin
Tourmaline | Level 20

Hi @AP718  I am not sure why the code works on the sample and why it doesn't on your real. I tested the same below making a slight change i.e i

 

if h.find() ne 0 then call missing(code,code1,code2,imputation_year);

instead of 

 if h.find()=0;

Can you please review thoroughly, make the appropriate changes carefully and execute the below?-



 
data table1;
infile cards truncover;
input year id (code code1 code2) ($);
cards;
2012	1			
2012	2			
;



data table2;
infile cards truncover;
input year id (code code1 code2) ($);
cards;
2013	1			
2013	2	B	B1	B2
2014	1			
2014	2			
2015	1	A	A1	A2
2015	2			
;

data want;
 if 0 then set table1;
 if _n_=1 then do;
  dcl hash h( );
  h.definekey  ("id") ;
  h.definedata('code','code1','code2','imputation_year');
  h.definedone () ;
  do until(z);
   set table2(rename=(year=imputation_year)) end=z;
   if cmiss(code,code1,code2)=0 then h.ref();
  end;
 end;
 set table1;
 if h.find() ne 0 then call missing(code,code1,code2,imputation_year);
run;

 

 

 

Tom
Super User Tom
Super User

HASH objects reside in memory, so you might not be able to store a really large table in memory.

What are the actual sizes of your tables?   It might be that one could fit into a hash and the other not.

Is your TABLE1 only a subset of the IDs in your TABLE2?

 

Are you not processing the data in the already defined key order?  I would assume that your large table is already sorted or index by ID and YEAR so that the data step I posted is simple to do.

 

If the data is already sorted/indexed you can probably do what I posted in a single step.

So for example if you want to keep just the values in TABLE1 and add those two new columns from TABLE2 you might just use SET and RETAIN to do that.

data want ;
   set table2 (in=in2 keep=id code year rename=(code=_code year=_year)
      where=(_code is not missing))
         table1 (in=in1)
  ;
  by id;
  if first.id then do;  
    if in2 then do;
       first_code=_code; first_year=_year ;
    end;
    else call missing(first_code,first_year);
    retain first_code first_year ;
  end;
  if in1;
run;
AP718
Obsidian | Level 7
Thank you both. I was able to impute the data by using the below HASH code:
It seems the difference was to set the if h.find() ge 0 instead of if h.find()=0 and I had to declare the source dataset.


data want(rename=(year1=imputation_year)); set table1; if _n_=1 then do; dcl hash h(dataset:"table2"); h.definekey ("id") ; h.definedata("year1","code","code1","code2"); h.definedone (); end; if h.find()ge 0; run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 906 views
  • 1 like
  • 3 in conversation