DO Loop

AP718 · Posted 09-09-2020 07:23 AM

I want to create a do loop to update Table 1 with the Code value from Table 2 (future years of data). I want the loop to check by Id and Year. At the first instance of the Code value being found then update Table 1 and End. Else continue to loop through all years until found and end after all years have been looped. The years to search or loop will be 2013-2015.

Table 3 is what I want as the final result. I want the year variable to be updated so I know which year the Code value was found. Thank you!

TABLE 1
YEAR	ID	CODE
2012	1
2012	2

TABLE 2
YEAR	ID	CODE
2013	1
2013	2	B
2014	1
2014	2
2015	1	A
2015	2

TABLE 3
YEAR	ID	CODE	IMPUTATION YEAR
2012	1	A	2015
2012	2	B	2013

novinosrin · Posted 09-09-2020 08:00 AM



data table1;
input year id code $;
cards;
2012	1  .
2012	2  .
;

data table2;
input year id code $;
cards;
2013	1	.
2013	2	B
2014	1	.
2014	2	.
2015	1	A
2015	2	.
;

data want;
 set table1;
 if _n_=1 then do;
  if 0 then set table2(rename=(year=imputation_year));
  dcl hash H (dataset:'table2(where=(not missing(code)) rename=(year=imputation_year) )') ;
  h.definekey  ("id") ;
  h.definedata('code','imputation_year');
  h.definedone () ;
 end;
 if h.find()=0;
run;

AP718 · Posted 09-09-2020 11:00 AM

Thank you.

How can this work if there is more than one variable in the table with missing data and to be updated?:

TABLE 1
YEAR	ID	CODE	CODE1	CODE2
2012	1
2012	2

TABLE 2
YEAR	ID	CODE	CODE1	CODE2
2013	1
2013	2	B	B1	B2
2014	1
2014	2
2015	1	A	A1	A2
2015	2

TABLE 3
YEAR	ID	CODE	CODE1	CODE2	IMPUTATION YEAR
2012	1	A	A1	A2	2015
2012	2	B	B1	B2	2013

novinosrin · Posted 09-09-2020 12:05 PM




data table1;
infile cards truncover;
input year id (code code1 code2) ($);
cards;
2012	1			
2012	2			
;



data table2;
infile cards truncover;
input year id (code code1 code2) ($);
cards;
2013	1			
2013	2	B	B1	B2
2014	1			
2014	2			
2015	1	A	A1	A2
2015	2			
;

data want;
 if 0 then set table1;
 if _n_=1 then do;
  dcl hash h( );
  h.definekey  ("id") ;
  h.definedata('code','code1','code2','imputation_year');
  h.definedone () ;
  do until(z);
   set table2(rename=(year=imputation_year)) end=z;
   if cmiss(code,code1,code2)=0 then h.ref();
  end;
 end;
 set table1;
 if h.find()=0;
run;

AP718 · Posted 09-09-2020 03:31 PM

I tried with this code by only changing the table1, table2 and want names but the fields I needed to update are now missing or deleted from my want table. Is there code I need to fill in for the hash statements?

novinosrin · Posted 09-09-2020 03:46 PM

Hi @AP718 Please post the best representative mock sample of your real. That would help you and me to progress quickly towards the final solution. If it is a good representative of your real, you can just copy/paste and be over it. Also, post the expected result for the samples.

AP718 · Posted 09-10-2020 07:48 AM

Hi, what I've shared is the best mock sample. There is missing values in the Table 1 - Code, Code1 and Code2 fields that I want to update with the values from the Table 2 - Code, Code1 and Code2 fields. I want to update or link the tables by the ID field. I want to create a loop or iterative process that will search Table 2 by the ID. At the first instance of the Code, Code1, Code2 values being found in Table 2 then update the Code, Code1 , Code2 fields in Table 1 and End. If not found then leave blank. My expected end result is what I shared in Table 3. The values for Code have no significance. Thanks again!

Tom · Posted 09-10-2020 08:03 AM

Can you describe what you are trying to do in words. Don't try to code it yet, don't talk about loops, just explain the rule in words.

Some of what you are saying makes it sound like you want to find the first value of CODE and YEAR from TABLE2 per ID where the value of CODE is not missing.

If the data is sorted by ID that is easy. If it is not sorted the use PROC SORT to sort it.

data first_code ;
  set table2;
   by id year ;
   where not missing(code);
   if first.id;
   keep id code year;
   rename code=first_code year=imputation_year ;
run;

Now just merge it with your existing TABLE1.

data want ;
  merge table1 first_code ;
  by id;
run;

AP718 · Posted 09-10-2020 08:33 AM

I see. I may have over complicated it by stating do loop. This is essentially what I am looking for. Is there a way to do this using HASH? As it's a large table I am working with. Thanks

novinosrin · Posted 09-10-2020 09:33 AM

Hi @AP718 I am not sure why the code works on the sample and why it doesn't on your real. I tested the same below making a slight change i.e i

if h.find() ne 0 then call missing(code,code1,code2,imputation_year);

instead of

 if h.find()=0;

Can you please review thoroughly, make the appropriate changes carefully and execute the below?-



 
data table1;
infile cards truncover;
input year id (code code1 code2) ($);
cards;
2012	1			
2012	2			
;



data table2;
infile cards truncover;
input year id (code code1 code2) ($);
cards;
2013	1			
2013	2	B	B1	B2
2014	1			
2014	2			
2015	1	A	A1	A2
2015	2			
;

data want;
 if 0 then set table1;
 if _n_=1 then do;
  dcl hash h( );
  h.definekey  ("id") ;
  h.definedata('code','code1','code2','imputation_year');
  h.definedone () ;
  do until(z);
   set table2(rename=(year=imputation_year)) end=z;
   if cmiss(code,code1,code2)=0 then h.ref();
  end;
 end;
 set table1;
 if h.find() ne 0 then call missing(code,code1,code2,imputation_year);
run;

Tom · Posted 09-10-2020 09:37 AM

HASH objects reside in memory, so you might not be able to store a really large table in memory.

What are the actual sizes of your tables? It might be that one could fit into a hash and the other not.

Is your TABLE1 only a subset of the IDs in your TABLE2?

Are you not processing the data in the already defined key order? I would assume that your large table is already sorted or index by ID and YEAR so that the data step I posted is simple to do.

If the data is already sorted/indexed you can probably do what I posted in a single step.

So for example if you want to keep just the values in TABLE1 and add those two new columns from TABLE2 you might just use SET and RETAIN to do that.

data want ;
   set table2 (in=in2 keep=id code year rename=(code=_code year=_year)
      where=(_code is not missing))
         table1 (in=in1)
  ;
  by id;
  if first.id then do;  
    if in2 then do;
       first_code=_code; first_year=_year ;
    end;
    else call missing(first_code,first_year);
    retain first_code first_year ;
  end;
  if in1;
run;

AP718 · Posted 09-10-2020 11:07 AM

Thank you both. I was able to impute the data by using the below HASH code:
It seems the difference was to set the if h.find() ge 0 instead of if h.find()=0 and I had to declare the source dataset.


data want(rename=(year1=imputation_year));
 set table1;
 if _n_=1 then do;
  dcl hash h(dataset:"table2");
  h.definekey  ("id") ;
  h.definedata("year1","code","code1","code2");
  h.definedone ();
 end;
if h.find()ge 0;
run;

DO Loop

Re: DO Loop

Re: DO Loop

Re: DO Loop

Re: DO Loop

Re: DO Loop

Re: DO Loop

Re: DO Loop

Re: DO Loop

Re: DO Loop

Re: DO Loop

Re: DO Loop

Classroom Training Available!