DATA Step, Macro, Functions and more

Retain statement with do loops

Reply
Occasional Contributor
Posts: 12

Retain statement with do loops

Hi there,

 

I currently have the following code which works a dream when replicating the first row of a dataset down to all non missing rows within the same dataset - in my scenario only the first row should ever be populated.

 

The problem is the input dataset is expanding from 3 columns to 47, each with the same basic format and issue of missing records.

I've been exploring using various combinations of arrays/do loops to automate the solution but so far have gotten nowhere fast. Any help would be really appreciated.

 

I've included my original code below

 

data &_Output.;
	set readin_excel;
	retain _Main_Page__Region _Main_Page__STP _Main_Page__Postcode;
		if not missing(Main_Page__Region) 
			then _Main_Page__Region=Main_Page__Region;
			else Main_Page__Region=_Main_Page__Region;
   	if not missing(Main_Page__STP) 
			then _Main_Page__STP=Main_Page__STP;
			else Main_Page__STP=_Main_Page__STP;
	   if not missing(_Main_Page__Postcode) 
			then _Main_Page__Postcode=_Main_Page__Postcode;
			else _Main_Page__Postcode=_Main_Page__Postcode;
   drop _Main_Page__Region _Main_Page__STP _Main_Page__Postcode;
run;
Super User
Super User
Posts: 7,997

Re: Retain statement with do loops

Ah, good old Excel, nothing like a really poor data medium to create work.  Anyways, its likely arrays are what you seek.  The below code just shows an example with the code you provide (post test data in the form of a datastep as text in a code window - the {i} above the post - to get more accurate results):

data want (drop=conv:);
  set readin_excel;
  array orig{3} main_page_region main_page_stp main_page_postcode;
  array conv{3};
  retain conv:;
  do i=1 to 3;
    if not missing(orig{i}) then conv{i}=orig{i};
    else orig{i}=conv{i};
  end;
run;
Occasional Contributor
Posts: 12

Re: Retain statement with do loops

Thanks for the swift reply. Unfortunately the code provided did not fix the problem but it might have had as much to do with a poor example on my side.

 

I've included an updated scenario below featuring just a single column but the aim of my opening topic remains, having one solution that means I don't have to individually code ~50 retain statements by hand.

 

/* Set up test data */

data &_Output.;
infile datalines dlm=',';
input ID :8. Additional_Information :$16.;
datalines;
1,Some random junk
2,
3,
4,
5,
;
run;

/* Manual "working" solution */

data &_Output2.;
	set &_Output.;
	retain _Additional_Information;
		if missing(Additional_Information) 
		then Additional_Information=_Additional_Information;
      else _Additional_Information=Additional_Information;
run;

/* Interpretation of original proposed solution. Doesn't retain values correctly */

data &_Output3. (drop=conv:);
  set &_Output.;
  array orig{1} Additional_Information;
  array conv{1};
  retain conv:;
  do i=1 to 1;
    if not missing(orig{i}) then conv{i}=orig{i};
    else orig{i}=conv{i};
  end;
run;

/* Further interpretation, same results :( */

data &_Output4. (drop=conv:);
  set &_Output.;
  array orig{1} Additional_Information;
  array conv{1};
  retain conv:;
  do i=1 to 1;
    if orig{i}^='' then conv{i}=orig{i};
    else orig{i}=conv{i};
  end;
run;


 

Thanks again for your help so far.

Super User
Super User
Posts: 7,080

Re: Retain statement with do loops

If you want to do last observation carried forward, then UPDATE can make this simple and eliminate the need for new variables or RETAIN statement.

data want ;
  update have(obs=0) have ;
  by cust_id;
  output;
run;

If there are some variables that you did NOT want to apply this LOCF logic to then one way to do that is to add another SET statement to pull back in the observation. You can use either a DROP= or KEEP= dataset option depending whether it is easier to list the variable you want to retain or those that you don't.

data want ;
  update have(obs=0) have ;
  by cust_id;
  set have(drop= variables_to_retain );
  output;
run;

But if you really want to replace the missing values with the value from the FIRST observation then that will not work and your posted code will not work either since both are replacing the missing value with the last non-missing value instead of the first value (or the first non-missing value).

You might be able to get the replace with first option to work if you used something like this.

data want ;
  set have ;
  retain P 1 ;
  if missing(X) then set have(keep=X) point=p ;
  if missing(Y) then set have(keep=Y) point=p ;
run;

But that will not work if you want to do it BY some ID variable.  

You might do that by adding some BY processing.

data want ;
  set have ;
  by id;
  if first.id then p=_n_;
  retain P ;
  if missing(X) then set have(keep=X) point=p ;
  if missing(Y) then set have(keep=Y) point=p ;
run;

 

Ask a Question
Discussion stats
  • 3 replies
  • 105 views
  • 2 likes
  • 3 in conversation