BookmarkSubscribeRSS Feed
mlcross
Calcite | Level 5

Hi there,

 

I currently have the following code which works a dream when replicating the first row of a dataset down to all non missing rows within the same dataset - in my scenario only the first row should ever be populated.

 

The problem is the input dataset is expanding from 3 columns to 47, each with the same basic format and issue of missing records.

I've been exploring using various combinations of arrays/do loops to automate the solution but so far have gotten nowhere fast. Any help would be really appreciated.

 

I've included my original code below

 

data &_Output.;
	set readin_excel;
	retain _Main_Page__Region _Main_Page__STP _Main_Page__Postcode;
		if not missing(Main_Page__Region) 
			then _Main_Page__Region=Main_Page__Region;
			else Main_Page__Region=_Main_Page__Region;
   	if not missing(Main_Page__STP) 
			then _Main_Page__STP=Main_Page__STP;
			else Main_Page__STP=_Main_Page__STP;
	   if not missing(_Main_Page__Postcode) 
			then _Main_Page__Postcode=_Main_Page__Postcode;
			else _Main_Page__Postcode=_Main_Page__Postcode;
   drop _Main_Page__Region _Main_Page__STP _Main_Page__Postcode;
run;
3 REPLIES 3
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, good old Excel, nothing like a really poor data medium to create work.  Anyways, its likely arrays are what you seek.  The below code just shows an example with the code you provide (post test data in the form of a datastep as text in a code window - the {i} above the post - to get more accurate results):

data want (drop=conv:);
  set readin_excel;
  array orig{3} main_page_region main_page_stp main_page_postcode;
  array conv{3};
  retain conv:;
  do i=1 to 3;
    if not missing(orig{i}) then conv{i}=orig{i};
    else orig{i}=conv{i};
  end;
run;
mlcross
Calcite | Level 5

Thanks for the swift reply. Unfortunately the code provided did not fix the problem but it might have had as much to do with a poor example on my side.

 

I've included an updated scenario below featuring just a single column but the aim of my opening topic remains, having one solution that means I don't have to individually code ~50 retain statements by hand.

 

/* Set up test data */

data &_Output.;
infile datalines dlm=',';
input ID :8. Additional_Information :$16.;
datalines;
1,Some random junk
2,
3,
4,
5,
;
run;

/* Manual "working" solution */

data &_Output2.;
	set &_Output.;
	retain _Additional_Information;
		if missing(Additional_Information) 
		then Additional_Information=_Additional_Information;
      else _Additional_Information=Additional_Information;
run;

/* Interpretation of original proposed solution. Doesn't retain values correctly */

data &_Output3. (drop=conv:);
  set &_Output.;
  array orig{1} Additional_Information;
  array conv{1};
  retain conv:;
  do i=1 to 1;
    if not missing(orig{i}) then conv{i}=orig{i};
    else orig{i}=conv{i};
  end;
run;

/* Further interpretation, same results :( */

data &_Output4. (drop=conv:);
  set &_Output.;
  array orig{1} Additional_Information;
  array conv{1};
  retain conv:;
  do i=1 to 1;
    if orig{i}^='' then conv{i}=orig{i};
    else orig{i}=conv{i};
  end;
run;


 

Thanks again for your help so far.

Tom
Super User Tom
Super User

If you want to do last observation carried forward, then UPDATE can make this simple and eliminate the need for new variables or RETAIN statement.

data want ;
  update have(obs=0) have ;
  by cust_id;
  output;
run;

If there are some variables that you did NOT want to apply this LOCF logic to then one way to do that is to add another SET statement to pull back in the observation. You can use either a DROP= or KEEP= dataset option depending whether it is easier to list the variable you want to retain or those that you don't.

data want ;
  update have(obs=0) have ;
  by cust_id;
  set have(drop= variables_to_retain );
  output;
run;

But if you really want to replace the missing values with the value from the FIRST observation then that will not work and your posted code will not work either since both are replacing the missing value with the last non-missing value instead of the first value (or the first non-missing value).

You might be able to get the replace with first option to work if you used something like this.

data want ;
  set have ;
  retain P 1 ;
  if missing(X) then set have(keep=X) point=p ;
  if missing(Y) then set have(keep=Y) point=p ;
run;

But that will not work if you want to do it BY some ID variable.  

You might do that by adding some BY processing.

data want ;
  set have ;
  by id;
  if first.id then p=_n_;
  retain P ;
  if missing(X) then set have(keep=X) point=p ;
  if missing(Y) then set have(keep=Y) point=p ;
run;

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 5483 views
  • 2 likes
  • 3 in conversation