DATA Step, Macro, Functions and more

Fill missing values with the previous values

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 113
Accepted Solution

Fill missing values with the previous values

Hello,

I have a dataset that looks like this:

 

SEX COUNTRY PERIOD X1 X2
MALE A 2000 2 3
      2 2
      2 2
FEMALE     1 5
      4 1
      2 3
MALE B   1 3
      4 3
      3 4
FEMALE     5 3
      4 1
      4 1
MALE C   2 2
   
  Z 2050 6 4

 

I want to fill the missing value of SEX COUNTRY and PERIOD with the previous value, until it changes. So it should look like this:

 

SEX COUNTRY PERIOD X1 X2
MALE A 2000 2 3
MALE A 2000 2 2
MALE A 2000 2 2
FEMALE A 2000 1 5
FEMALE A 2000 4 1
FEMALE A 2000 2 3
MALE B 2000 1 3
MALE B 2000 4 3
MALE B 2000 3 4
FEMALE B 2000 5 3
FEMALE B 2000 4 1
FEMALE B 2000 4 1
MALE C 2000 2 2
   
  Z 2050    

 

I'm not really sure how to do this.

 


Accepted Solutions
Solution
‎01-20-2017 07:35 AM
Frequent Contributor
Posts: 113

Re: Fill missing values with the previous values

Posted in reply to Demographer

NVM, I found the solution

 

data want;
set have;
retain _period;
if not missing(period) then _period=period;
else period=_period;
drop _period;

retain _sex;
if not missing(sex) then _sex=sex;
else sex=_sex;
drop _sex;

retain _country;
if not missing(country) then _country=country;
else country=_country;
drop _country;
run;

View solution in original post


All Replies
Solution
‎01-20-2017 07:35 AM
Frequent Contributor
Posts: 113

Re: Fill missing values with the previous values

Posted in reply to Demographer

NVM, I found the solution

 

data want;
set have;
retain _period;
if not missing(period) then _period=period;
else period=_period;
drop _period;

retain _sex;
if not missing(sex) then _sex=sex;
else sex=_sex;
drop _sex;

retain _country;
if not missing(country) then _country=country;
else country=_country;
drop _country;
run;

Super User
Posts: 7,782

Re: Fill missing values with the previous values

Posted in reply to Demographer

You see that you write a piece of code repeatedly. This is a case for using a macro to reduce your work:

%macro process(variable);
retain help_&variable;
if not missing(&variable)
then _&variable = &variable;
else &variable = _&variable;
drop _&variable;
%mend;

data want;
set have;
%process(sex)
%process(country)
%process(period)
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 7,955

Re: Fill missing values with the previous values

Posted in reply to KurtBremser

Or use an array and loop over the array rather than bit by bit.

New Contributor
Posts: 3

Re: Fill missing values with the previous values

Posted in reply to Demographer
Love this!
Super User
Super User
Posts: 7,955

Re: Fill missing values with the previous values

Posted in reply to Demographer

A more important question would be why the "data" is like that in the first place.  It looks a bit like your reading in a produced report - not a recommended approach for multiple reasons (populations, calculations, assumptions etc.).  Get the real "data" and use that.

Frequent Contributor
Posts: 113

Re: Fill missing values with the previous values

[ Edited ]

I copied/pasted in Excel the output of a proc tabulate and then imported it in SAS. When I tried the out statement, it gaves me only one column for variables X1 X2. I suppose this was probably not the most efficient way.

Super User
Super User
Posts: 7,955

Re: Fill missing values with the previous values

Posted in reply to Demographer

No, its really not.  Use out= from tabulate, then manipulate the output dataset to get what you want.  Alternative, use one of the other procedures (means/summary/SQL aggregates) or one of the other hundreds of ways to get output you want.  For example, your output doesn't make any sense to me:

SEX COUNTRY PERIOD X1 X2
MALE A 2000 2 3
      2 2
      2 2

 Why are there three rows, there is no identifer on each row to say what the rows reflect other then the first one?

Frequent Contributor
Posts: 113

Re: Fill missing values with the previous values

It is the output of a proc tabulate (cels below were merged).
Super User
Super User
Posts: 7,955

Re: Fill missing values with the previous values

Posted in reply to Demographer

Yes, but it doesn't make much sense.  

You have a male, from country A in period 2000, who has reulsts 2,3 (no information as to what the 2/3 represents).  You then have two more rows 2/2 2/2.  Which is it 2/3, 2/2 or 2/2?  Or was it three timepoints within period 2000?  Post some test data if you want some code on how to do it directly.

Frequent Contributor
Posts: 113

Re: Fill missing values with the previous values

Don't pay attention to the data. I cannot copy the real values for confidentiality purposes. These are just random values. There is also another row that I forget to include which was the age group.
Respected Advisor
Posts: 3,799

Re: Fill missing values with the previous values

Posted in reply to Demographer

Demographer wrote:

I copied/pasted in Excel the output of a proc tabulate and then imported it in SAS. When I tried the out statement, it gaves me only one column for variables X1 X2. I suppose this was probably not the most efficient way.


PROC TABULATE can create a data set directly using OUT= on the PROC statement.   This would avoid all the headaches associates with a trip to EXhell and back.

 

You can more easily fix your data imported from EXhell using the update trick

data haveV/view=haveV;
   set have;
   retain dummy 1;
   run;
data want;
   update haveV(obs=0 keep=dummy) haveV;
   by dummy;
   output;
   drop dummy;
   run;
Frequent Contributor
Posts: 113

Re: Fill missing values with the previous values

Posted in reply to data_null__
I tried this. The problem is that I want 2 separate column for X1 and X2 (these are 2 categories of a same variable). When I used the out= statement, they appear in a single column.
Super User
Super User
Posts: 7,955

Re: Fill missing values with the previous values

Posted in reply to Demographer

Ok then, at a guess:

proc means data=have;
  by sex country period age_group;
  var thevar;
  where thevar="A";
  output out=tab1 n=x1;
run;
proc means data=have;
  by sex country period age_group;
  var thevar;
  where thevar="B";
  output out=tab2 n=x2;
run;
data want;
  merge tab1 tab2;
  by sex country period age_group;
run;

 You could also proc means to get your overall output then transpose the data up, or count it directly in a datastep etc.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 502 views
  • 4 likes
  • 5 in conversation