BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Demographer
Pyrite | Level 9

Hello,

I have a dataset that looks like this:

 

SEX COUNTRY PERIOD X1 X2
MALE A 2000 2 3
      2 2
      2 2
FEMALE     1 5
      4 1
      2 3
MALE B   1 3
      4 3
      3 4
FEMALE     5 3
      4 1
      4 1
MALE C   2 2
   
  Z 2050 6 4

 

I want to fill the missing value of SEX COUNTRY and PERIOD with the previous value, until it changes. So it should look like this:

 

SEX COUNTRY PERIOD X1 X2
MALE A 2000 2 3
MALE A 2000 2 2
MALE A 2000 2 2
FEMALE A 2000 1 5
FEMALE A 2000 4 1
FEMALE A 2000 2 3
MALE B 2000 1 3
MALE B 2000 4 3
MALE B 2000 3 4
FEMALE B 2000 5 3
FEMALE B 2000 4 1
FEMALE B 2000 4 1
MALE C 2000 2 2
   
  Z 2050    

 

I'm not really sure how to do this.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Demographer
Pyrite | Level 9

NVM, I found the solution

 

data want;
set have;
retain _period;
if not missing(period) then _period=period;
else period=_period;
drop _period;

retain _sex;
if not missing(sex) then _sex=sex;
else sex=_sex;
drop _sex;

retain _country;
if not missing(country) then _country=country;
else country=_country;
drop _country;
run;

View solution in original post

22 REPLIES 22
Demographer
Pyrite | Level 9

NVM, I found the solution

 

data want;
set have;
retain _period;
if not missing(period) then _period=period;
else period=_period;
drop _period;

retain _sex;
if not missing(sex) then _sex=sex;
else sex=_sex;
drop _sex;

retain _country;
if not missing(country) then _country=country;
else country=_country;
drop _country;
run;

Kurt_Bremser
Super User

You see that you write a piece of code repeatedly. This is a case for using a macro to reduce your work:

%macro process(variable);
retain help_&variable;
if not missing(&variable)
then _&variable = &variable;
else &variable = _&variable;
drop _&variable;
%mend;

data want;
set have;
%process(sex)
%process(country)
%process(period)
run;
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Or use an array and loop over the array rather than bit by bit.

Nasser_DRMCP
Lapis Lazuli | Level 10

Hello Kurt,

 

I succeeded to replace missing values by the previous values.

but only on the column "year_2017" .

my dataset has many columns ... year_2013, year_2014,year_2015,.......year_2018.

how could I apply this code below for each column year_xxxx

retain _an_2017 ;

if not missing(an_2017) then _an_2017 = an_2017 ; else an_2017 = _an_2017 ;

drop _an_2017 ;

 

thanks a lot in advance

regards

Nasser

Kurt_Bremser
Super User

@Nasser_DRMCP wrote:

Hello Kurt,

 

I succeeded to replace missing values by the previous values.

but only on the column "year_2017" .

my dataset has many columns ... year_2013, year_2014,year_2015,.......year_2018.

how could I apply this code below for each column year_xxxx

retain _an_2017 ;

if not missing(an_2017) then _an_2017 = an_2017 ; else an_2017 = _an_2017 ;

drop _an_2017 ;

 

thanks a lot in advance

regards

Nasser


Erroneous posting in the wrong topic?

pkm_edu
Quartz | Level 8

Hello,

What changes would I make to the DATA step below to get  the output  appended below? 

Thanks,

 

data test;
infile datalines DSD  missover;
input name:$13. School: $30.;
datalines;
,,Pinecrest Elementary School
Craig Smith, 
Paula Johnson, 
Chung Wang,
;
proc print; run;

data test1;
set test;
retain _school;
if not missing(school) then _school=school;
else school = _school;
drop _school;
run;
proc print data=test1;
run;

(Desired output, not generated by the above code blocks)

Obs name School

1 Craig Smith Pinecrest Elementary School
2 Paula Johnson Pinecrest Elementary School
3 Chung Wang Pinecrest Elementary School

Tom
Super User Tom
Super User

If you want to fix the first step to generate that output without the need for the second step then this is classic example of reading in a report where some "header" information is not repeated.  You just need to read the header information once and remember it. Much like the second step is doing with data coming from a dataset instead of a text file.

data test;
  infile datalines dsd truncover ;
  length name $13 _school school $30 ;
  retain school;
  input name _school;
  if not missing(_school) then do;
     school=_school;
     delete;
  end;
  drop _school ;
datalines;
,Pinecrest Elementary School
Craig Smith, 
Paula Johnson, 
Chung Wang,
;

PS I fixed the first line of data so that it has only TWO values and not the THREE it had it what you posted by removing the extra comma.

pkm_edu
Quartz | Level 8

Thanks, Tom. Your code has worked.  However, I am trying to understand the usage of the DELETE statement within the IF-THEN-DO-END block? Is this statement telling SAS to delete the observations with missing values? 

Tom
Super User Tom
Super User

In you example data you did not want to write out that observation.  The DELETE statement will end the iteration of the data step right there.  Since there is no explicit OUTPUT statement there is an implicit OUTPUT done when the step reaches the end of an iteration, so with the DELETE statement that implicit OUTPUT never happens.

 

If your data was different such that you did want to write out that observation then just don't include the DELETE statement.

A common example would be a report were the first column is used to group the data so it is only present on the first observation in a group.  You wouldn't want to delete that first observation.  But if you had a report where the group name is on its own line, like your example , then you don't want to include that line.

pkm_edu
Quartz | Level 8

Hi Tom. Thanks for the clarification. 

MelissaM
Obsidian | Level 7
Love this!
RW9
Diamond | Level 26 RW9
Diamond | Level 26

A more important question would be why the "data" is like that in the first place.  It looks a bit like your reading in a produced report - not a recommended approach for multiple reasons (populations, calculations, assumptions etc.).  Get the real "data" and use that.

Demographer
Pyrite | Level 9

I copied/pasted in Excel the output of a proc tabulate and then imported it in SAS. When I tried the out statement, it gaves me only one column for variables X1 X2. I suppose this was probably not the most efficient way.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

No, its really not.  Use out= from tabulate, then manipulate the output dataset to get what you want.  Alternative, use one of the other procedures (means/summary/SQL aggregates) or one of the other hundreds of ways to get output you want.  For example, your output doesn't make any sense to me:

SEX COUNTRY PERIOD X1 X2
MALE A 2000 2 3
      2 2
      2 2

 Why are there three rows, there is no identifer on each row to say what the rows reflect other then the first one?

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 22 replies
  • 86740 views
  • 17 likes
  • 9 in conversation