Hello,
I have a dataset that looks like this:
SEX | COUNTRY | PERIOD | X1 | X2 |
MALE | A | 2000 | 2 | 3 |
2 | 2 | |||
2 | 2 | |||
FEMALE | 1 | 5 | ||
4 | 1 | |||
2 | 3 | |||
MALE | B | 1 | 3 | |
4 | 3 | |||
3 | 4 | |||
FEMALE | 5 | 3 | ||
4 | 1 | |||
4 | 1 | |||
MALE | C | 2 | 2 | |
… | … | … | ||
Z | 2050 | 6 | 4 |
I want to fill the missing value of SEX COUNTRY and PERIOD with the previous value, until it changes. So it should look like this:
SEX | COUNTRY | PERIOD | X1 | X2 |
MALE | A | 2000 | 2 | 3 |
MALE | A | 2000 | 2 | 2 |
MALE | A | 2000 | 2 | 2 |
FEMALE | A | 2000 | 1 | 5 |
FEMALE | A | 2000 | 4 | 1 |
FEMALE | A | 2000 | 2 | 3 |
MALE | B | 2000 | 1 | 3 |
MALE | B | 2000 | 4 | 3 |
MALE | B | 2000 | 3 | 4 |
FEMALE | B | 2000 | 5 | 3 |
FEMALE | B | 2000 | 4 | 1 |
FEMALE | B | 2000 | 4 | 1 |
MALE | C | 2000 | 2 | 2 |
… | … | … | ||
Z | 2050 |
I'm not really sure how to do this.
NVM, I found the solution
data want;
set have;
retain _period;
if not missing(period) then _period=period;
else period=_period;
drop _period;
retain _sex;
if not missing(sex) then _sex=sex;
else sex=_sex;
drop _sex;
retain _country;
if not missing(country) then _country=country;
else country=_country;
drop _country;
run;
NVM, I found the solution
data want;
set have;
retain _period;
if not missing(period) then _period=period;
else period=_period;
drop _period;
retain _sex;
if not missing(sex) then _sex=sex;
else sex=_sex;
drop _sex;
retain _country;
if not missing(country) then _country=country;
else country=_country;
drop _country;
run;
You see that you write a piece of code repeatedly. This is a case for using a macro to reduce your work:
%macro process(variable);
retain help_&variable;
if not missing(&variable)
then _&variable = &variable;
else &variable = _&variable;
drop _&variable;
%mend;
data want;
set have;
%process(sex)
%process(country)
%process(period)
run;
Or use an array and loop over the array rather than bit by bit.
Hello Kurt,
I succeeded to replace missing values by the previous values.
but only on the column "year_2017" .
my dataset has many columns ... year_2013, year_2014,year_2015,.......year_2018.
how could I apply this code below for each column year_xxxx
retain _an_2017 ;
if not missing(an_2017) then _an_2017 = an_2017 ; else an_2017 = _an_2017 ;
drop _an_2017 ;
thanks a lot in advance
regards
Nasser
@Nasser_DRMCP wrote:
Hello Kurt,
I succeeded to replace missing values by the previous values.
but only on the column "year_2017" .
my dataset has many columns ... year_2013, year_2014,year_2015,.......year_2018.
how could I apply this code below for each column year_xxxx
retain _an_2017 ;
if not missing(an_2017) then _an_2017 = an_2017 ; else an_2017 = _an_2017 ;
drop _an_2017 ;
thanks a lot in advance
regards
Nasser
Erroneous posting in the wrong topic?
Hello,
What changes would I make to the DATA step below to get the output appended below?
Thanks,
data test; infile datalines DSD missover; input name:$13. School: $30.; datalines; ,,Pinecrest Elementary School Craig Smith, Paula Johnson, Chung Wang, ; proc print; run; data test1; set test; retain _school; if not missing(school) then _school=school; else school = _school; drop _school; run; proc print data=test1; run;
(Desired output, not generated by the above code blocks)
Obs name School
1 Craig Smith Pinecrest Elementary School
2 Paula Johnson Pinecrest Elementary School
3 Chung Wang Pinecrest Elementary School
If you want to fix the first step to generate that output without the need for the second step then this is classic example of reading in a report where some "header" information is not repeated. You just need to read the header information once and remember it. Much like the second step is doing with data coming from a dataset instead of a text file.
data test;
infile datalines dsd truncover ;
length name $13 _school school $30 ;
retain school;
input name _school;
if not missing(_school) then do;
school=_school;
delete;
end;
drop _school ;
datalines;
,Pinecrest Elementary School
Craig Smith,
Paula Johnson,
Chung Wang,
;
PS I fixed the first line of data so that it has only TWO values and not the THREE it had it what you posted by removing the extra comma.
Thanks, Tom. Your code has worked. However, I am trying to understand the usage of the DELETE statement within the IF-THEN-DO-END block? Is this statement telling SAS to delete the observations with missing values?
In you example data you did not want to write out that observation. The DELETE statement will end the iteration of the data step right there. Since there is no explicit OUTPUT statement there is an implicit OUTPUT done when the step reaches the end of an iteration, so with the DELETE statement that implicit OUTPUT never happens.
If your data was different such that you did want to write out that observation then just don't include the DELETE statement.
A common example would be a report were the first column is used to group the data so it is only present on the first observation in a group. You wouldn't want to delete that first observation. But if you had a report where the group name is on its own line, like your example , then you don't want to include that line.
Hi Tom. Thanks for the clarification.
A more important question would be why the "data" is like that in the first place. It looks a bit like your reading in a produced report - not a recommended approach for multiple reasons (populations, calculations, assumptions etc.). Get the real "data" and use that.
I copied/pasted in Excel the output of a proc tabulate and then imported it in SAS. When I tried the out statement, it gaves me only one column for variables X1 X2. I suppose this was probably not the most efficient way.
No, its really not. Use out= from tabulate, then manipulate the output dataset to get what you want. Alternative, use one of the other procedures (means/summary/SQL aggregates) or one of the other hundreds of ways to get output you want. For example, your output doesn't make any sense to me:
SEX | COUNTRY | PERIOD | X1 | X2 |
MALE | A | 2000 | 2 | 3 |
2 | 2 | |||
2 | 2 |
Why are there three rows, there is no identifer on each row to say what the rows reflect other then the first one?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.