Sorry for the poor subject line.
I'm reading in a data set that is very odd and has a lot of missing data. I'll try to illustrate below:
data have;
input ID val comment $;
datalines;
1 6 start
. 12 adjust
. 15 adjust
. 6 final
2 12 start
2 15 final
3 6 start
. 12 adjust
3 15 final
;
data want;
input ID val comment $;
datalines;
1 6 start
1 6 final
2 12 start
2 15 final
3 6 start
3 15 final
;
In order to get my "want" data set, I would normally sort by ID and use first/last processing to extract the records needed. However, in this case I don't always have a value in the ID field for the records that I want to capture. I'm not exactly sure if a retain or a lag statement would work in this case.
Any ideas on an approach?
Is the idea to replace the missing ID values with the earlier non-missing values?
The same method can be used but without using BY statement or FIRST. and LAST. variables.
data want ;
set have;
if missing(id) then id=newid;
else newid=id;
retain newid ;
drop newid;
run;
Is the idea to replace the missing ID values with the earlier non-missing values?
The same method can be used but without using BY statement or FIRST. and LAST. variables.
data want ;
set have;
if missing(id) then id=newid;
else newid=id;
retain newid ;
drop newid;
run;
If you can rely on the variable COMMENT (i.e. if in your sample input, it's not just for show) and the "start" row always has a non-missing ID value (as it appears from your sample input), then one variant could be:
data have ;
input ID val comment $ ;
cards ;
1 6 start
. 12 adjust
. 15 adjust
. 6 final
2 12 start
2 15 final
3 6 start
. 12 adjust
3 15 final
run ;
data want (drop = _:) ;
retain ID ;
set have (rename = (id = _id)) ;
if comment = "start" then ID = _id ;
if comment in ("start", "final") ;
run ;
Kind regards
Paul D.
You're welcome, of course, but your "also" gets me a bit puzzled because @Tom's program and mine generate different outputs. Namely, mine outputs only the "start" and "final" rows (as in your sample output), while @Tom's keeps all the rows, including "adjust". Neither is right or wrong, it just makes me wonder what it is that you really want.
Kind regards
Paul D.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.