Sorry for the poor subject line.
I'm reading in a data set that is very odd and has a lot of missing data. I'll try to illustrate below:
data have;
input ID val comment $;
datalines;
1 6 start
. 12 adjust
. 15 adjust
. 6 final
2 12 start
2 15 final
3 6 start
. 12 adjust
3 15 final
;
data want;
input ID val comment $;
datalines;
1 6 start
1 6 final
2 12 start
2 15 final
3 6 start
3 15 final
;
In order to get my "want" data set, I would normally sort by ID and use first/last processing to extract the records needed. However, in this case I don't always have a value in the ID field for the records that I want to capture. I'm not exactly sure if a retain or a lag statement would work in this case.
Any ideas on an approach?
Is the idea to replace the missing ID values with the earlier non-missing values?
The same method can be used but without using BY statement or FIRST. and LAST. variables.
data want ;
set have;
if missing(id) then id=newid;
else newid=id;
retain newid ;
drop newid;
run;
Is the idea to replace the missing ID values with the earlier non-missing values?
The same method can be used but without using BY statement or FIRST. and LAST. variables.
data want ;
set have;
if missing(id) then id=newid;
else newid=id;
retain newid ;
drop newid;
run;
If you can rely on the variable COMMENT (i.e. if in your sample input, it's not just for show) and the "start" row always has a non-missing ID value (as it appears from your sample input), then one variant could be:
data have ;
input ID val comment $ ;
cards ;
1 6 start
. 12 adjust
. 15 adjust
. 6 final
2 12 start
2 15 final
3 6 start
. 12 adjust
3 15 final
run ;
data want (drop = _:) ;
retain ID ;
set have (rename = (id = _id)) ;
if comment = "start" then ID = _id ;
if comment in ("start", "final") ;
run ;
Kind regards
Paul D.
You're welcome, of course, but your "also" gets me a bit puzzled because @Tom's program and mine generate different outputs. Namely, mine outputs only the "start" and "final" rows (as in your sample output), while @Tom's keeps all the rows, including "adjust". Neither is right or wrong, it just makes me wonder what it is that you really want.
Kind regards
Paul D.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.