Hello community,
I am attempting to copy data into blank rows based on the ID date and hour. I am having trouble getting this to apply based on examples I have seen on this forum. My extra layer of difficulty comes from each ID could need two different locations copied to it. Some example data below...
Obs ID Date Hour Location Assignment Date
1 260 08May2023 0 07May2023
2 260 08May2023 1 A 07May2023
3 260 08May2023 2 A 07May2023
4 260 08May2023 3 07May2023
..........................................................................................................
15 260 08May2023 4 B 10May2023
16 260 08May2023 5 10May2023
17 300 08May2023 0 03May2023
18 300 08May2023 1 03May2023
19 300 08May2023 2 C 03May2023
20 300 08May2023 3 C 03May2023
21 300 08May2023 4 D 08May2023
22 300 08May2023 5 08May2023
The code I have used is this
proc sort data = have;
by ID Date Hour;
run;
data want;set have;retain pcol1;if missing(Location) then Location= pcol1;pcol1 = Location;drop pcol:;run;
The problem with this code is observation 17 & 18 will copy above making it B when it should be C. Same for observation 1 if it has data above it. Also you can see there are occurrences where one ID will have two locations in the dataset (Which is done by week). Most will be in one location the entire week but it does happen where an ID can be in one location then have a gap and then reappear at a new location. The code I have tried above works except when there is an observation that has location blank above it.
What code can be used to copy the cells with data for location into the cells missing location by ID and Assignment date in order of the date and hours. I say in order because the new assignment date is when you know the location has changed.
For clarity this is how the date should fill in:
Obs ID Date Hour Location Assignment Date
1 260 08May2023 0 A 07May2023
2 260 08May2023 1 A 07May2023
3 260 08May2023 2 A 07May2023
4 260 08May2023 3 A 07May2023
..........................................................................................................
15 260 08May2023 4 B 10May2023
16 260 08May2023 5 B 10May2023
17 300 08May2023 0 C 03May2023
18 300 08May2023 1 C 03May2023
19 300 08May2023 2 C 03May2023
20 300 08May2023 3 C 03May2023
21 300 08May2023 4 D 08May2023
22 300 08May2023 5 D 08May2023
Thank you in advance for anyone able to help me with this.
So LOCF (last observation carried forward) with the added wrinkle of also wanting the first non-empty observations to be pushed backwards.
Seems to me this same question was just asked earlier this week.
One way is to merge the data with a version of the data that only has non-missing location values so you can get an initial value for the carried forward value.
First let's make some actual test data (with expected results).
data have ;
input ID Date :date. Hour Location $ AssignmentDate :date. expect $;
format Date AssignmentDate date9. ;
cards ;
260 08May2023 0 . 07May2023 A
260 08May2023 1 A 07May2023 A
260 08May2023 2 A 07May2023 A
260 08May2023 3 . 07May2023 A
260 08May2023 4 B 10May2023 B
260 08May2023 5 . 10May2023 B
300 08May2023 0 . 03May2023 C
300 08May2023 1 . 03May2023 C
300 08May2023 2 C 03May2023 C
300 08May2023 3 C 03May2023 C
300 08May2023 4 D 08May2023 D
300 08May2023 5 . 08May2023 D
;
Now here is the step to fill in the missing LOCATION values.
data want;
merge have(in=in1)
have(in=in2
keep=id location rename=(location=first_loc)
where=(not missing(first_loc)))
;
by id ;
if first.id then previous_loc = first_loc;
Location=coalescec(Location,previous_loc);
output;
previous_loc=location;
retain previous_loc;
drop first_loc previous_loc;
run;
Result
Assignment OBS ID Date Hour Location Date expect 1 260 08MAY2023 0 A 07MAY2023 A 2 260 08MAY2023 1 A 07MAY2023 A 3 260 08MAY2023 2 A 07MAY2023 A 4 260 08MAY2023 3 A 07MAY2023 A 5 260 08MAY2023 4 B 10MAY2023 B 6 260 08MAY2023 5 B 10MAY2023 B 7 300 08MAY2023 0 C 03MAY2023 C 8 300 08MAY2023 1 C 03MAY2023 C 9 300 08MAY2023 2 C 03MAY2023 C 10 300 08MAY2023 3 C 03MAY2023 C 11 300 08MAY2023 4 D 08MAY2023 D 12 300 08MAY2023 5 D 08MAY2023 D
If you don't like the message about merge having duplicate observations for the same BY group then you can do it a double DOW loop instead.
data want;
do _n_=1 by 1 until(last.id);
set have ;
by id;
length previous_loc $8;
previous_loc=coalescec(previous_loc,location);
end;
do _n_=1 to _n_;
set have;
location=coalescec(location,previous_loc);
output;
previous_loc=location;
end;
drop previous_loc;
run;
May I ask why you seem to want a blank row in your data set? What is the benefit?
Hello, I am looking to fill in those blank rows with the appropriate location. I insert those blank observations as everyone in my dataset needs an observation for every hour throughout the week. The report I receive only gives an observation if their was an observation to report. Thus, if an observation was missed my code inserts the missing hour(s). When the row is inserted by ID it produces these blanks I then need to fill with the appropriate location.
Okay but that does not answer my question, what is the benefit? This is just a repeat of you saying that there needs to be blank rows inserted, said using more words.
Hi,
It's a bit hard to understand your data and logic, because the formatting is messed up. In order to preserve formatting for code or text output, you need to use the "insert code" button.
I think you have data like:
ID Date Hour Location AssignmentDate 260 08May2023 0 . 07May2023 260 08May2023 1 A 07May2023 260 08May2023 2 A 07May2023 260 08May2023 3 . 07May2023 260 08May2023 4 B 10May2023 260 08May2023 5 . 10May2023 300 08May2023 0 . 03May2023 300 08May2023 1 . 03May2023 300 08May2023 2 C 03May2023 300 08May2023 3 C 03May2023 300 08May2023 4 D 08May2023 300 08May2023 5 . 08May2023
And you want to fill in the missing values for location, to generate a WANT dataset like:
ID Date Hour Location AssignmentDate 260 08May2023 0 A 07May2023 260 08May2023 1 A 07May2023 260 08May2023 2 A 07May2023 260 08May2023 3 A 07May2023 260 08May2023 4 B 10May2023 260 08May2023 5 B 10May2023 300 08May2023 0 C 03May2023 300 08May2023 1 C 03May2023 300 08May2023 2 C 03May2023 300 08May2023 3 C 03May2023 300 08May2023 4 D 08May2023 300 08May2023 5 D 08May2023
Is that right?
What is the rule for filling in the missing values for location? Is there always only one location value per ID-AssignmentDate?
Hello Quentin,
Thank you for the reply. My formatting did get messed up as it the post was flagged for spam and I had to get customer service to repost.
I can give further background information. This data is a weekly report I receive to which I begin working with. If an observation occurred in a given hour then it shows in my report. If it does not then I have code that inserts the observation so when I do a means they contribute 0 to the set for the given day and hour.
The reason blanks exist is if there is no observation in the report I receive and I add it in for them to contribute data the Location and even Assignment date will be blank. I am trying to fill in the blank spaces that I inserted so they can be correctly categorized for the final count. The code I use works but if someone misses hour 0 on the first day of the report there is nothing for them to repeat down and it inherits the location from the previous ID.
The location can change for an ID as well in the middle of the week. If that occurs the code needs to account for this change. There would be a new Assignment date at the Date and Hour that the new assignment began and then I would like to start to carry down the new location based on the change, like you see for ID 300. Is there anyway I can get this resolved without manually having to go through?
Could you ever have two different locations for the same assignment date? Could you have data like:
ID Date Hour Location AssignmentDate 999 08May2023 0 . 07May2023 999 08May2023 1 A 07May2023 999 08May2023 2 . 07May2023 999 08May2023 3 B 07May2023
?
Or is it impossible to have two different Location values for the same ID-assignment?
Assuming an ID can only have one location per assignment date, and there is always at least one location value present in the data you have you could use BY-group processing similar to your current code, but sorting by ID AssignmentDate and Location (descending), so that the non-missing location will be the first record in each group.
data have ;
input ID : 8. Date :date9. Hour :1. Location :$1. AssignmentDate :date9. ;
format Date AssignmentDate date9. ;
cards ;
260 08May2023 0 . 07May2023
260 08May2023 1 A 07May2023
260 08May2023 2 A 07May2023
260 08May2023 3 . 07May2023
260 08May2023 4 B 10May2023
260 08May2023 5 . 10May2023
300 08May2023 0 . 03May2023
300 08May2023 1 . 03May2023
300 08May2023 2 C 03May2023
300 08May2023 3 C 03May2023
300 08May2023 4 D 08May2023
300 08May2023 5 . 08May2023
;
run ;
proc sort data=have ;
by ID AssignmentDate descending Location ;
run ;
data want ;
set have ;
by ID AssignmentDate ;
retain _loc ;
if first.AssignmentDate then _loc=location ;
if missing(location) then location=_loc ;
drop _: ;
run ;
proc sort data=want ;
by ID Date Hour;
run ;
Thank you for working on this. Unfortunately it is possible for one ID to have two different locations in the report. If not I would be golden. The assignment Date variable indicates the new date of when that occurred and will have the new location attached to it. The code will need to copy the location to all dates below it and above if needed. Then it would need to stop adding that location once the new assignment date began. When the data is sorted by Date and Hour the new start date would be at the hour and date when the new location has begun.
So LOCF (last observation carried forward) with the added wrinkle of also wanting the first non-empty observations to be pushed backwards.
Seems to me this same question was just asked earlier this week.
One way is to merge the data with a version of the data that only has non-missing location values so you can get an initial value for the carried forward value.
First let's make some actual test data (with expected results).
data have ;
input ID Date :date. Hour Location $ AssignmentDate :date. expect $;
format Date AssignmentDate date9. ;
cards ;
260 08May2023 0 . 07May2023 A
260 08May2023 1 A 07May2023 A
260 08May2023 2 A 07May2023 A
260 08May2023 3 . 07May2023 A
260 08May2023 4 B 10May2023 B
260 08May2023 5 . 10May2023 B
300 08May2023 0 . 03May2023 C
300 08May2023 1 . 03May2023 C
300 08May2023 2 C 03May2023 C
300 08May2023 3 C 03May2023 C
300 08May2023 4 D 08May2023 D
300 08May2023 5 . 08May2023 D
;
Now here is the step to fill in the missing LOCATION values.
data want;
merge have(in=in1)
have(in=in2
keep=id location rename=(location=first_loc)
where=(not missing(first_loc)))
;
by id ;
if first.id then previous_loc = first_loc;
Location=coalescec(Location,previous_loc);
output;
previous_loc=location;
retain previous_loc;
drop first_loc previous_loc;
run;
Result
Assignment OBS ID Date Hour Location Date expect 1 260 08MAY2023 0 A 07MAY2023 A 2 260 08MAY2023 1 A 07MAY2023 A 3 260 08MAY2023 2 A 07MAY2023 A 4 260 08MAY2023 3 A 07MAY2023 A 5 260 08MAY2023 4 B 10MAY2023 B 6 260 08MAY2023 5 B 10MAY2023 B 7 300 08MAY2023 0 C 03MAY2023 C 8 300 08MAY2023 1 C 03MAY2023 C 9 300 08MAY2023 2 C 03MAY2023 C 10 300 08MAY2023 3 C 03MAY2023 C 11 300 08MAY2023 4 D 08MAY2023 D 12 300 08MAY2023 5 D 08MAY2023 D
If you don't like the message about merge having duplicate observations for the same BY group then you can do it a double DOW loop instead.
data want;
do _n_=1 by 1 until(last.id);
set have ;
by id;
length previous_loc $8;
previous_loc=coalescec(previous_loc,location);
end;
do _n_=1 to _n_;
set have;
location=coalescec(location,previous_loc);
output;
previous_loc=location;
end;
drop previous_loc;
run;
Tom,
I really want to display my gratitude. You are awesome. I appreciate the time you took, and everyone else in this great community, to lend me a hand on finalizing this report. I can finally complete this and have it run on auto pilot. Have a great day.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.