SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Mercadja
Fluorite | Level 6

Hello community,

I am attempting to copy data into blank rows based on the ID date and hour. I am having trouble getting this to apply based on examples I have seen on this forum. My extra layer of difficulty comes from each ID could need two different locations copied to it. Some example data below...

Obs     ID   Date             Hour          Location          Assignment Date
1       260     08May2023  0                                          07May2023
2      260     08May2023   1                  A                     07May2023
3      260     08May2023  2                  A                     07May2023
4      260     08May2023  3                                          07May2023
..........................................................................................................
15     260      08May2023 4                   B                    10May2023
16     260      08May2023 5                                          10May2023
17     300      08May2023 0                                         03May2023
18     300      08May2023 1                                          03May2023
19     300      08May2023 2                   C                   03May2023
20    300      08May2023 3                   C                   03May2023
21     300      08May2023 4                   D                   08May2023
22    300      08May2023 5                                          08May2023

The code I have used is this

proc sort data = have;
by ID Date Hour;
run;

data want;set have;retain pcol1;if missing(Location) then Location= pcol1;pcol1 = Location;drop pcol:;run;

The problem with this code is observation 17 & 18 will copy above making it B when it should be C. Same for observation 1 if it has data above it. Also you can see there are occurrences where one ID will have two locations in the dataset (Which is done by week). Most will be in one location the entire week but it does happen where an ID can be in one location then have a gap and then reappear at a new location. The code I have tried above works except when there is an observation that has location blank above it.
What code can be used to copy the cells with data for location into the cells missing location by ID and Assignment date in order of the date and hours. I say in order because the new assignment date is when you know the location has changed.
For clarity this is how the date should fill in:

Obs     ID   Date             Hour          Location          Assignment Date
1       260     08May2023  0                 A                      07May2023
2      260     08May2023   1                  A                     07May2023
3      260     08May2023  2                  A                     07May2023
4      260     08May2023  3                  A                        07May2023
..........................................................................................................
15     260      08May2023 4                   B                    10May2023
16     260      08May2023 5                   B                       10May2023
17     300      08May2023 0                   C                    03May2023
18     300      08May2023 1                     C                     03May2023
19     300      08May2023 2                   C                    03May2023
20    300      08May2023 3                   C                    03May2023
21     300      08May2023 4                  D                    08May2023
22    300      08May2023 5                   D                       08May2023

Thank you in advance for anyone able to help me with this.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So LOCF (last observation carried forward) with the added wrinkle of also wanting the first non-empty observations to be pushed backwards.

 

Seems to me this same question was just asked earlier this week.  

One way is to merge the data with a version of the data that only has non-missing location values so you can get an initial value for the carried forward value.

 

First let's make some actual test data (with expected results).

data have ;
  input ID  Date :date. Hour  Location $ AssignmentDate :date. expect $;
  format Date AssignmentDate date9. ;
  cards ;
260 08May2023 0 .  07May2023 A
260 08May2023 1 A  07May2023 A
260 08May2023 2 A  07May2023 A
260 08May2023 3 .  07May2023 A
260 08May2023 4 B  10May2023 B
260 08May2023 5 .  10May2023 B
300 08May2023 0 .  03May2023 C
300 08May2023 1 .  03May2023 C
300 08May2023 2 C  03May2023 C
300 08May2023 3 C  03May2023 C
300 08May2023 4 D  08May2023 D
300 08May2023 5 .  08May2023 D
;

Now here is the step to fill in the missing LOCATION values.

data want;
  merge have(in=in1) 
        have(in=in2 
             keep=id location rename=(location=first_loc) 
             where=(not missing(first_loc)))
  ;
  by id ;
  if first.id then previous_loc = first_loc;
  Location=coalescec(Location,previous_loc);
  output;
  previous_loc=location;
  retain previous_loc;
  drop first_loc previous_loc;
run;

Result

                                               Assignment
OBS     ID         Date    Hour    Location       Date       expect

  1    260    08MAY2023      0        A        07MAY2023       A
  2    260    08MAY2023      1        A        07MAY2023       A
  3    260    08MAY2023      2        A        07MAY2023       A
  4    260    08MAY2023      3        A        07MAY2023       A
  5    260    08MAY2023      4        B        10MAY2023       B
  6    260    08MAY2023      5        B        10MAY2023       B
  7    300    08MAY2023      0        C        03MAY2023       C
  8    300    08MAY2023      1        C        03MAY2023       C
  9    300    08MAY2023      2        C        03MAY2023       C
 10    300    08MAY2023      3        C        03MAY2023       C
 11    300    08MAY2023      4        D        08MAY2023       D
 12    300    08MAY2023      5        D        08MAY2023       D

If you don't like the message about merge having duplicate observations for the same BY group then you can do it a double DOW loop instead.

data want;
do _n_=1 by 1 until(last.id);
  set have ;
  by id;
  length previous_loc $8;
  previous_loc=coalescec(previous_loc,location);
end;
do _n_=1 to _n_;
  set have;
  location=coalescec(location,previous_loc);
  output;
  previous_loc=location;
end;
drop previous_loc;
run;

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26

May I ask why you seem to want a blank row in your data set? What is the benefit?

--
Paige Miller
Mercadja
Fluorite | Level 6

Hello, I am looking to fill in those blank rows with the appropriate location. I insert those blank observations as everyone in my dataset needs an observation for every hour throughout the week. The report I receive only gives an observation if their was an observation to report. Thus, if an observation was missed my code inserts the missing hour(s). When the row is inserted by ID it produces these blanks I then need to fill with the appropriate location. 

PaigeMiller
Diamond | Level 26

Okay but that does not answer my question, what is the benefit? This is just a repeat of you saying that there needs to be blank rows inserted, said using more words.

--
Paige Miller
Quentin
Super User

Hi, 

It's a bit hard to understand your data and logic, because the formatting is messed up.  In order to preserve formatting for code or text output, you need to use the "insert code" button.  

I think you have data like:

ID  Date      Hour Location AssignmentDate
260 08May2023 0    .        07May2023
260 08May2023 1    A        07May2023
260 08May2023 2    A        07May2023
260 08May2023 3    .        07May2023
260 08May2023 4    B        10May2023
260 08May2023 5    .        10May2023
300 08May2023 0    .        03May2023
300 08May2023 1    .        03May2023
300 08May2023 2    C        03May2023
300 08May2023 3    C        03May2023
300 08May2023 4    D        08May2023
300 08May2023 5    .        08May2023

And you want to fill in the missing values for location, to generate a WANT dataset like:

ID  Date      Hour Location AssignmentDate
260 08May2023 0    A        07May2023
260 08May2023 1    A        07May2023
260 08May2023 2    A        07May2023
260 08May2023 3    A        07May2023
260 08May2023 4    B        10May2023
260 08May2023 5    B        10May2023
300 08May2023 0    C        03May2023
300 08May2023 1    C        03May2023
300 08May2023 2    C        03May2023
300 08May2023 3    C        03May2023
300 08May2023 4    D        08May2023
300 08May2023 5    D        08May2023

Is that right?  

What is the rule for filling in the missing values for location?   Is there always only one location value per ID-AssignmentDate?  

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
Mercadja
Fluorite | Level 6

Hello Quentin,

 

Thank you for the reply. My formatting did get messed up as it the post was flagged for spam and I had to get customer service to repost. 

 

I can give further background information. This data is a weekly report I receive to which I begin working with. If an observation occurred in a given hour then it shows in my report. If it does not then I have code that inserts the observation so when I do a means they contribute 0 to the set for the given day and hour. 

The reason blanks exist is if there is no observation in the report I receive and I add it in for them to contribute data the Location and even Assignment date will be blank. I am trying to fill in the blank spaces that I inserted so they can be correctly categorized for the final count. The code I use works but if someone misses hour 0 on the first day of the report there is nothing for them to repeat down and it inherits the location from the previous ID.

The location can change for an ID as well in the middle of the week. If that occurs the code needs to account for this change. There would be a new Assignment date at the Date and Hour that the new assignment began and then I would like to start to carry down the new location based on the change, like you see for ID 300. Is there anyway I can get this resolved without manually having to go through?

Quentin
Super User

Could you ever have two different locations for the same assignment date?  Could you have data like:

ID  Date      Hour Location AssignmentDate
999 08May2023 0    .        07May2023
999 08May2023 1    A        07May2023
999 08May2023 2    .        07May2023
999 08May2023 3    B        07May2023

?

Or is it impossible to have two different Location values for the same ID-assignment?

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
Quentin
Super User

Assuming an ID can only have one location per assignment date, and there is always at least one location value present in the data you have you could use BY-group processing similar to your current code, but sorting by ID AssignmentDate and Location (descending), so that the non-missing location will be the first record in each group.

 

data have ;
  input ID : 8. Date :date9. Hour :1. Location :$1. AssignmentDate :date9. ;
  format Date AssignmentDate date9. ;
  cards ;
260 08May2023 0    .        07May2023
260 08May2023 1    A        07May2023
260 08May2023 2    A        07May2023
260 08May2023 3    .        07May2023
260 08May2023 4    B        10May2023
260 08May2023 5    .        10May2023
300 08May2023 0    .        03May2023
300 08May2023 1    .        03May2023
300 08May2023 2    C        03May2023
300 08May2023 3    C        03May2023
300 08May2023 4    D        08May2023
300 08May2023 5    .        08May2023
;
run ;

proc sort data=have ;
  by ID AssignmentDate descending Location ;
run ;

data want ;
  set have ;
  by ID AssignmentDate ;
  retain _loc ;
  if first.AssignmentDate then _loc=location ;
  if missing(location) then location=_loc ;
  drop _: ;
run ;

proc sort data=want ;
  by ID Date Hour;
run ;
The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
Mercadja
Fluorite | Level 6

Thank you for working on this. Unfortunately it is possible for one ID to have two different locations in the report. If not I would be golden. The assignment Date variable indicates the new date of when that occurred and will have the new location attached to it. The code will need to copy the location to all dates below it and above if needed. Then it would need to stop adding that location once the new assignment date began. When the data is sorted by Date and Hour the new start date would be at the hour and date when the new location has begun. 

Tom
Super User Tom
Super User

So LOCF (last observation carried forward) with the added wrinkle of also wanting the first non-empty observations to be pushed backwards.

 

Seems to me this same question was just asked earlier this week.  

One way is to merge the data with a version of the data that only has non-missing location values so you can get an initial value for the carried forward value.

 

First let's make some actual test data (with expected results).

data have ;
  input ID  Date :date. Hour  Location $ AssignmentDate :date. expect $;
  format Date AssignmentDate date9. ;
  cards ;
260 08May2023 0 .  07May2023 A
260 08May2023 1 A  07May2023 A
260 08May2023 2 A  07May2023 A
260 08May2023 3 .  07May2023 A
260 08May2023 4 B  10May2023 B
260 08May2023 5 .  10May2023 B
300 08May2023 0 .  03May2023 C
300 08May2023 1 .  03May2023 C
300 08May2023 2 C  03May2023 C
300 08May2023 3 C  03May2023 C
300 08May2023 4 D  08May2023 D
300 08May2023 5 .  08May2023 D
;

Now here is the step to fill in the missing LOCATION values.

data want;
  merge have(in=in1) 
        have(in=in2 
             keep=id location rename=(location=first_loc) 
             where=(not missing(first_loc)))
  ;
  by id ;
  if first.id then previous_loc = first_loc;
  Location=coalescec(Location,previous_loc);
  output;
  previous_loc=location;
  retain previous_loc;
  drop first_loc previous_loc;
run;

Result

                                               Assignment
OBS     ID         Date    Hour    Location       Date       expect

  1    260    08MAY2023      0        A        07MAY2023       A
  2    260    08MAY2023      1        A        07MAY2023       A
  3    260    08MAY2023      2        A        07MAY2023       A
  4    260    08MAY2023      3        A        07MAY2023       A
  5    260    08MAY2023      4        B        10MAY2023       B
  6    260    08MAY2023      5        B        10MAY2023       B
  7    300    08MAY2023      0        C        03MAY2023       C
  8    300    08MAY2023      1        C        03MAY2023       C
  9    300    08MAY2023      2        C        03MAY2023       C
 10    300    08MAY2023      3        C        03MAY2023       C
 11    300    08MAY2023      4        D        08MAY2023       D
 12    300    08MAY2023      5        D        08MAY2023       D

If you don't like the message about merge having duplicate observations for the same BY group then you can do it a double DOW loop instead.

data want;
do _n_=1 by 1 until(last.id);
  set have ;
  by id;
  length previous_loc $8;
  previous_loc=coalescec(previous_loc,location);
end;
do _n_=1 to _n_;
  set have;
  location=coalescec(location,previous_loc);
  output;
  previous_loc=location;
end;
drop previous_loc;
run;
Mercadja
Fluorite | Level 6

Tom, 

I really want to display my gratitude. You are awesome. I appreciate the time you took, and everyone else in this great community, to lend me a hand on finalizing this report. I can finally complete this and have it run on auto pilot. Have a great day. 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2703 views
  • 2 likes
  • 4 in conversation