Hi, is it possible to backfill the obs in row 1 , 2 and 7 with the customer_ID that became available in the later months?
data Have;
input customer_ID $ Na $ month $ cost;
datalines;
. Joe 202401 100
. Joe 202402 150
123 Joe 202403 200
123 Joe 202404 250
123 Joe 202405 300
123 Joe 202406 375
. Jane 202401 100
456 Jane 202402 175
456 Jane 202403 200
456 Jane 202404 275
456 Jane 202405 300
456 Jane 202406 375
;
run;
That's logically only possible if for each name there is exactly one matching ID value.
Even though "ugly" but because SAS tables are sequential it's also possible that your table is in a a sort order that we could always use the first non-blank ID for the same name.
You have to tell us what's the case.
data Have;
input customer_ID $ Na $ month : yymmn6. cost;
format month yymmn6.;
datalines;
. Joe 202401 100
. Joe 202402 150
123 Joe 202403 200
123 Joe 202404 250
123 Joe 202405 300
123 Joe 202406 375
. Jane 202401 100
456 Jane 202402 175
456 Jane 202403 200
456 Jane 202404 275
456 Jane 202405 300
456 Jane 202406 375
;
run;
proc sort data=have out=temp;
by Na descending month;
run;
data want;
length customer_ID $ 80;
set temp(rename=(customer_ID=_customer_ID));
by Na;
retain customer_ID ;
if first.Na then call missing(customer_ID);
if not missing(_customer_ID) then customer_ID=_customer_ID;
drop _customer_ID;
run;
proc sort data=want;by Na month;run;
If you have a big table,could try DOW skill:
data Have;
input customer_ID $ Na $ month : yymmn6. cost;
format month yymmn6.;
datalines;
. Joe 202401 100
. Joe 202402 150
123 Joe 202403 200
123 Joe 202404 250
123 Joe 202405 300
123 Joe 202406 375
. Jane 202401 100
456 Jane 202402 175
456 Jane 202403 200
456 Jane 202404 275
456 Jane 202405 300
456 Jane 202406 375
;
run;
data want;
if 0 then set have;
do until(not missing(customer_ID) or last.Na);
set have;
by Na notsorted;
end;
_customer_ID=customer_ID;
do until(not missing(customer_ID) or last.Na);
set have;
by Na notsorted;
output;
end;
drop customer_ID;
run;
No sorting is required. If each name has at least one non-missing customer_id, then
data Have;
input customer_ID $ Na $ month $ cost;
datalines;
. Joe 202401 100
. Joe 202402 150
123 Joe 202403 200
123 Joe 202404 250
123 Joe 202405 300
123 Joe 202406 375
. Jane 202401 100
456 Jane 202402 175
456 Jane 202403 200
456 Jane 202404 275
456 Jane 202405 300
456 Jane 202406 375
run;
data backfill/view=backfill;
set have (keep=customer_id na where=(customer_id^=.));
by na notsorted;
if last.na;
run;
data want;
set have (drop=customer_id);
by na notsorted;
if first.na then set backfill;
run;
This is two data steps, but only one retrieval of disk data, so should be very efficient.
And here an approach using a hash table lookup.
data Have;
input customer_ID Na $ month : yymmn6. cost;
format month yymmn6.;
datalines;
. Joe 202401 100
. Joe 202402 150
123 Joe 202403 200
123 Joe 202404 250
123 Joe 202405 300
123 Joe 202406 375
. Jane 202401 100
456 Jane 202402 175
456 Jane 202403 200
456 Jane 202404 275
456 Jane 202405 300
456 Jane 202406 375
;
run;
data want;
if _n_=1 then
do;
dcl hash h1(dataset:'have(where=(not missing(customer_id)))');
h1.defineKey('na');
h1.defineData('customer_id');
h1.defineDone();
end;
set have;
if missing(customer_id) then h1.find();
run;
It's very worthwhile getting up-to-speed with hash tables. Like with formats hash table lookups can be very efficient because they don't require sorting of the main table. Unlike with formats you can have a composite key and lookup multiple variables. You can also load multiple rows (items) per key and then iterate over them using the do_over() method.
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.