BookmarkSubscribeRSS Feed
Scooby3g
Obsidian | Level 7

Hi, is it possible to backfill the obs in row 1 , 2  and 7 with the customer_ID that became available in the later months?

Scooby3g_0-1731363246229.png

data Have;
	input customer_ID $ Na $ month $ cost;
	datalines;
	.	Joe	202401 100
	.	Joe	202402 150
	123 Joe	202403 200
	123	Joe	202404 250
	123 Joe	202405 300
	123 Joe	202406 375
	.	Jane 202401 100
	456 Jane 202402 175
	456 Jane 202403 200
	456 Jane 202404 275
	456 Jane 202405 300
	456 Jane 202406 375
	;
run;
7 REPLIES 7
Patrick
Opal | Level 21

That's logically only possible if for each name there is exactly one matching ID value. 

Even though "ugly" but because SAS tables are sequential it's also possible that your table is in a a sort order that we could always use the first non-blank ID for the same name. 

 

You have to tell us what's the case.

Ksharp
Super User
data Have;
	input customer_ID $ Na $ month : yymmn6. cost;
	format month yymmn6.;
	datalines;
	.	Joe	202401 100
	.	Joe	202402 150
	123 Joe	202403 200
	123	Joe	202404 250
	123 Joe	202405 300
	123 Joe	202406 375
	.	Jane 202401 100
	456 Jane 202402 175
	456 Jane 202403 200
	456 Jane 202404 275
	456 Jane 202405 300
	456 Jane 202406 375
	;
run;
proc sort data=have out=temp;
by Na descending month;
run;
data want;
 length customer_ID $ 80;
 set temp(rename=(customer_ID=_customer_ID));
 by Na;
 retain customer_ID ;
 if first.Na then call missing(customer_ID);
 if not missing(_customer_ID) then customer_ID=_customer_ID;
 drop _customer_ID;
run;
proc sort data=want;by Na  month;run;
Ksharp
Super User

If you have a big table,could try DOW skill:

 

data Have;
	input customer_ID $ Na $ month : yymmn6. cost;
	format month yymmn6.;
	datalines;
	.	Joe	202401 100
	.	Joe	202402 150
	123 Joe	202403 200
	123	Joe	202404 250
	123 Joe	202405 300
	123 Joe	202406 375
	.	Jane 202401 100
	456 Jane 202402 175
	456 Jane 202403 200
	456 Jane 202404 275
	456 Jane 202405 300
	456 Jane 202406 375
	;
run;

data want;
if 0 then set have;
 do until(not missing(customer_ID) or last.Na);
  set have;
  by Na notsorted;
 end;
 _customer_ID=customer_ID;
 do until(not missing(customer_ID) or last.Na);
  set have;
  by Na notsorted;
  output;
 end;
drop customer_ID;
run;
mkeintz
PROC Star

No sorting is required.  If each name has at least one non-missing customer_id, then

 

data Have;
	input customer_ID $ Na $ month $ cost;
	datalines;
	.	Joe	202401 100
	.	Joe	202402 150
	123 Joe	202403 200
	123	Joe	202404 250
	123 Joe	202405 300
	123 Joe	202406 375
	.	Jane 202401 100
	456 Jane 202402 175
	456 Jane 202403 200
	456 Jane 202404 275
	456 Jane 202405 300
	456 Jane 202406 375
run;
data backfill/view=backfill;
  set have (keep=customer_id na where=(customer_id^=.));
  by na notsorted;
  if last.na;
run;

data want;
  set have (drop=customer_id);
  by na notsorted;
  if first.na then set backfill;
run;

This is two data steps, but only one retrieval of disk data, so should be very efficient.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

And here an approach using a hash table lookup.

data Have;
  input customer_ID Na $ month : yymmn6. cost;
  format month yymmn6.;
  datalines;
. Joe 202401 100
. Joe 202402 150
123 Joe 202403 200
123 Joe 202404 250
123 Joe 202405 300
123 Joe 202406 375
. Jane 202401 100
456 Jane 202402 175
456 Jane 202403 200
456 Jane 202404 275
456 Jane 202405 300
456 Jane 202406 375
;
run;

data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(where=(not missing(customer_id)))');
      h1.defineKey('na');
      h1.defineData('customer_id');
      h1.defineDone();
    end;
  set have;
  if missing(customer_id) then h1.find();
run;
quickbluefish
Barite | Level 11
This is really interesting - almost like creating a name-to-customer ID lookup format and then applying it, but much more compact. Almost 20 years of using SAS and I think I've used a hash table maybe twice.
Patrick
Opal | Level 21

@quickbluefish 

It's very worthwhile getting up-to-speed with hash tables. Like with formats hash table lookups can be very efficient because they don't require sorting of the main table. Unlike with formats you can have a composite key and lookup multiple variables. You can also load multiple rows (items) per key and then iterate over them using the do_over() method. 

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1945 views
  • 3 likes
  • 5 in conversation