I have a data set with dates, and a product category, and I have a number of rows in the data set which have dates in the future, and so the product column is blank for those, but i'd like to essentially replicate the product data throughout those rows. i.e. sample of my data looks like this now:
12/31/2012 | product1 |
12/31/2012 | product2 |
12/31/2012 | product3 |
12/31/2012 | product4 |
12/31/2012 | product5 |
12/31/2012 | product6 |
12/31/2012 | product7 |
12/31/2012 | product8 |
12/31/2012 | product9 |
12/31/2012 | product10 |
12/31/2012 | product11 |
12/31/2012 | product12 |
01/31/2013 | |
01/31/2013 | |
01/31/2013 | |
01/31/2013 | |
01/31/2013 | |
01/31/2013 | |
01/31/2013 | |
01/31/2013 | |
01/31/2013 | |
01/31/2013 | |
01/31/2013 | |
01/31/2013 | |
02/28/2013 | |
02/28/2013 | |
02/28/2013 | |
02/28/2013 | |
02/28/2013 | |
02/28/2013 | |
02/28/2013 | |
02/28/2013 | |
02/28/2013 | |
02/28/2013 | |
02/28/2013 | |
02/28/2013 |
And I would like it to look like this:
12/31/2012 | product1 |
12/31/2012 | product2 |
12/31/2012 | product3 |
12/31/2012 | product4 |
12/31/2012 | product5 |
12/31/2012 | product6 |
12/31/2012 | product7 |
12/31/2012 | product8 |
12/31/2012 | product9 |
12/31/2012 | product10 |
12/31/2012 | product11 |
12/31/2012 | product12 |
01/31/2013 | product1 |
01/31/2013 | product2 |
01/31/2013 | product3 |
01/31/2013 | product4 |
01/31/2013 | product5 |
01/31/2013 | product6 |
01/31/2013 | product7 |
01/31/2013 | product8 |
01/31/2013 | product9 |
01/31/2013 | product10 |
01/31/2013 | product11 |
01/31/2013 | product12 |
02/28/2013 | product1 |
02/28/2013 | product2 |
02/28/2013 | product3 |
02/28/2013 | product4 |
02/28/2013 | product5 |
02/28/2013 | product6 |
02/28/2013 | product7 |
02/28/2013 | product8 |
02/28/2013 | product9 |
02/28/2013 | product10 |
02/28/2013 | product11 |
02/28/2013 | product12 |
etc.
I thought I could use the lag statement, in a way such as this:
data want;
set have;
if missing(prodcat) then prodcat = lag12(prodcat);
run;
Since the product categories repeat every 12 observations, but this data step has absolutely no effect on the data set, if i replace the lag function with just a constant string, I see that string populated throughout the rows -- so why doesn't the lag function put the data in properly?
Thanks!
-Ryan
Yes, that's correct. Very tricky ... LAG processes the ORIGINAL value in the "stack", not the final changed value. Here's another approach:
proc sql noprint;
create table dates as select distinct date from have;
create table products as select distinct product from have where product > ' ';
create table want as select * from dates, products order by date, product;
quit;
Good luck.
You normally do NOT want to reference LAG() conditionally. It does not "look back", instead it is popping values off a stack generated by the previous EXECUTIONS of the LAG() function.
lag12=lag12(prodcat);
if missing(prodcat) then prodcat = lag12;
Thanks so much! However it only partially worked...With your code modification it now populates the following 12 rows (01/31/2013) months, but none after that. I tried a retain statement with no luck...Any suggestions?
Yes, that's correct. Very tricky ... LAG processes the ORIGINAL value in the "stack", not the final changed value. Here's another approach:
proc sql noprint;
create table dates as select distinct date from have;
create table products as select distinct product from have where product > ' ';
create table want as select * from dates, products order by date, product;
quit;
Good luck.
Thank you so much Astounding! That works perfectly and seems to be a very elegant solution.
Many thanks!!
It sounds like your actual problem is not related to LAG() at all.
Do you want to assign all possible productions to all possible dates? then the SQL solution proposed looks good.
If you really did want to "cycle" the non-missing values you might be able to use POINT=.
data products;
set have;
where not missing(product);
keep product;
run;
data want;
set have ;
if product = ' ' then do ;
p+1;
if p > nobs then p=1;
set products point=p nobs=nobs;
end;
run;
You could always try a cross join instead:
proc sql;
create table want as
select * from (
select distinct date from have
cross join
select distinct product from have)
order by date, product;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.