I think you may need to provide some example data sets and the expected result(s).
Without an idea of what "two months of data" may look like this is a very imprecise description.
For example, you show 2 for the value of "months" on three different observations. Do you expect a different result for each observation? If so, how are we to tell which result? Are we to use this information to build a data set name to match with? A library? I really cannot tell how you expect "months" to be used as presented.
In 99.99 percent of cases it is usually better to work with date values instead of arbitrary numbers and calling them "months" (or weeks or quarters or what ever).
You need to provide more detail. My advice: Don't jump into any coding (especially macro coding) before you haven't clearly defined what you have and what result you want.
In your thinking: Would you create an output table per row in your driver data or something else. And what's "two months of data"? Are your source tables monthly files (tables) or would you need to sub-set the data. You would certainly need the information your your "driver table" (the 4 rows you shared) with all the information required to identify which source tables and sub-set of data to pick.
@Yychel wrote:
The main dataset that I share contains promotion information. The months variable refers to the length of the promotion. I want to read merge the main data with sales data. For example, since the length of the promotion for the first record is 2 months, I want to merge that against sales data 1 (sales data in month 1) and sales data 2 (sales data in month 2) with variable obs being the unique id. This would be a simple task if all promotions have the same length. However, the third record has a promotion that lasts 6 months, so I want to read sales data 1, 2, 3, 4, 5, and 6. Is there a way to do it systematically?
There is always a way to do things systematically.
Assuming you're actually after some code example you need to provide some as simple as possible while still representative sample data, explain the logic and show the desired outcome based on the sample data.
Please provide the sample data via SAS data step code that creates it.
If I understand right what you want then below should work.
data driver;
input id months;
datalines;
1 2
2 1
3 2
4 2
;
run;
data sales1;
input id sales;
datalines;
1 100
2 60
3 200
4 250
;
run;
data sales2;
input id sales;
datalines;
1 120
2 90
3 180
4 220
;
run;
data all_sales;
set sales: indsname=_ds;
months=input(compress(_ds,,'kd'),best32.);
run;
data want(drop=_:);
set driver;
if _n_=1 then
do;
if 0 then set all_sales(keep=sales);
dcl hash h1(dataset:'all_sales', multidata:'y');
h1.defineKey('id','months');
h1.defineData('sales');
h1.defineDone();
end;
do _i=1 to months;
if h1.find(key:id, key:_i) ne 0 then call missing(sales);
output;
end;
run;
proc print data=want;
run;
@Yychel Below two coding options using your sample data.
data v_all_sales / view=v_all_sales;
set sales: indsname=_ds;
sales_month=input(compress(_ds,,'kd'),best32.);
run;
/* option 1: hash lookup */
data want_1;
set driver;
if _n_=1 then
do;
if 0 then set v_all_sales(keep=sales);
dcl hash h1(dataset:'v_all_sales');
h1.defineKey('id','sales_month');
h1.defineData('sales');
h1.defineDone();
end;
do sales_month=1 to months;
if h1.find(key:id, key:sales_month) ne 0 then call missing(sales);
output;
end;
run;
/* option 2: SQL join */
proc sql;
create table want_2 as
select l.id, l.months, r.sales, r.sales_month
from driver l
left join v_all_sales r
on l.id=r.id
and r.sales_month<=l.months
;
quit;
proc print data=v_all_sales;
run;
proc print data=want_1;
run;
proc print data=want_2;
run;
Please note that with the logic as implemented the SQL left join will not return rows where there is not match to sales data while the hash lookup will. Just delete in your sample data one id in one or both of the sales tables to see the difference between the two approaches.
Thanks Patrick. The first option is probably more helpful because in the actual data, month ranges from 10-20 so the second method may be more complicated.
This is the final output that I am looking for so that there's only one observation for each unique id.
id months sales_1 sales_2
1 2 100 120
2 1 60
3 2 200 180
4 2 250 220
Depending on the size of you data you might be able to just convert that dataset into a multilabel format and then use PROC SUMMARY to summarize the data for multiple promotions at once.
So assuming you have sales data by MONTH. So data variables like:
data have;
input month store dept sales;
cards;
1 1 1 100
1 2 1 120
2 1 1 50
2 2 1 300
3 1 1 130
;
Then make a format that maps the set of months covered by a promotion to the same label. So something like:
proc format;
value promotion(multilabel);
1-2 = 'Promotion 1'
2-5 = 'Promotion 2'
;
run;
Now you can use PROC SUMMARY to produce totals for the different combinations of month values.
proc summary data=have completetypes;
class month / mlf preloadfmt ;
var sales;
output out=want sum=sales;
format month promotion.;
run;
Results
Obs month _TYPE_ _FREQ_ sales 1 0 5 700 2 Promotion 1 1 4 570 3 Promotion 2 1 3 480
@Yychel What Tom proposes using multilabel looks really interesting. You just would need to confirm if it's suitable for your data.
If you've got a table with your promotions then the format is something you don't need to type but you can generate it.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.