Create a variable outside of data step based on a variable inside the ...

Yychel · Posted 05-28-2024 05:17 PM

Obs months
1 2
2 2
3 6
4 2

I want to merge the dataset above with other datasets based on value of months variable. For example, I want to merge two months of data for the first record, two months of data for the second record, 6 months of data for the third record, and 2 months of records for the fourth record. Is there a systematic way of doing it? I was thinking about creating a variable based on the months variable and create a loop within a macro to achieve it.

ballardw · Posted 05-28-2024 05:52 PM

I think you may need to provide some example data sets and the expected result(s).

Without an idea of what "two months of data" may look like this is a very imprecise description.

For example, you show 2 for the value of "months" on three different observations. Do you expect a different result for each observation? If so, how are we to tell which result? Are we to use this information to build a data set name to match with? A library? I really cannot tell how you expect "months" to be used as presented.

In 99.99 percent of cases it is usually better to work with date values instead of arbitrary numbers and calling them "months" (or weeks or quarters or what ever).

Patrick · Posted 05-28-2024 09:18 PM

You need to provide more detail. My advice: Don't jump into any coding (especially macro coding) before you haven't clearly defined what you have and what result you want.

In your thinking: Would you create an output table per row in your driver data or something else. And what's "two months of data"? Are your source tables monthly files (tables) or would you need to sub-set the data. You would certainly need the information your your "driver table" (the 4 rows you shared) with all the information required to identify which source tables and sub-set of data to pick.

Yychel · Posted 05-28-2024 10:36 PM

The main dataset that I share contains promotion information. The months variable refers to the length of the promotion. I want to read merge the main data with sales data. For example, since the length of the promotion for the first record is 2 months, I want to merge that against sales data 1 (sales data in month 1) and sales data 2 (sales data in month 2) with variable obs being the unique id. This would be a simple task if all promotions have the same length. However, the third record has a promotion that lasts 6 months, so I want to read sales data 1, 2, 3, 4, 5, and 6. Is there a way to do it systematically?

Patrick · Posted 05-28-2024 10:45 PM

@Yychel wrote:
The main dataset that I share contains promotion information. The months variable refers to the length of the promotion. I want to read merge the main data with sales data. For example, since the length of the promotion for the first record is 2 months, I want to merge that against sales data 1 (sales data in month 1) and sales data 2 (sales data in month 2) with variable obs being the unique id. This would be a simple task if all promotions have the same length. However, the third record has a promotion that lasts 6 months, so I want to read sales data 1, 2, 3, 4, 5, and 6. Is there a way to do it systematically?

@Yychel

There is always a way to do things systematically.

Assuming you're actually after some code example you need to provide some as simple as possible while still representative sample data, explain the logic and show the desired outcome based on the sample data.

Please provide the sample data via SAS data step code that creates it.

Yychel · Posted 05-28-2024 11:24 PM

Below are my sample codes. I changed value for the second observation in the base dataset for simplicity.

data dataset;
input id months;
datalines;
1 2
2 1
3 2
4 2
;
run;

data sales1;
input id sales;
datalines;
1 100
2 60
3 200
4 250
;
run;

data sales2;
input id sales;
datalines;
1 120
2 90
3 180
4 220
;
run;

data merge;
merge dataset (in=a)
sales1 (rename=(sales=sales_1)
sales2 (rename=(sales=sales_2);
by id;
If a;
run;

Data merge doesn’t work. Since id =2 has months (promo period) =1, I don’t want to read sales from sales2. The idea that I have is to create a variable based on value of “months” from the base data and merge sales data based on it (i.e. merge sales1-sales2 for months = 2)

Patrick · Posted 05-28-2024 11:51 PM

If I understand right what you want then below should work.

data driver;
  input id months;
  datalines;
1 2
2 1
3 2
4 2
;
run;

data sales1;
  input id sales;
  datalines;
1 100
2 60
3 200
4 250
;
run;

data sales2;
  input id sales;
  datalines;
1 120
2 90
3 180
4 220
;
run;

data all_sales;
  set sales: indsname=_ds;
  months=input(compress(_ds,,'kd'),best32.);
run;

data want(drop=_:);
  set driver;
  if _n_=1 then
    do;
      if 0 then set all_sales(keep=sales);
      dcl hash h1(dataset:'all_sales', multidata:'y');
      h1.defineKey('id','months');
      h1.defineData('sales');
      h1.defineDone();
    end;
  do _i=1 to months;
    if h1.find(key:id, key:_i) ne 0 then call missing(sales);
    output;
  end;
run;
proc print data=want;
run;

Patrick · Posted 05-29-2024 12:13 AM

@Yychel Below two coding options using your sample data.

data v_all_sales / view=v_all_sales;
  set sales: indsname=_ds;
  sales_month=input(compress(_ds,,'kd'),best32.);
run;

/* option 1: hash lookup */
data want_1;
  set driver;
  if _n_=1 then
    do;
      if 0 then set v_all_sales(keep=sales);
      dcl hash h1(dataset:'v_all_sales');
      h1.defineKey('id','sales_month');
      h1.defineData('sales');
      h1.defineDone();
    end;
  do sales_month=1 to months;
    if h1.find(key:id, key:sales_month) ne 0 then call missing(sales);
    output;
  end;
run;
/* option 2: SQL join */
proc sql;
  create table want_2 as
  select l.id, l.months, r.sales, r.sales_month
  from driver l
  left join v_all_sales r
    on l.id=r.id 
       and r.sales_month<=l.months
  ;
quit;
proc print data=v_all_sales;
run;
proc print data=want_1;
run;
proc print data=want_2;
run;

Please note that with the logic as implemented the SQL left join will not return rows where there is not match to sales data while the hash lookup will. Just delete in your sample data one id in one or both of the sales tables to see the difference between the two approaches.

Yychel · Posted 05-29-2024 10:15 AM

Thanks Patrick. The first option is probably more helpful because in the actual data, month ranges from 10-20 so the second method may be more complicated.

This is the final output that I am looking for so that there's only one observation for each unique id.

id months sales_1 sales_2
1 2 100 120
2 1 60
3 2 200 180
4 2 250 220

Patrick · Posted 05-29-2024 12:00 PM

Use proc transpose on the already created want table

Tom · Posted 05-29-2024 02:47 PM

Depending on the size of you data you might be able to just convert that dataset into a multilabel format and then use PROC SUMMARY to summarize the data for multiple promotions at once.

So assuming you have sales data by MONTH. So data variables like:

data have;
  input month store dept sales;
cards;
1 1 1 100
1 2 1 120
2 1 1 50
2 2 1 300
3 1 1 130
;

Then make a format that maps the set of months covered by a promotion to the same label. So something like:

proc format;
value promotion(multilabel);
  1-2 = 'Promotion 1'
  2-5 = 'Promotion 2'
;
run;

Now you can use PROC SUMMARY to produce totals for the different combinations of month values.

proc summary data=have completetypes;
  class month / mlf preloadfmt ;
  var sales;
  output out=want sum=sales;
  format month promotion.;
run;

Results

Obs       month       _TYPE_    _FREQ_    sales

 1                       0         5       700
 2     Promotion 1       1         4       570
 3     Promotion 2       1         3       480

Patrick · Posted 05-29-2024 09:09 PM

@Yychel What Tom proposes using multilabel looks really interesting. You just would need to confirm if it's suitable for your data.

If you've got a table with your promotions then the format is something you don't need to type but you can generate it.

Create a variable outside of data step based on a variable inside the data step

Re: Create a variable outside of data step based on a variable inside the data step

Re: Create a variable outside of data step based on a variable inside the data step

Re: Create a variable outside of data step based on a variable inside the data step

Re: Create a variable outside of data step based on a variable inside the data step

Re: Create a variable outside of data step based on a variable inside the data step

Re: Create a variable outside of data step based on a variable inside the data step

Re: Create a variable outside of data step based on a variable inside the data step

Re: Create a variable outside of data step based on a variable inside the data step

Re: Create a variable outside of data step based on a variable inside the data step

Re: Create a variable outside of data step based on a variable inside the data step

Re: Create a variable outside of data step based on a variable inside the data step

Register Today!

SAS Training: Just a Click Away