Hi,
I have two data sets:
Data Set 1 Variables: Description (unique), Organization_ID, and Submit_Date
Data Set 2 Variables: Organization_ID, Commit_Date, and Amount
Data looks like this:
DATASET1
Description Organization_ID Submit_Date (MMDDYY6.)
Customer was unhappy with product. A12 01052015
They completed the training and called for help B22 02012015
The customer asked for assistance with the product. B22 02202015
No long want to do business F02 03102015
DATASET2:
Organization_ID Commit_Date (MMDDYY6.) Amount
C50 05012015 258
D15 01122015 1589
A12 01102015 100
B22 02152015 200
B22 02162015 50
B22 10152015 879
For each Description, I want to find the total Amount for that Organization_ID, when the Submit_Date is within 6 months of the Commit_Date. In other words, for each Description, I want to see if/how much $ the submitting organization spent in the 6 months since submitting the description.
Output I want:
DATASET1 with this column appended at the end:
TotalAmount
100
250
250
0
My very messy way of doing it was a left join by Organization_ID (which blows up the data set size since an org can have many records in the second data set - this would become an issue if I increased my data universe), then filtered to only records where dates within 6 months. Then I summed the Amount by Organization_ID.
Is there a more concise way of doing this? I'd like to have a program that goes through each Description, then looks through the records in the second table to find those with the same org and a date within 6 months, then return with either a 0 or a summation of the Amounts for the matching records.
Any suggestions for certain procedures or methods I should look into?
Amanda
please show a sample of input and output data, someone will able to help you
Please post test data in the form of a datastep to illustrate what you have, and show what you want out.
One method maybe to generate the code from the first, something like:
data _null_; set ds1; call execute(cat('proc means data=ds2; where organization_id="',organization_id,'" and ',commit_date,' <= submit_date <= ',intnx('month',commit_date,6),'; output out=tmp sum=sum; run;'));
if _n_=1 then call execute('data want; set tmp; run;');
else call execute('data want; set want tmp; run;'); run;
This will sum each block and add it to the want dataset.
Thank you for your help. This was how I figured the 6 month window:
Time=INTCK('month',submit_date, commit_date);
Then created a new data set....
where 0<Time<6;
The submit_date has to occur before the commit_date. Is this same logic being used here?
and ',commit_date,' <= submit_date <= ',intnx('month',commit_date,6),';
Is it looking for commit dates that occur within 6 months of the submit dates, or vice versa?
Are you date variables actual SAS date values? Not character values that look like dates, not numeric values like 20180515 that mimic some form of date but a numeric value with hopefully a SAS date format.
The dates are actual dates. (I'm using SAS EG) The icon for the columns are calendars, and hoovering over the field name the diaglog box says "Type: Date"
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.