02-09-2012 06:49 AM
I have a data set as described below:
|donor id||donation amt||donation date||payment type||appeal id|
1.First, i want to merge data from another table (which has same donor id but with their age, city and gender) to the above
2.Then Create a table that has the total amount donated by each donor id; the first date of donation; last date of donation; the number of times they have donated(how many times they have donated from the first date of donation) exclude, payment type and appeal id from the output.
02-09-2012 07:03 AM
Lets call your two tables DONATIONS and DONORS.
Merge by donor id.
data details ;
merge donors donations ;
You can use PROC SUMMARY to calculate your statistics.
proc summary data=details ;
var donation_amt donation_date;
output out=summary sum(donation_amt)=total min(donation_date)=first max(donation_date)=last n=number ;
You might want to reverse the steps and get the summary first and then merge that with the donor list to get the name, city etc.
02-09-2012 07:20 AM
Hi Tom thanks for the helpful reply. I receive an error when i run the merge query, because some of the donor ids have letters in them.
"ERROR: variable donor id has been defined as both character and numeric"
How can i rectify this? thanks
02-09-2012 07:41 AM
Do proc contents for both datasets and find out which variable has a numeric value for Donar_id. Convert that variable in to charactor using PUT function.
specify the variable length same as the second dataset Donar_ID variable.
02-10-2012 03:38 AM
The ERROR information has shown you that the donor_id in these two datasets has different type.
data one; input id $ amt : dollar20. date : ddmmyy10. type & $20. appeal $; format date ddmmyy10.; cards; 001234 20 26/11/2005 direct debit HHU 001234 15 28/12/2006 cash HHI 001236 10 16/8/2007 direct debit HHJ ; run; data two; input id $ age city $ gender $; cards; 001234 23 nv f 001236 45 bs m ; run; proc sort data=one;by id date;run; data temp; set one; by id; retain n sum first_date; if first.id then do;first_date=date; n=0;sum=0;end; n+1;sum+amt; if last.id then do;last_date=date;output;end; format first_date last_date ddmmyy10.; drop amt type appeal date; run; proc sort data=two;by id;run; data want; merge temp two; by id; run;
02-14-2012 06:06 AM
Thank you Ksharp,
Just a couple of questions as i am not familiar with most of the proc statements.
First the example table i have used is just a snapshot of the whole data set. There are actually 375, 000 records of data so will the 'cards' function be applicable in this case? i'll like to use all the ids present in the data set.
Second when i try to run a proc summary statement, i receive an error to say the data is not sorted in ascending order of id.How can i rectify this?
02-14-2012 09:03 AM
Cards is just used to load some data entered within the code, this is n/a for your situation, just for the demo.
PROC SUMMARY with BY (and most other PROCs) assume that data is pre-sorted.
So, you can either sort your data first on the BY variable you wish to group by, or use CLASS instead (does not require input data to be sorted), or use PROC SQL GROUP BY.
02-14-2012 09:02 PM
LinusH has already respond your questions.
cards is only for demo. If you already have one two datasets at work library, then remove
these two cards. And don't forget make sure the type of id has the same type in these two datasets.