- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am quite a novice in programming and I kind need your help regarding an issue I notice.
I have a table:
date ID secondary D E
date1 b1 c1 d1 e1
date1 b c d e
date3 b2 c1 d3 e3
date4 b2 c2 d4 e4
Id and secondary should be unique and always matching. But I notice that they are not. It's either correct the ID or the secondary ID. I want to sum all the D column per unique ID. Thanks you all for reading and if you have any inputs would be very helpful
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The statemnt "BY ID secondary" is unique if the combination of the by variables is unique.
In your example: 'b1 c1', 'b2 c1', 'b2 c2' - these are all unique values.
In order to sum D column it should be numeric otherwise it has no sense;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here's how I interpret your question: In the first row, you want Id to equal Secondary. So you change either Id to 'c1' or Secondary to 'b1'. Then you do that for all obs, so that if ID = Secondary = 'b1' for example, then no other obs can have ID = 'b1' or Secondary = 'B1', correct?
I don't understand the summation of the D Columns though. Help me here?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
But D is a character column?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
no everything is numeric even ID. It's like this. As you can see we have different dates (but for the same date multiple customers). I notice that customer 258 for secondary ID 0004 during the years the ID or the secondary ID changes. And I wan to assign the same ID as the frist date and the same secondary ID as the first date ( or any day just to be cnsistent). I want to to do this cause I want to know how many expenses each customer has during the years. There are like 50m obs.
date, ID, secondary ID, expenses jul2020 258 0004 1000 jul2020 xxx xxxx xxx ...... .... .... ..... ...... .... .... ..... aug2020 258 0008 2000 aug2020 xxx xxxx xxx aug2020 500 0004 1000
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If I understand you correctly you want to summarize the expences per user=ID.
That can be done either by proc summary (or peoc means, same syntax):
proc summary data=have noprint;
calss ID; /* or ID secondary ? */
var expenses;
output out=want sum=;
run;
or by sql
proc sql;
select id,
sum(expenses)
from have
group by ID; /* or id, secodary */
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
yeah but if I do that and don't take into account that the ID for the same customer is changing for some months wouldn't I loose some sums?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To summarize per ID per month you need to add the date with appropriate format to the class or by statemnet.
Using proc summary it could be: by ID date; format date yymmn6.;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@neo_noone wrote:
I am quite a novice in programming and I kind need your help regarding an issue I notice.
I have a table:
date ID secondary D E date1 b1 c1 d1 e1 date1 b c d e date3 b2 c1 d3 e3 date4 b2 c2 d4 e4
Id and secondary should be unique and always matching. But I notice that they are not. It's either correct the ID or the secondary ID. I want to sum all the D column per unique ID. Thanks you all for reading and if you have any inputs would be very helpful
The table has zero numeric variables (except for date, maybe), so there seems to be nothing that can be summed. Please post the data you have in usable form (a data step using datalines) and show the expected output for that dataset.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
no everything is numeric even ID. It's like this. As you can see we have different dates (but for the same date multiple customers). I notice that customer 258 for secondary ID 0004 during the years the ID or the secondary ID changes. And I wan to assign the same ID as the first date and the same secondary ID as the first date ( or any day just to be consistent). I want to to do this cause I want to know how many expenses each customer has during the years. There are like 50m obs.
date, ID, secondary ID, expenses
jul2020 258 0004 1000
jul2020 xxx xxx xxx
… …. …. …..
aug2020 258 0008 2000
aug2020 xxx xxx xxx
aug2020 500 0004 1000
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@neo_noone wrote:
no everything is numeric even ID. It's like this. As you can see we have different dates (but for the same date multiple customers). I notice that customer 258 for secondary ID 0004 during the years the ID or the secondary ID changes. And I wan to assign the same ID as the first date and the same secondary ID as the first date ( or any day just to be consistent). I want to to do this cause I want to know how many expenses each customer has during the years. There are like 50m obs.
date, ID, secondary ID, expenses jul2020 258 0004 1000 jul2020 xxx xxx xxx … …. …. ….. aug2020 258 0008 2000 aug2020 xxx xxx xxx aug2020 500 0004 1000
Please fix this table, so that there are no xxx, there are actual numbers in the columns (even if you have to make up some numbers), and then show us the resulting table you want. Actually do the math for this small table, sum the columns and show us the desired output.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thank you for trying
lets say this is the original table
date | first_id | second_id | expenses |
jul2020 | 400 | 0101 | 1000 |
jul2020 | 500 | 89 | 2001 |
aug2020 | 400 | 0202 | 1500 |
aug2020 | 500 | 89 | 2001 |
sep2020 | 402 | 0101 | 3200 |
sep2020 | 500 | 89 | 2001 |
oct2020 | 403 | 0101 | 200 |
oct2020 | 500 | 89 | 2001 |
nov2020 | 400 | 0202 | 100 |
nov2020 | 500 | 89 | 2001 |
If you notice customer 400 has second ID 0101. But during the years the first ID or the second ID changes (not both at the same time). I want the table to become something like that
date | first_id | second_id | expenses |
jul2020 | 400 | 0101 | 1000 |
jul2020 | 500 | 89 | 2001 |
aug2020 | 400 | 0101 | 1500 |
aug2020 | 500 | 89 | 2001 |
sep2020 | 400 | 0101 | 3200 |
sep2020 | 400 | 89 | 2001 |
oct2020 | 400 | 0101 | 200 |
oct2020 | 500 | 89 | 2001 |
nov2020 | 400 | 0101 | 100 |
nov2020 | 500 | 89 | 2001 |
now the customer with first ID 400 and second 0101 keeps it's original values thought the year.
After that I want to aggregate the expenses it by first ID ( I can do that) I just can't change the first ID or the second ID
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So the step is to change 0202 under SECOND_ID to 0101? That's the only change? Or is it more general than that where sometimes 0404 change to 0101 as well? Or 3333 changes to 0237?
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
YESSS and if the first ID is different than the original ( if it is, then the second ID will be the same as the original ID if you notice) change it to the original first ID