BookmarkSubscribeRSS Feed
neo_noone
Calcite | Level 5

I am quite a novice in programming and I kind need your help regarding an issue I notice.

I have a table:

date  ID  secondary  D  E
date1 b1   c1          d1 e1
date1 b    c           d  e
date3 b2   c1          d3 e3
date4 b2   c2          d4 e4

Id and secondary should be unique and always matching. But I notice that they are not. It's either correct the ID or the secondary ID. I want to sum all the D column per unique ID. Thanks you all for reading and if you have any inputs would be very helpful

 

17 REPLIES 17
Shmuel
Garnet | Level 18

The statemnt "BY ID secondary" is unique if the combination of the by variables is unique.

In your example: 'b1 c1',  'b2 c1',  'b2 c2' - these are all unique values.

 

In order to sum D column it should be numeric otherwise it has no sense;

 

 

PeterClemmensen
Tourmaline | Level 20

Here's how I interpret your question: In the first row, you want Id to equal Secondary. So you change either Id to 'c1' or Secondary to 'b1'. Then you do that for all obs, so that if ID = Secondary = 'b1' for example, then no other obs can have ID = 'b1' or Secondary = 'B1', correct?

 

I don't understand the summation of the D Columns though. Help me here?

neo_noone
Calcite | Level 5
YESSS 🙂 ... correct thank you. And then I want to sum column d for each (unique) ID . Thank you PeterC
PeterClemmensen
Tourmaline | Level 20

But D is a character column?

neo_noone
Calcite | Level 5

no everything is numeric even ID. It's like this. As you can see we have different dates (but for the same date multiple customers). I notice that customer 258 for secondary ID 0004 during the years the ID or the secondary ID changes. And I wan to assign the same ID as the frist date and the same secondary ID as the first date ( or any day just to be cnsistent). I want to to do this cause I want to know how many expenses each customer has during the years. There are like 50m obs.

 

date,             ID,        secondary ID,     expenses
jul2020           258           0004               1000
jul2020           xxx           xxxx                 xxx
......              ....               ....                 .....
......              ....               ....                   .....
aug2020         258             0008               2000
aug2020         xxx             xxxx                xxx
aug2020         500              0004             1000
Shmuel
Garnet | Level 18

If I understand you correctly you want to summarize the expences per user=ID.

That can be done either by proc summary (or peoc means, same syntax):

proc summary data=have noprint;
  calss ID;   /* or ID secondary ? */
  var expenses;
  output out=want sum=;
run;

or by sql

proc sql;
   select id, 
        sum(expenses)
   from have
   group by ID;   /* or id, secodary */
quit;

 

neo_noone
Calcite | Level 5

yeah but if I do that and don't take into account that the ID for the same customer is changing for some months wouldn't I loose some sums? 

Shmuel
Garnet | Level 18

To summarize per ID per month you need to add the date with appropriate format to the class or by statemnet.

Using proc summary it could be: by ID date; format date yymmn6.;

andreas_lds
Jade | Level 19

@neo_noone wrote:

I am quite a novice in programming and I kind need your help regarding an issue I notice.

I have a table:

date  ID  secondary  D  E
date1 b1   c1          d1 e1
date1 b    c           d  e
date3 b2   c1          d3 e3
date4 b2   c2          d4 e4

Id and secondary should be unique and always matching. But I notice that they are not. It's either correct the ID or the secondary ID. I want to sum all the D column per unique ID. Thanks you all for reading and if you have any inputs would be very helpful

 


The table has zero numeric variables (except for date, maybe), so there seems to be nothing that can be summed. Please post the data you have in usable form (a data step using datalines) and show the expected output for that dataset.

neo_noone
Calcite | Level 5

no everything is numeric even ID. It's like this. As you can see we have different dates (but for the same date multiple customers). I notice that customer 258 for secondary ID 0004 during the years the ID or the secondary ID changes. And I wan to assign the same ID as the first date and the same secondary ID as the first date ( or any day just to be consistent). I want to to do this cause I want to know how many expenses each customer has during the years. There are like 50m obs.

 

date,             ID,        secondary ID,     expenses
jul2020           258          0004               1000
jul2020           xxx           xxx               xxx
…        ….               ….              …..
aug2020         258             0008               2000
aug2020         xxx             xxx               xxx
aug2020         500             0004             1000

 

PaigeMiller
Diamond | Level 26

@neo_noone wrote:

no everything is numeric even ID. It's like this. As you can see we have different dates (but for the same date multiple customers). I notice that customer 258 for secondary ID 0004 during the years the ID or the secondary ID changes. And I wan to assign the same ID as the first date and the same secondary ID as the first date ( or any day just to be consistent). I want to to do this cause I want to know how many expenses each customer has during the years. There are like 50m obs.

 

date,             ID,        secondary ID,     expenses
jul2020           258          0004               1000
jul2020           xxx           xxx               xxx
…        ….               ….              …..
aug2020         258             0008               2000
aug2020         xxx             xxx               xxx
aug2020         500             0004             1000

 


Please fix this table, so that there are no xxx, there are actual numbers in the columns (even if you have to make up some numbers), and then show us the resulting table you want. Actually do the math for this small table, sum the columns and show us the desired output.

--
Paige Miller
neo_noone
Calcite | Level 5

thank you for trying 

lets say this is the original table

datefirst_idsecond_idexpenses 
jul202040001011000
jul2020500892001
aug202040002021500
aug2020500892001
sep202040201013200
sep2020500892001
oct20204030101200
oct2020500892001
nov20204000202100
nov2020500892001

If you notice customer 400 has second ID 0101. But during the years the first ID or the second ID changes (not both at the same time). I want the table to become something like that

datefirst_idsecond_idexpenses 
jul202040001011000
jul2020500892001
aug202040001011500
aug2020500892001
sep202040001013200
sep2020400892001
oct20204000101200
oct2020500892001
nov20204000101100
nov2020500892001

now the customer with first ID 400 and second 0101 keeps it's original values thought the year.

After that I want to aggregate the expenses it by first ID ( I can do that) I just can't change the first ID or the second ID

PaigeMiller
Diamond | Level 26

So the step is to change 0202 under SECOND_ID to 0101? That's the only change? Or is it more general than that where sometimes 0404 change to 0101 as well? Or 3333 changes to 0237?

--
Paige Miller
neo_noone
Calcite | Level 5

YESSS and if the first ID is different than the original ( if it is, then the second ID will be the same as the original ID if you notice) change it to the original first ID

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 2162 views
  • 0 likes
  • 5 in conversation