SAS Programming

neo_noone · Posted 09-30-2022 12:35 AM

I am quite a novice in programming and I kind need your help regarding an issue I notice.

I have a table:

date  ID  secondary  D  E
date1 b1   c1          d1 e1
date1 b    c           d  e
date3 b2   c1          d3 e3
date4 b2   c2          d4 e4

Id and secondary should be unique and always matching. But I notice that they are not. It's either correct the ID or the secondary ID. I want to sum all the D column per unique ID. Thanks you all for reading and if you have any inputs would be very helpful

Shmuel · Posted 09-30-2022 01:15 AM

The statemnt "BY ID secondary" is unique if the combination of the by variables is unique.

In your example: 'b1 c1', 'b2 c1', 'b2 c2' - these are all unique values.

In order to sum D column it should be numeric otherwise it has no sense;

PeterClemmensen · Posted 09-30-2022 01:19 AM

Here's how I interpret your question: In the first row, you want Id to equal Secondary. So you change either Id to 'c1' or Secondary to 'b1'. Then you do that for all obs, so that if ID = Secondary = 'b1' for example, then no other obs can have ID = 'b1' or Secondary = 'B1', correct?

I don't understand the summation of the D Columns though. Help me here?

The DATA to DATA Step Macro
Blog: SASnrd

neo_noone · Posted 09-30-2022 02:34 AM

YESSS 🙂 ... correct thank you. And then I want to sum column d for each (unique) ID . Thank you PeterC

PeterClemmensen · Posted 09-30-2022 02:39 AM

But D is a character column?

The DATA to DATA Step Macro
Blog: SASnrd

neo_noone · Posted 09-30-2022 02:53 AM

no everything is numeric even ID. It's like this. As you can see we have different dates (but for the same date multiple customers). I notice that customer 258 for secondary ID 0004 during the years the ID or the secondary ID changes. And I wan to assign the same ID as the frist date and the same secondary ID as the first date ( or any day just to be cnsistent). I want to to do this cause I want to know how many expenses each customer has during the years. There are like 50m obs.

date,             ID,        secondary ID,     expenses
jul2020           258           0004               1000
jul2020           xxx           xxxx                 xxx
......              ....               ....                 .....
......              ....               ....                   .....
aug2020         258             0008               2000
aug2020         xxx             xxxx                xxx
aug2020         500              0004             1000

Shmuel · Posted 09-30-2022 03:33 AM

If I understand you correctly you want to summarize the expences per user=ID.

That can be done either by proc summary (or peoc means, same syntax):

proc summary data=have noprint;
  calss ID;   /* or ID secondary ? */
  var expenses;
  output out=want sum=;
run;

or by sql

proc sql;
   select id, 
        sum(expenses)
   from have
   group by ID;   /* or id, secodary */
quit;

neo_noone · Posted 09-30-2022 03:37 AM

yeah but if I do that and don't take into account that the ID for the same customer is changing for some months wouldn't I loose some sums?

Shmuel · Posted 09-30-2022 04:26 AM

To summarize per ID per month you need to add the date with appropriate format to the class or by statemnet.

Using proc summary it could be: by ID date; format date yymmn6.;

andreas_lds · Posted 09-30-2022 06:41 AM

@neo_noone wrote:

I am quite a novice in programming and I kind need your help regarding an issue I notice.

I have a table:
date  ID  secondary  D  E
date1 b1   c1          d1 e1
date1 b    c           d  e
date3 b2   c1          d3 e3
date4 b2   c2          d4 e4
Id and secondary should be unique and always matching. But I notice that they are not. It's either correct the ID or the secondary ID. I want to sum all the D column per unique ID. Thanks you all for reading and if you have any inputs would be very helpful

The table has zero numeric variables (except for date, maybe), so there seems to be nothing that can be summed. Please post the data you have in usable form (a data step using datalines) and show the expected output for that dataset.

neo_noone · Posted 09-30-2022 07:14 AM

no everything is numeric even ID. It's like this. As you can see we have different dates (but for the same date multiple customers). I notice that customer 258 for secondary ID 0004 during the years the ID or the secondary ID changes. And I wan to assign the same ID as the first date and the same secondary ID as the first date ( or any day just to be consistent). I want to to do this cause I want to know how many expenses each customer has during the years. There are like 50m obs.

date,             ID,        secondary ID,     expenses
jul2020           258          0004               1000
jul2020           xxx           xxx               xxx
…        ….               ….              …..
aug2020         258             0008               2000
aug2020         xxx             xxx               xxx
aug2020         500             0004             1000

PaigeMiller · Posted 09-30-2022 07:30 AM

@neo_noone wrote:

no everything is numeric even ID. It's like this. As you can see we have different dates (but for the same date multiple customers). I notice that customer 258 for secondary ID 0004 during the years the ID or the secondary ID changes. And I wan to assign the same ID as the first date and the same secondary ID as the first date ( or any day just to be consistent). I want to to do this cause I want to know how many expenses each customer has during the years. There are like 50m obs.
date,             ID,        secondary ID,     expenses
jul2020           258          0004               1000
jul2020           xxx           xxx               xxx
…        ….               ….              …..
aug2020         258             0008               2000
aug2020         xxx             xxx               xxx
aug2020         500             0004             1000

Please fix this table, so that there are no xxx, there are actual numbers in the columns (even if you have to make up some numbers), and then show us the resulting table you want. Actually do the math for this small table, sum the columns and show us the desired output.

--
Paige Miller

neo_noone · Posted 09-30-2022 07:55 AM

thank you for trying

lets say this is the original table

date	first_id	second_id	expenses
jul2020	400	0101	1000
jul2020	500	89	2001
aug2020	400	0202	1500
aug2020	500	89	2001
sep2020	402	0101	3200
sep2020	500	89	2001
oct2020	403	0101	200
oct2020	500	89	2001
nov2020	400	0202	100
nov2020	500	89	2001

If you notice customer 400 has second ID 0101. But during the years the first ID or the second ID changes (not both at the same time). I want the table to become something like that

date	first_id	second_id	expenses
jul2020	400	0101	1000
jul2020	500	89	2001
aug2020	400	0101	1500
aug2020	500	89	2001
sep2020	400	0101	3200
sep2020	400	89	2001
oct2020	400	0101	200
oct2020	500	89	2001
nov2020	400	0101	100
nov2020	500	89	2001

now the customer with first ID 400 and second 0101 keeps it's original values thought the year.

After that I want to aggregate the expenses it by first ID ( I can do that) I just can't change the first ID or the second ID

PaigeMiller · Posted 09-30-2022 07:58 AM

So the step is to change 0202 under SECOND_ID to 0101? That's the only change? Or is it more general than that where sometimes 0404 change to 0101 as well? Or 3333 changes to 0237?

--
Paige Miller

neo_noone · Posted 09-30-2022 08:12 AM

YESSS and if the first ID is different than the original ( if it is, then the second ID will be the same as the original ID if you notice) change it to the original first ID

SAS Programming

how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Re: how to sum a column per variable that changes

Calculate cumulative sum by column

Creating a variable that sums the number of combinations of other vari...

summing one column when variables are duplicates in other columns

SUM of column

SAS Visual Analytics: Sum Missing Values

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...