SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
neo_noone
Calcite | Level 5

I am quite a novice in programming and I kind need your help regarding an issue I notice.

I have a table:

date  ID  secondary  D  E
date1 b1   c1          d1 e1
date1 b    c           d  e
date3 b2   c1          d3 e3
date4 b2   c2          d4 e4

Id and secondary should be unique and always matching. But I notice that they are not. It's either correct the ID or the secondary ID. I want to sum all the D column per unique ID. Thanks you all for reading and if you have any inputs would be very helpful

 

17 REPLIES 17
Shmuel
Garnet | Level 18

The statemnt "BY ID secondary" is unique if the combination of the by variables is unique.

In your example: 'b1 c1',  'b2 c1',  'b2 c2' - these are all unique values.

 

In order to sum D column it should be numeric otherwise it has no sense;

 

 

PeterClemmensen
Tourmaline | Level 20

Here's how I interpret your question: In the first row, you want Id to equal Secondary. So you change either Id to 'c1' or Secondary to 'b1'. Then you do that for all obs, so that if ID = Secondary = 'b1' for example, then no other obs can have ID = 'b1' or Secondary = 'B1', correct?

 

I don't understand the summation of the D Columns though. Help me here?

neo_noone
Calcite | Level 5
YESSS 🙂 ... correct thank you. And then I want to sum column d for each (unique) ID . Thank you PeterC
PeterClemmensen
Tourmaline | Level 20

But D is a character column?

neo_noone
Calcite | Level 5

no everything is numeric even ID. It's like this. As you can see we have different dates (but for the same date multiple customers). I notice that customer 258 for secondary ID 0004 during the years the ID or the secondary ID changes. And I wan to assign the same ID as the frist date and the same secondary ID as the first date ( or any day just to be cnsistent). I want to to do this cause I want to know how many expenses each customer has during the years. There are like 50m obs.

 

date,             ID,        secondary ID,     expenses
jul2020           258           0004               1000
jul2020           xxx           xxxx                 xxx
......              ....               ....                 .....
......              ....               ....                   .....
aug2020         258             0008               2000
aug2020         xxx             xxxx                xxx
aug2020         500              0004             1000
Shmuel
Garnet | Level 18

If I understand you correctly you want to summarize the expences per user=ID.

That can be done either by proc summary (or peoc means, same syntax):

proc summary data=have noprint;
  calss ID;   /* or ID secondary ? */
  var expenses;
  output out=want sum=;
run;

or by sql

proc sql;
   select id, 
        sum(expenses)
   from have
   group by ID;   /* or id, secodary */
quit;

 

neo_noone
Calcite | Level 5

yeah but if I do that and don't take into account that the ID for the same customer is changing for some months wouldn't I loose some sums? 

Shmuel
Garnet | Level 18

To summarize per ID per month you need to add the date with appropriate format to the class or by statemnet.

Using proc summary it could be: by ID date; format date yymmn6.;

andreas_lds
Jade | Level 19

@neo_noone wrote:

I am quite a novice in programming and I kind need your help regarding an issue I notice.

I have a table:

date  ID  secondary  D  E
date1 b1   c1          d1 e1
date1 b    c           d  e
date3 b2   c1          d3 e3
date4 b2   c2          d4 e4

Id and secondary should be unique and always matching. But I notice that they are not. It's either correct the ID or the secondary ID. I want to sum all the D column per unique ID. Thanks you all for reading and if you have any inputs would be very helpful

 


The table has zero numeric variables (except for date, maybe), so there seems to be nothing that can be summed. Please post the data you have in usable form (a data step using datalines) and show the expected output for that dataset.

neo_noone
Calcite | Level 5

no everything is numeric even ID. It's like this. As you can see we have different dates (but for the same date multiple customers). I notice that customer 258 for secondary ID 0004 during the years the ID or the secondary ID changes. And I wan to assign the same ID as the first date and the same secondary ID as the first date ( or any day just to be consistent). I want to to do this cause I want to know how many expenses each customer has during the years. There are like 50m obs.

 

date,             ID,        secondary ID,     expenses
jul2020           258          0004               1000
jul2020           xxx           xxx               xxx
…        ….               ….              …..
aug2020         258             0008               2000
aug2020         xxx             xxx               xxx
aug2020         500             0004             1000

 

PaigeMiller
Diamond | Level 26

@neo_noone wrote:

no everything is numeric even ID. It's like this. As you can see we have different dates (but for the same date multiple customers). I notice that customer 258 for secondary ID 0004 during the years the ID or the secondary ID changes. And I wan to assign the same ID as the first date and the same secondary ID as the first date ( or any day just to be consistent). I want to to do this cause I want to know how many expenses each customer has during the years. There are like 50m obs.

 

date,             ID,        secondary ID,     expenses
jul2020           258          0004               1000
jul2020           xxx           xxx               xxx
…        ….               ….              …..
aug2020         258             0008               2000
aug2020         xxx             xxx               xxx
aug2020         500             0004             1000

 


Please fix this table, so that there are no xxx, there are actual numbers in the columns (even if you have to make up some numbers), and then show us the resulting table you want. Actually do the math for this small table, sum the columns and show us the desired output.

--
Paige Miller
neo_noone
Calcite | Level 5

thank you for trying 

lets say this is the original table

datefirst_idsecond_idexpenses 
jul202040001011000
jul2020500892001
aug202040002021500
aug2020500892001
sep202040201013200
sep2020500892001
oct20204030101200
oct2020500892001
nov20204000202100
nov2020500892001

If you notice customer 400 has second ID 0101. But during the years the first ID or the second ID changes (not both at the same time). I want the table to become something like that

datefirst_idsecond_idexpenses 
jul202040001011000
jul2020500892001
aug202040001011500
aug2020500892001
sep202040001013200
sep2020400892001
oct20204000101200
oct2020500892001
nov20204000101100
nov2020500892001

now the customer with first ID 400 and second 0101 keeps it's original values thought the year.

After that I want to aggregate the expenses it by first ID ( I can do that) I just can't change the first ID or the second ID

PaigeMiller
Diamond | Level 26

So the step is to change 0202 under SECOND_ID to 0101? That's the only change? Or is it more general than that where sometimes 0404 change to 0101 as well? Or 3333 changes to 0237?

--
Paige Miller
neo_noone
Calcite | Level 5

YESSS and if the first ID is different than the original ( if it is, then the second ID will be the same as the original ID if you notice) change it to the original first ID

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 3966 views
  • 0 likes
  • 5 in conversation