Solved: Combine rows in my dataset that are duplicates

melhaf · Posted 02-06-2024 02:43 PM

Hi,
I have a dataset that it something like this (can't post the original one because of confidential) but I'll do my best to explain it. The dataset is not grouped och order by nothing, I just have ordered it here because it would be easier for you to read)

The problem is that I have a dataset who looks like this:

BOOKING(char)           AMOUNT(num)     DAYS(num)
1.2.3     Andersen      10              30 
1.2.4     Andersen      12              28 
1.2.4      Baker        whatever        whatever
1.2.5      Brown        whatever        whatever      
2.1.1      Clark         8               20
2.1.2      Clark          6               18

You see that we have duplicates of Andersen and Clark. Because we have four different families. How do I combine them? (recall that they're CHAR).

I have tried:

1. select distinct * from dataset GROUP BY 1 .
- then I get duplicates ....

2. sum(BOOKING)
- not working beacuse it is a char

3. proc sort data=dataset out=dataset2 nodupkey; by BOOKING; run;
- still sees the duplicates and I guess it is because is is different because of the (1.2.3) and (1.2.4) thing that comes before the last name

4. sum(input(BOOKING, best12.))as booking
- I get :
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Invalid string.
NOTE: Invalid argument to function INPUT. Missing values may be generated.

Patrick · Posted 02-06-2024 04:54 PM

Not sure that I fully understood what you're asking for. Is below returning what you want?

data have;
 input BOOKING $14. @15 AMOUNT DAYS;
 datalines;
1.2.3 Andersen 10 30 
1.2.4 Andersen 12 28 
1.2.4 Baker    1 1
1.2.5 Brown    1 1 
2.1.1 Clark    8 20
2.1.2 Clark    6 18
;

proc sql;
  select 
     substr(booking,find(booking,' ')) as booking_name
    ,sum(amount) as sum_amount
    ,sum(days) as sum_days
  from have
  group by booking_name
  ;
quit;

View solution in original post

melhaf · Posted 02-06-2024 02:44 PM

I don't know how to edit my topic. But I want to summerize Andersen and Clark into one

Reeza · Posted 02-06-2024 03:13 PM

Please show what you would expect as output.

SASKiwi · Posted 02-06-2024 03:14 PM

What do you want the single rows for Andersen and Clark to contain? Is there a rule to decide which of the two rows in these examples is chosen?

melhaf · Posted 02-06-2024 03:22 PM

Hi, I want to summarize them, so Andersen have together 10+12 AMOUNT,
30+28 DAYS and the same for Clark

Reeza · Posted 02-06-2024 03:33 PM

What happens with booking though? Since you seem to be trying to sum them. Showing the example output would be helpful.

melhaf · Posted 02-06-2024 03:42 PM

It dosen't happends anything with the code I posted. Or, I posted what the log said.

Reeza · Posted 02-06-2024 03:34 PM

And is there two different variables, the booking and the name or are they one variable? If so, you very much want to separate those first.

melhaf · Posted 02-06-2024 03:51 PM

They are all the same the column is "1.2.3 Andersen" and then the other family is "1.2.4 Andersen" and I want to summerize families that have the name Andersen (and Clark in this example as well)

Patrick · Posted 02-06-2024 04:54 PM

Not sure that I fully understood what you're asking for. Is below returning what you want?

data have;
 input BOOKING $14. @15 AMOUNT DAYS;
 datalines;
1.2.3 Andersen 10 30 
1.2.4 Andersen 12 28 
1.2.4 Baker    1 1
1.2.5 Brown    1 1 
2.1.1 Clark    8 20
2.1.2 Clark    6 18
;

proc sql;
  select 
     substr(booking,find(booking,' ')) as booking_name
    ,sum(amount) as sum_amount
    ,sum(days) as sum_days
  from have
  group by booking_name
  ;
quit;

melhaf · Posted 02-08-2024 02:13 AM

Thank you so much !

Combine rows in my dataset that are duplicates

Re: Combine rows in my dataset that are duplicates

Re: Combine rows in my dataset that are duplicates

Re: Combine rows in my dataset that are duplicates

Re: Combine rows in my dataset that are duplicates

Re: Combine rows in my dataset that are duplicates

Re: Combine rows in my dataset that are duplicates

Re: Combine rows in my dataset that are duplicates

Re: Combine rows in my dataset that are duplicates

Re: Combine rows in my dataset that are duplicates

Re: Combine rows in my dataset that are duplicates

Re: Combine rows in my dataset that are duplicates

Registration is open

SAS Training: Just a Click Away