Hi,
I have a dataset that it something like this (can't post the original one because of confidential) but I'll do my best to explain it. The dataset is not grouped och order by nothing, I just have ordered it here because it would be easier for you to read)
The problem is that I have a dataset who looks like this:
BOOKING(char) AMOUNT(num) DAYS(num) 1.2.3 Andersen 10 30 1.2.4 Andersen 12 28 1.2.4 Baker whatever whatever 1.2.5 Brown whatever whatever 2.1.1 Clark 8 20 2.1.2 Clark 6 18
You see that we have duplicates of Andersen and Clark. Because we have four different families. How do I combine them? (recall that they're CHAR).
I have tried:
1. select distinct * from dataset GROUP BY 1 .
- then I get duplicates ....
2. sum(BOOKING)
- not working beacuse it is a char
3. proc sort data=dataset out=dataset2 nodupkey; by BOOKING; run;
- still sees the duplicates and I guess it is because is is different because of the (1.2.3) and (1.2.4) thing that comes before the last name
4. sum(input(BOOKING, best12.))as booking
- I get :
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Invalid string.
NOTE: Invalid argument to function INPUT. Missing values may be generated.
Not sure that I fully understood what you're asking for. Is below returning what you want?
data have;
input BOOKING $14. @15 AMOUNT DAYS;
datalines;
1.2.3 Andersen 10 30
1.2.4 Andersen 12 28
1.2.4 Baker 1 1
1.2.5 Brown 1 1
2.1.1 Clark 8 20
2.1.2 Clark 6 18
;
proc sql;
select
substr(booking,find(booking,' ')) as booking_name
,sum(amount) as sum_amount
,sum(days) as sum_days
from have
group by booking_name
;
quit;
I don't know how to edit my topic. But I want to summerize Andersen and Clark into one
What do you want the single rows for Andersen and Clark to contain? Is there a rule to decide which of the two rows in these examples is chosen?
They are all the same the column is "1.2.3 Andersen" and then the other family is "1.2.4 Andersen" and I want to summerize families that have the name Andersen (and Clark in this example as well)
Not sure that I fully understood what you're asking for. Is below returning what you want?
data have;
input BOOKING $14. @15 AMOUNT DAYS;
datalines;
1.2.3 Andersen 10 30
1.2.4 Andersen 12 28
1.2.4 Baker 1 1
1.2.5 Brown 1 1
2.1.1 Clark 8 20
2.1.2 Clark 6 18
;
proc sql;
select
substr(booking,find(booking,' ')) as booking_name
,sum(amount) as sum_amount
,sum(days) as sum_days
from have
group by booking_name
;
quit;
Thank you so much !
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.