Solved: Re: Add the observations that are same

Malathi13 · Posted 02-11-2020 03:31 PM

Hi All,

I have a question on adding observations that are the same: My dataset (see below) has date_time, Name1, name2, name3 and a vehicle number. I want to add all the observations from the same vehicle number on the same date for that person.

Date_Time Vehicle    Name1 Name2 Name3
2019-09-13 20:18    1293    John Doe Jane Doe
2019-09-13 20:47    1293    John Doe Jane Doe
2019-09-13 22:18    1293    Jane Doe John Doe
2019-09-23 00:50 1292 John Doe John Smith
2019-09-24 00:02 1293 John Doe Jane Smith    Alex Doe
2019-09-24 20:16 1293 John Doe Ben smith
2019-09-25 02:52 1293 John Doe Ben smith

From the table above, the bold ones are on the same date, same vehicle and same names. I want to count them as 1 incident on that date for that person with that vehicle number.

Can someone help me with that! Not sure how to write a program for that.

Thank you,

M

ballardw · Posted 02-11-2020 06:16 PM

@Malathi13 wrote:

Hi,

I want to add a counter and I don't have any other variables in the dataset. I want to have a report, dataset is also fine. And the date_time is a character variable here, so it can be anything, doesn't matter.

M

Actually the date_time value does matter because you specifically asked to group the data by DATE. Which means that date groups have to be created from the datetime value.

Here's my first stab at this:

data have;
   infile datalines dlm=',' truncover;
   input Date_Time :$16.  Vehicle  $   (Name1 Name2  Name3) (:$15.);
datalines;
2019-09-13 20:18,1293,John Doe,Jane Doe, 
2019-09-13 20:47,1293,John Doe,Jane Doe, 
2019-09-13 22:18,1293,Jane Doe,John Doe,
2019-09-23 00:50,1292,John Doe,John Smith,
2019-09-24 00:02,1293,John Doe,Jane Smith,Alex Doe
2019-09-24 20:16,1293,John Doe,Ben smith,
2019-09-25 02:52,1293,John Doe,Ben smith,
;

proc tabulate data=have;
   class date_time vehicle name1 name2 name3 /missing;
   format date_time $10.;
   tables date_time*vehicle*(name1 name2 name3),
          n
          /misstext=' '
   ;
run;

The data step is to have something that code can be tested with to see if we are getting close to what you want. Note that lots of guesses are made as you haven't shown what the desired result would be. The /missing option is needed with tabulate as otherwise any record with any missing name would be excluded. That means you get "blank" names in the output.

If you want to count any occurrence of a name value regardless of whether it started in Name1, Name2 or Name3 then the first step is going to be transposing the data so that all the name values occur in a single variable:

Proc transpose data=have out=trans;
   by date_time vehicle;
   var name1 name2 name3;
run;

proc tabulate data=trans;
   class date_time vehicle col1;
   format date_time $10.;
   tables date_time*vehicle*col1='Name',
          n
          /misstext=' '
   ;
run;

View solution in original post

ballardw · Posted 02-11-2020 03:47 PM

Do you want to add a counter or just select the unique values?

What value for the Date_time should be in the output?

Are there any other variables involved that you didn't show?

Do you want a data set (assuming so but checking) for further manipulation or a report that people read?

Malathi13 · Posted 02-11-2020 04:13 PM

Hi,

I want to add a counter and I don't have any other variables in the dataset. I want to have a report, dataset is also fine. And the date_time is a character variable here, so it can be anything, doesn't matter.

M

ballardw · Posted 02-11-2020 06:16 PM

@Malathi13 wrote:

Hi,

I want to add a counter and I don't have any other variables in the dataset. I want to have a report, dataset is also fine. And the date_time is a character variable here, so it can be anything, doesn't matter.

M

Actually the date_time value does matter because you specifically asked to group the data by DATE. Which means that date groups have to be created from the datetime value.

Here's my first stab at this:

data have;
   infile datalines dlm=',' truncover;
   input Date_Time :$16.  Vehicle  $   (Name1 Name2  Name3) (:$15.);
datalines;
2019-09-13 20:18,1293,John Doe,Jane Doe, 
2019-09-13 20:47,1293,John Doe,Jane Doe, 
2019-09-13 22:18,1293,Jane Doe,John Doe,
2019-09-23 00:50,1292,John Doe,John Smith,
2019-09-24 00:02,1293,John Doe,Jane Smith,Alex Doe
2019-09-24 20:16,1293,John Doe,Ben smith,
2019-09-25 02:52,1293,John Doe,Ben smith,
;

proc tabulate data=have;
   class date_time vehicle name1 name2 name3 /missing;
   format date_time $10.;
   tables date_time*vehicle*(name1 name2 name3),
          n
          /misstext=' '
   ;
run;

The data step is to have something that code can be tested with to see if we are getting close to what you want. Note that lots of guesses are made as you haven't shown what the desired result would be. The /missing option is needed with tabulate as otherwise any record with any missing name would be excluded. That means you get "blank" names in the output.

If you want to count any occurrence of a name value regardless of whether it started in Name1, Name2 or Name3 then the first step is going to be transposing the data so that all the name values occur in a single variable:

Proc transpose data=have out=trans;
   by date_time vehicle;
   var name1 name2 name3;
run;

proc tabulate data=trans;
   class date_time vehicle col1;
   format date_time $10.;
   tables date_time*vehicle*col1='Name',
          n
          /misstext=' '
   ;
run;

Malathi13 · Posted 02-13-2020 12:57 PM

Thank you so much @ ballardw. I appreciate your help so much! My output I want to have is:

a count of how many John Doe are there from Name1, Name2 or Name3 columns per vehicle in a day without duplicating. If Name1 is John Doe and Name2 is Jane Doe on this date 2019-09-13 20:1 and for this vehicle number 1293. I don't want to count if Jane Doe in Name1 and John Doe in

Name2 on that same date and same vehicle.

So the final output should be

2019-09-13, 1293, John Doe, Jane Doe

2019-09-13, 1293, John Doe, Alex Doe

2019-09-13, 1292, Jane Doe, John Doe

Either the name or the date or the vehicle number should be different. I want to count all those.

Thank you,

M

Malathi13 · Posted 06-30-2020 10:45 AM

Thank you so much but that didn't work and sorry for late reply!!

Any help on this is greatly appreciated!

M

mkeintz · Posted 06-30-2020 03:56 PM

"that didn't work" is not a way to get well-formed solutions. What failed? What did you expect vs. what did you get? What clues are offered in the log? Help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Classroom Training Available!