BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Malathi13
Obsidian | Level 7

Hi All,

I have a question on adding observations that are the same: My dataset (see below) has date_time, Name1, name2, name3 and a vehicle number. I want to add all the observations from the same vehicle number on the same date for that person.

 

Date_Time                         Vehicle            Name1                 Name2             Name3
2019-09-13 20:18        1293          John Doe       Jane Doe
2019-09-13 20:47        1293          John Doe       Jane Doe
2019-09-13 22:18        1293          Jane Doe       John Doe
2019-09-23 00:50               1292             John Doe           John Smith
2019-09-24 00:02               1293             John Doe           Jane Smith          Alex Doe
2019-09-24 20:16               1293             John Doe           Ben smith
2019-09-25 02:52               1293             John Doe           Ben smith

 

From the table above, the bold ones are on the same date, same vehicle and same names. I want to count them as 1 incident on that date for that person with that vehicle number.

 

Can someone help me with that! Not sure how to write a program for that.

 

Thank you,

M

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@Malathi13 wrote:

Hi,

I want to add a counter and I don't have any other variables in the dataset. I want to have a report, dataset is also fine. And the date_time is a character variable here, so it can be anything, doesn't matter.

 

M


Actually the date_time value does matter because you specifically asked to group the data by DATE. Which means that date groups have to be created from the datetime value.

 

Here's my first stab at this:

data have;
   infile datalines dlm=',' truncover;
   input Date_Time :$16.  Vehicle  $   (Name1 Name2  Name3) (:$15.);
datalines;
2019-09-13 20:18,1293,John Doe,Jane Doe, 
2019-09-13 20:47,1293,John Doe,Jane Doe, 
2019-09-13 22:18,1293,Jane Doe,John Doe,
2019-09-23 00:50,1292,John Doe,John Smith,
2019-09-24 00:02,1293,John Doe,Jane Smith,Alex Doe
2019-09-24 20:16,1293,John Doe,Ben smith,
2019-09-25 02:52,1293,John Doe,Ben smith,
;

proc tabulate data=have;
   class date_time vehicle name1 name2 name3 /missing;
   format date_time $10.;
   tables date_time*vehicle*(name1 name2 name3),
          n
          /misstext=' '
   ;
run;

The data step is to have something that code can be tested with to see if we are getting close to what you want. Note that lots of guesses are made as you haven't shown what the desired result would be. The /missing option is needed with tabulate as otherwise any record with any missing name would be excluded. That means you get "blank" names in the output.

 

If you want to count any occurrence of a name value regardless of whether it started in Name1, Name2 or Name3 then the first step is going to be transposing the data so that all the name values occur in a single variable:

Proc transpose data=have out=trans;
   by date_time vehicle;
   var name1 name2 name3;
run;

proc tabulate data=trans;
   class date_time vehicle col1;
   format date_time $10.;
   tables date_time*vehicle*col1='Name',
          n
          /misstext=' '
   ;
run;

View solution in original post

6 REPLIES 6
ballardw
Super User

Do you want to add a counter or just select the unique values?

What value for the Date_time should be in the output?

Are there any other variables involved that you didn't show?

 

Do you want a data set (assuming so but checking) for further manipulation or a report that people read?

Malathi13
Obsidian | Level 7

Hi,

I want to add a counter and I don't have any other variables in the dataset. I want to have a report, dataset is also fine. And the date_time is a character variable here, so it can be anything, doesn't matter.

 

M

ballardw
Super User

@Malathi13 wrote:

Hi,

I want to add a counter and I don't have any other variables in the dataset. I want to have a report, dataset is also fine. And the date_time is a character variable here, so it can be anything, doesn't matter.

 

M


Actually the date_time value does matter because you specifically asked to group the data by DATE. Which means that date groups have to be created from the datetime value.

 

Here's my first stab at this:

data have;
   infile datalines dlm=',' truncover;
   input Date_Time :$16.  Vehicle  $   (Name1 Name2  Name3) (:$15.);
datalines;
2019-09-13 20:18,1293,John Doe,Jane Doe, 
2019-09-13 20:47,1293,John Doe,Jane Doe, 
2019-09-13 22:18,1293,Jane Doe,John Doe,
2019-09-23 00:50,1292,John Doe,John Smith,
2019-09-24 00:02,1293,John Doe,Jane Smith,Alex Doe
2019-09-24 20:16,1293,John Doe,Ben smith,
2019-09-25 02:52,1293,John Doe,Ben smith,
;

proc tabulate data=have;
   class date_time vehicle name1 name2 name3 /missing;
   format date_time $10.;
   tables date_time*vehicle*(name1 name2 name3),
          n
          /misstext=' '
   ;
run;

The data step is to have something that code can be tested with to see if we are getting close to what you want. Note that lots of guesses are made as you haven't shown what the desired result would be. The /missing option is needed with tabulate as otherwise any record with any missing name would be excluded. That means you get "blank" names in the output.

 

If you want to count any occurrence of a name value regardless of whether it started in Name1, Name2 or Name3 then the first step is going to be transposing the data so that all the name values occur in a single variable:

Proc transpose data=have out=trans;
   by date_time vehicle;
   var name1 name2 name3;
run;

proc tabulate data=trans;
   class date_time vehicle col1;
   format date_time $10.;
   tables date_time*vehicle*col1='Name',
          n
          /misstext=' '
   ;
run;
Malathi13
Obsidian | Level 7

Thank you so much @ ballardw. I appreciate your help so much! My output I want to have is:

 

a count of how many John Doe are there from Name1, Name2 or Name3 columns per vehicle in a day without duplicating. If Name1 is John Doe and Name2 is Jane Doe on this date 2019-09-13 20:1 and for this vehicle number 1293. I don't want to count if Jane Doe in Name1 and John Doe in

Name2 on that same date and same vehicle.

 

So the final output should be

 

2019-09-13, 1293, John Doe, Jane Doe

2019-09-13, 1293, John Doe, Alex Doe

2019-09-13, 1292, Jane Doe, John Doe

 

Either the name or the date or the vehicle number should be different. I want to count all those.

 

Thank you,

M

Malathi13
Obsidian | Level 7

Thank you so much but that didn't work and sorry for late reply!! 

 

Any help on this is greatly appreciated!

 

M

mkeintz
PROC Star

"that didn't work" is not a way to get well-formed solutions.  What failed?  What did you expect vs. what did you get?   What clues are offered in the log?   Help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1459 views
  • 0 likes
  • 3 in conversation