BookmarkSubscribeRSS Feed
Datatician
Calcite | Level 5

Hi everybody,

 

I'm working with this data - https://www.kaggle.com/somesnm/partynyc - which are essentially complaints to the police department of NYC about ongoing parties within the city in 2016. My goal is compress these complaints into actual parties (i.e., some parties were complained about more than once, so I need to separate different complaints for different locations). 

 

Attached is the snapshot of data sample.

 

In exact terms, I need to:

 

1. Count Number of Complaints (id_Complaint) if: 

            - 2 last columns match from 1 row to another (i.e., if 2 rows have the same location, they need to be compressed into 1 row, and have count of "2")

            - the filter to only count parties that have CreatedDate and ClosedDate within 12 hours of each other. This is to ensure that if complaints about the same location came during different days, I would be able to count it as different parties. Once again, requires comparisons of data in different rows in 2 columns.

 

I'm not accustomed to coding in SAS, so if you could provide instructions on how to achieve thru Query Builder, it would be greatly appreciated! 

 

Thank you! 

Thank you! 

3 REPLIES 3
TomKari
Onyx | Level 15

I don't tend to open Office attachments from the Internet, but here are a few general suggestions:

 

Generally, to compare fields between records, you need to sort on those fields. So for your grouping on complaint ID, you'd want to sort on location, and then use a data step with a "by" statement and "first." and "last." options.

 

The same for your date filter, but I don't find it completely clear what your algorithm is.

 

Tom

Datatician
Calcite | Level 5

You do know that if you click on it, it doesn't download the file, right? It just shows you the preview...

 

Regardless, data looks like this:

 

Complaint_ID | CreatedDate | ClosedDate | Location Type | Incident Zip | City | Borough | Latitude | Longitude

1                      31dec15:00     31DEC15:01  Store/Commer 10034           Bronx Bronx       40.86618  -73.918930

and so on.

 

I need to compare records across rows, and create A row for each series of complaints that:

- Happened within 12 hours (ClosedDate-CreatedDAte <= 12 hours)

- Have the same location (last 2 columns are the same in different rows)

- Count how many complaints came for that location within that 12 hour period

 

If different rows have the same location but didn't happen within one 12 hours window, it needs to create a separate row with same parameters. 

Patrick
Opal | Level 21

@Datatician 

A lot of forum members tend to answer questions by providing actual code which is very often fully working. To do so we need some representative sample data. Your chances to get such a code based answer will increase greatly if you provide sample data in the form of a working SAS data step creating such data.

We don't need a lot of data but make sure that your data is contains all the cases you want covered (so eventually amend your data to create such cases).

The other thing that's appreciated: Don't just state the problem but also provide some info what you've tried already and eventually post some of your not yet fully working code. This helps us a lot to understand where you're coming from and to provide an answer suitable to your current SAS skills level.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1385 views
  • 0 likes
  • 3 in conversation