Desktop productivity for business analysts and programmers

Trying to compress data based on variables in different rows

Reply
New Contributor
Posts: 2

Trying to compress data based on variables in different rows

Hi everybody,

 

I'm working with this data - https://www.kaggle.com/somesnm/partynyc - which are essentially complaints to the police department of NYC about ongoing parties within the city in 2016. My goal is compress these complaints into actual parties (i.e., some parties were complained about more than once, so I need to separate different complaints for different locations). 

 

Attached is the snapshot of data sample.

 

In exact terms, I need to:

 

1. Count Number of Complaints (id_Complaint) if: 

            - 2 last columns match from 1 row to another (i.e., if 2 rows have the same location, they need to be compressed into 1 row, and have count of "2")

            - the filter to only count parties that have CreatedDate and ClosedDate within 12 hours of each other. This is to ensure that if complaints about the same location came during different days, I would be able to count it as different parties. Once again, requires comparisons of data in different rows in 2 columns.

 

I'm not accustomed to coding in SAS, so if you could provide instructions on how to achieve thru Query Builder, it would be greatly appreciated! 

 

Thank you! 

Thank you! 

PROC Star
Posts: 1,262

Re: Trying to compress data based on variables in different rows

Posted in reply to Datatician

I don't tend to open Office attachments from the Internet, but here are a few general suggestions:

 

Generally, to compare fields between records, you need to sort on those fields. So for your grouping on complaint ID, you'd want to sort on location, and then use a data step with a "by" statement and "first." and "last." options.

 

The same for your date filter, but I don't find it completely clear what your algorithm is.

 

Tom

New Contributor
Posts: 2

Re: Trying to compress data based on variables in different rows

You do know that if you click on it, it doesn't download the file, right? It just shows you the preview...

 

Regardless, data looks like this:

 

Complaint_ID | CreatedDate | ClosedDate | Location Type | Incident Zip | City | Borough | Latitude | Longitude

1                      31dec15:00     31DEC15:01  Store/Commer 10034           Bronx Bronx       40.86618  -73.918930

and so on.

 

I need to compare records across rows, and create A row for each series of complaints that:

- Happened within 12 hours (ClosedDate-CreatedDAte <= 12 hours)

- Have the same location (last 2 columns are the same in different rows)

- Count how many complaints came for that location within that 12 hour period

 

If different rows have the same location but didn't happen within one 12 hours window, it needs to create a separate row with same parameters. 

Respected Advisor
Posts: 4,541

Re: Trying to compress data based on variables in different rows

Posted in reply to Datatician

@Datatician 

A lot of forum members tend to answer questions by providing actual code which is very often fully working. To do so we need some representative sample data. Your chances to get such a code based answer will increase greatly if you provide sample data in the form of a working SAS data step creating such data.

We don't need a lot of data but make sure that your data is contains all the cases you want covered (so eventually amend your data to create such cases).

The other thing that's appreciated: Don't just state the problem but also provide some info what you've tried already and eventually post some of your not yet fully working code. This helps us a lot to understand where you're coming from and to provide an answer suitable to your current SAS skills level.

Ask a Question
Discussion stats
  • 3 replies
  • 147 views
  • 0 likes
  • 3 in conversation