So the CSV export from that site messes up the datetime fields.
But if you use the API interface it generates the datetime fields in ISO standard format.
So copy the file yourself to a CSV file and it should be easier to read it.
data _null_;
infile "https://data.cityofnewyork.us/resource/76xm-jjuj.csv?$limit=30000000" url ;
file 'nycems.csv';
input;
put _infile_;
run;
PROC IMPORT is a guessing procedure. In this case, it guessed wrong. However, the actual code it uses to import the CSV file is provided in the log, you can copy this code, modify it to produce the correct dates, and run it again.
@JKHess wrote:
Yes, I did modify the code in the log file when I changed the informat from anydtdtm40. to anydtdte40. Not sure what else to try.
It's not reading all the dates wrong (or differently from MS), just some of them.
These informats are not the ones you want to use.
You will need to identify the specific informats that will allow you to read the dates properly. It may be that you want to switch from the MMDDYY informat to the DDMMYY informat, or vice versa. Providing examples of the dates in the CSV file would be very helpful.
As @PaigeMiller said, PROC IMPORT is guessing as to the meaning of values it reads from the CSV. If you use Excel to read a CSV, it also will guess, and I suspect PowerBI will guess as well. When you use the anydt* informats, you are telling SAS to guess the meaning, because you don't know if the date could be mm/dd/yyyy or yyyy/mm/dd or one of many other possible formats.
If PROC IMPORT, Excel, and PowerBI are making different guesses as to the meaning of a date value stored in a CSV, there is no good way to know which guess is correct. You would need to ask the source of the CSV about the meaning of each value.
Hopefully the source will tell you that there is a consistent date format used for a variable. Then in SAS, you would specify the appropriate specific informat when you read in the data. The goal is to avoid guessing, which can always go wrong.
@JKHess wrote:
It's this file: https://data.cityofnewyork.us/Public-Safety/EMS-Incident-Dispatch-Data/76xm-jjuj
It's too big for me to open it with a text editor.
Then you use the wrong editor. Something like Notepad++ should not be overwhelmed by size.
Alternatively, this SAS method can also help:
data _null_;
infile "C:\filename.csv" lrecl=32767 obs=10;
input;
list;
run;
which shows the contents in the log, or this one:
data test;
infile "C:\filename.csv" lrecl=32767 obs=10;
length line $32767;
input;
line = _infile_;
run;
which puts the lines into a dataset.
If the example shown is correct then the data appears in YYYY MON DD HH:MM:SS am/pm structure. Which is nearly one of the stupidest data storage formats I have seen in a long time. (MON means three letter month abbreviation)
This is one way to parse the values. It would require reading all the values as strings, then using an array to loop over them to process and assign the values to another array for the datetime value.
data junk; string = '2023 Oct 01 12:15:23 AM'; dt = dhms(input(cats(scan(string,3),scan(string,2),scan(string,1)),date9.),0,0,input(cats(scan(string,4),scan(string,4)),time10.)); format dt datetime20.; run;
So the CSV export from that site messes up the datetime fields.
But if you use the API interface it generates the datetime fields in ISO standard format.
So copy the file yourself to a CSV file and it should be easier to read it.
data _null_;
infile "https://data.cityofnewyork.us/resource/76xm-jjuj.csv?$limit=30000000" url ;
file 'nycems.csv';
input;
put _infile_;
run;
Once you have a valid CSV file you can use %CSV2DS() to help you GUESS how to read it.
You will need to use the PRECENT= option to sample a subset of the rows since over 25 million observations will overwhelm PROC SQL's ability to summarize the data types.
%csv2ds('c:\downloads\nycems.csv',out=nycems,percent=1,replace=1)
Resulting SAS code:
data nycems; infile 'c:\downloads\nycems.csv' dlm=',' dsd truncover firstobs=2 ; length cad_incident_id 8 incident_datetime 8 initial_call_type $6 initial_severity_level_code 8 final_call_type $6 final_severity_level_code 8 first_assignment_datetime 8 valid_dispatch_rspns_time_indc $5 dispatch_response_seconds_qy 8 first_activation_datetime 8 first_on_scene_datetime 8 valid_incident_rspns_time_indc $5 incident_response_seconds_qy 8 incident_travel_tm_seconds_qy 8 first_to_hosp_datetime 8 first_hosp_arrival_datetime 8 incident_close_datetime 8 held_indicator $5 incident_disposition_code 8 borough $24 incident_dispatch_area $2 zipcode 8 policeprecinct 8 citycouncildistrict 8 communitydistrict 8 communityschooldistrict 8 congressionaldistrict 8 reopen_indicator $5 special_event_indicator $5 standby_indicator $5 transfer_indicator $5 ; informat incident_datetime anydtdtm. first_assignment_datetime anydtdtm. first_activation_datetime anydtdtm. first_on_scene_datetime anydtdtm. first_to_hosp_datetime anydtdtm. first_hosp_arrival_datetime anydtdtm. incident_close_datetime anydtdtm. zipcode comma. ; format incident_datetime datetime19. first_assignment_datetime datetime19. first_activation_datetime datetime19. first_on_scene_datetime datetime19. first_to_hosp_datetime datetime19. first_hosp_arrival_datetime datetime19. incident_close_datetime datetime19. zipcode comma6. ; input cad_incident_id -- transfer_indicator ; run;
You should probably remove the FORMAT from ZIPCODE since you don't want to replicate the mistake in the data and print the values with thousands separators.
You also will probably want to standardize the "indicator" variables to consistently use Y/N instead of sometimes using true/false.
Example:
proc format ;
invalue $yntf 'Y','true'='Y' 'N','false'='N';
run;
data nycems;
infile "c:\downloads\nycems.csv" dlm=',' dsd truncover firstobs=2 ;
length cad_incident_id 8 incident_datetime 8 initial_call_type $6
initial_severity_level_code 8 final_call_type $6
final_severity_level_code 8 first_assignment_datetime 8
valid_dispatch_rspns_time_indc $1 dispatch_response_seconds_qy 8
first_activation_datetime 8 first_on_scene_datetime 8
valid_incident_rspns_time_indc $1 incident_response_seconds_qy 8
incident_travel_tm_seconds_qy 8 first_to_hosp_datetime 8
first_hosp_arrival_datetime 8 incident_close_datetime 8
held_indicator $1 incident_disposition_code 8 borough $24
incident_dispatch_area $2 zipcode 8 policeprecinct 8
citycouncildistrict 8 communitydistrict 8 communityschooldistrict 8
congressionaldistrict 8 reopen_indicator $1 special_event_indicator $1
standby_indicator $1 transfer_indicator $1
;
informat incident_datetime first_assignment_datetime
first_activation_datetime first_on_scene_datetime
first_to_hosp_datetime first_hosp_arrival_datetime
incident_close_datetime anydtdtm.
zipcode comma.
valid_dispatch_rspns_time_indc valid_incident_rspns_time_indc
held_indicator reopen_indicator special_event_indicator standby_indicator
transfer_indicator $yntf.
;
format incident_datetime
first_assignment_datetime
first_activation_datetime
first_on_scene_datetime first_to_hosp_datetime
first_hosp_arrival_datetime
incident_close_datetime datetime19.
;
input cad_incident_id -- transfer_indicator ;
run;
Modifying @Tom 's code very slightly to read from the URL directly.
filename src url 'https://data.cityofnewyork.us/resource/76xm-jjuj.csv';
proc format ;
invalue $yntf 'Y','true'='Y' 'N','false'='N';
run;
data nycems;
infile src dlm=',' dsd truncover firstobs=2 ;
length cad_incident_id 8 incident_datetime 8 initial_call_type $6
initial_severity_level_code 8 final_call_type $6
final_severity_level_code 8 first_assignment_datetime 8
valid_dispatch_rspns_time_indc $1 dispatch_response_seconds_qy 8
first_activation_datetime 8 first_on_scene_datetime 8
valid_incident_rspns_time_indc $1 incident_response_seconds_qy 8
incident_travel_tm_seconds_qy 8 first_to_hosp_datetime 8
first_hosp_arrival_datetime 8 incident_close_datetime 8
held_indicator $1 incident_disposition_code 8 borough $24
incident_dispatch_area $2 zipcode 8 policeprecinct 8
citycouncildistrict 8 communitydistrict 8 communityschooldistrict 8
congressionaldistrict 8 reopen_indicator $1 special_event_indicator $1
standby_indicator $1 transfer_indicator $1
;
informat incident_datetime first_assignment_datetime
first_activation_datetime first_on_scene_datetime
first_to_hosp_datetime first_hosp_arrival_datetime
incident_close_datetime anydtdtm.
zipcode comma.
valid_dispatch_rspns_time_indc valid_incident_rspns_time_indc
held_indicator reopen_indicator special_event_indicator standby_indicator
transfer_indicator $yntf.
;
format incident_datetime
first_assignment_datetime
first_activation_datetime
first_on_scene_datetime first_to_hosp_datetime
first_hosp_arrival_datetime
incident_close_datetime datetime19.
;
input cad_incident_id -- transfer_indicator ;
run;
Modifying @Reeza and @Tom suggest code just a bit:
Custom informat:
proc format ; invalue yntf (upcase) 'Y','YES','T','TRUE'=1 'N','NO','F','FALSE'=0 ; run;
And use Yntf (not character $Yntf ) informat. The counting of yes/no/true/false values quite often goes much easier using numeric as then Proc means/summary or the report procedures Report and Tabulate can get counts of Yes (sum statistic) percent as decimal value (mean statistic) and it is easier if you need to get something like that from multiple variables on a record.
Note that I added a couple of guessed other values that might appear just in case in the informat YES, NO, T, F and the upcase option will deal with possible changes in case down the line.
@Tom Is there additional code needed to run this macro? I'm getting a warning and an error. I think I need to identify where the macro code is found?
%csv2ds("c:\Users\judyk\nycems.csv",out=nycems,percent=1,replace=1);
-
180 WARNING: Apparent invocation of macro CSV2DS not resolved.
ERROR 180-322: Statement is not valid or it is used out of proper order.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.