svzplayer
Fluorite | Level 6

Hello everyone,

I am trying to run a regression on the data file from here: https://www.kaggle.com/tsarkov90/crime-in-russia-20032020

However, I encounter a problem. I have tried to import the file using:

proc import datafile="<path>"
out=crime
dbms=csv
replace;
getnames=yes;
run;

The problem is that it seems to swap month and day: I get the first 12 days of January, and then the next year starts. How can I fix this?

Also, for further analysis I'd like to retain only month/year. How can I do this?

 

Many thanks, cheers 

1 ACCEPTED SOLUTION
ballardw
Super User

@svzplayer wrote:

Alright, I have followed your suggestion writing:

data crime;
infile "<path>" DLM=',' FIRSTOBS=2;
input month $ Total_crimes Serious Huge_damage Ecological Terrorism
Extremism Murder Harm_to_health Rape Theft Vehicle_theft
Fraud_scam Hooligan Drugs Weapons;
format month DDMMYY10.;
run;

And... Another problem.

I attach two photos: the first shows the results with proc import, the second with this data step. It seems all years are cut off after the first 2 digits.

 

<sorry for such a long reply, however I could not upload either .png or .jpg>


You did not tell SAS to read the value as a date, and a simple $ input such as input month $ reads only 8 characters by default.

Try this data step:

data crime;
   infile "<path>" DLM=',' FIRSTOBS=2;
   informat month ddmmyy10.;
   input month  Total_crimes Serious Huge_damage Ecological Terrorism
            Extremism Murder Harm_to_health Rape Theft Vehicle_theft
            Fraud_scam Hooligan Drugs Weapons;
   format month DDMMYY10.;
run;

And as I mentioned in my other post, you may want to use the YYMMN6. format.
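To check the informat and format without the real file, here is a minimal sketch that reads two made-up rows from inline data. The data set name demo and the values are hypothetical, and only one of the numeric columns is included.

```sas
/* Hypothetical two-row sample in dd/mm/yyyy order; not the real data */
data demo;
   infile datalines dlm=',';
   informat month ddmmyy10.;   /* read the text as a SAS date, day first */
   input month Total_crimes;
   format month yymmn6.;       /* display only year and month */
   datalines;
01/01/2003,100
01/02/2003,120
;
run;

proc print data=demo;
run;
```

With DDMMYY10. as the informat, the second row is read as 1 February 2003, so the two dates should print as 200301 and 200302. The stored value is still a full date; only the display changes.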

 

Reeza
Super User
Don't use PROC IMPORT, then. You'll need to write a data step to import the data. You can get the code from the log, change the variables as needed, and then re-run it.

Another option is to add the guessingrows=max; statement to PROC IMPORT and see whether it then reads the data correctly.
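As a sketch, the second option is the original PROC IMPORT call with one extra statement. Whether it resolves the day/month order is an assumption to test: if every value in the column is ambiguous (for example, the day is always 01), scanning more rows may not change the guess.

```sas
proc import datafile="<path>"
   out=crime
   dbms=csv
   replace;
   getnames=yes;
   guessingrows=max;   /* scan all rows before choosing types and informats */
run;
```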
svzplayer
Fluorite | Level 6

Alright, I have followed your suggestion writing:

data crime;
infile "<path>" DLM=',' FIRSTOBS=2;
input month $ Total_crimes Serious Huge_damage Ecological Terrorism
Extremism Murder Harm_to_health Rape Theft Vehicle_theft
Fraud_scam Hooligan Drugs Weapons;
format month DDMMYY10.;
run;

And... Another problem.

I attach two photos: the first shows the results with proc import, the second with this data step. It seems all years are cut off after the first 2 digits.

 

<sorry for such a long reply, however I could not upload either .png or .jpg>

svzplayer
Fluorite | Level 6

Thank you very much!

ballardw
Super User

When SAS reads an ambiguous date such as xx/yy/zz, whether xx is treated as the month or the day of the month (and yy as the other) depends on the current setting of your DATESTYLE option.

 

You can check what your current setting is with

proc options option=datestyle;
run;

The log will show something like

 DATESTYLE=MDY     Specifies the sequence of month, day, and year when ANYDTDTE, ANYDTDTM, or
                   ANYDTTME informat data is ambiguous.

Or you might see   DMY for the order.

 

If the order in the data differs from your setting, then PROC IMPORT will swap day and month relative to what was intended.

You can fix this in a couple of ways.

Set the option to the desired order with options datestyle=MDY; (or DMY), whichever is needed.

Don't forget to set the DATESTYLE option back to its previous value afterwards, or other things may misbehave.
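A minimal sketch of this first method, assuming the file is in day-month-year order. The macro variable name old_datestyle is made up for illustration. Note that later in this thread the original poster reports still seeing MMDDYY10. after setting DMY, so treat this as something to try, not a guaranteed fix.

```sas
/* Remember the current setting in KEYWORD=value form, e.g. DATESTYLE=MDY */
%let old_datestyle = %sysfunc(getoption(datestyle, keyword));

/* Tell SAS to resolve ambiguous dates as day-month-year */
options datestyle=dmy;

proc import datafile="<path>"
   out=crime
   dbms=csv
   replace;
   getnames=yes;
run;

/* Put the option back the way it was */
options &old_datestyle;
```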

Or: PROC IMPORT will have written data step code to the log. Copy that code and clean it up, removing line numbers and such. Then find the INFORMAT statement for the variable(s) of interest and change it to read the data properly. I can't see the data at the link you provided, so I would guess that an informat of either MMDDYY10. or DDMMYY10. (depending on whether month or day comes first) might work.

 

There is no need to change the date values once created in order to do analysis by year and month only. You can assign a format to a variable, and the formatted values will define groups usable by any of the SAS analysis procedures. Likely candidates would be YYMMN6. to create groups like 201906 (June 2019), or YYMON7. if you want something like 2019JUN displayed.
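For example, a sketch assuming the crime data set holds month as a numeric SAS date and Total_crimes as a numeric variable, as in the data step posted in this thread:

```sas
/* Group by formatted year-month without changing the stored dates */
proc means data=crime sum;
   class month;
   format month yymmn6.;   /* swap in yymon7. for labels like 2019JUN */
   var Total_crimes;
run;
```

The FORMAT statement inside the procedure applies only to that step, so the data set keeps whatever format it already had.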

svzplayer
Fluorite | Level 6

Sorry if the link is inaccessible 😞 

Yes, the date is in dd/mm/yyyy format, so I guess it's DDMMYY10. Here is a funny thing: I followed your instructions, and the log shows that I have set DMY. However, when I run proc contents, it shows MMDDYY10. in that particular cell. In a reply to a previous answer, I attached the results from proc import and the data step. I hope that helps somehow, because I have run out of ideas...
