Ok, there we go. And, ok, even if it 'seemed' to work (didn't error), that doesn't mean it's right. So, lets get it right! If you left off any of these extra parameters, it would load the table, and it still wouldn't be what you wanted.
Looking at what you've provided, there are only numerics and objects. That's perfectly reasonable. To import those to SAS is easy, numerics become floats and objects become characters (as @Tom mentioned, we only have those two types, and dates/times/datetimes are specific floats).
Clearly, col6 is a string representation of a date. But, it's not a date, it's just a character string (an object). So, to have this actually be a date (specific float value and a date format) in SAS we have to convert this column of the dataframe to a datetime64 type. That's the pandas type for date/time/datetime (which is really only a datetime). This is then where the datetimes= option comes in; when you have a pandas datetime64, but only want either the date or time part of it in SAS. So we will end up using datetimes={'col6' : 'date'} on df2sd, but not until we convert that pandas column to a datetime64 instead of an object. After that, I don't see any other cases that are an issue. Setting formats (correctly) are up to you as the rest are just numbers or strings; just specify valid format names.
So, how to convert col6 to a correct datetime64? In pandas, you can do this different ways, but here's one to try (if your dataframe was named df)
df['Col6'] = pd.to_datetime(df['Col6'])
I created a samlple df column 'n1' is like your Col6:
>>> rows = [[datetime.datetime(1965, 1, 1, 8, 0, 1), '15-Jul-19', 1.30, 'a'],
... [datetime.datetime(1966, 1, 1, 7, 0, 2), '15-Jul-19', 2.30, 'b'],
... [datetime.datetime(1967, 1, 1, 6, 0, 3), '15-Jul-19', 3.30, ' '],
... [datetime.datetime(1968, 1, 1, 5, 0, 4), '15-Jul-19', 4.30, ''],
... [None, '15-Jul-19', 5.0, 'b'],
... [None, None, None, 'b'],
... ]
>>> df = pd.DataFrame.from_records(rows, columns=['dt','n1','n2', 's1'])
>>> df
dt n1 n2 s1
0 1965-01-01 08:00:01 15-Jul-19 1.3 a
1 1966-01-01 07:00:02 15-Jul-19 2.3 b
2 1967-01-01 06:00:03 15-Jul-19 3.3
3 1968-01-01 05:00:04 15-Jul-19 4.3
4 NaT 15-Jul-19 5.0 b
5 NaT None NaN b
>>> df.dtypes
dt datetime64[ns]
n1 object
n2 float64
s1 object
dtype: object
>>>
As you can see n1 is just a string, but has dd-mmm-yyyy date in it that we read as a date.
I ran the code above and it was converted to the dateteime64 type. Now. since it's really only a date, you can use datetimes={'n2' : 'date'} on df2sd when loading this, along with outfmts={'n2' : ‘YYMMDD.’} or any valid SAS date format.
>>> df.dtypes; df; df['n1'] = pd.to_datetime(df['n1'])
dt datetime64[ns]
n1 object
n2 float64
s1 object
dtype: object
dt n1 n2 s1
0 1965-01-01 08:00:01 15-Jul-19 1.3 a
1 1966-01-01 07:00:02 15-Jul-19 2.3 b
2 1967-01-01 06:00:03 15-Jul-19 3.3
3 1968-01-01 05:00:04 15-Jul-19 4.3
4 NaT 15-Jul-19 5.0 b
5 NaT None NaN b
# after the conversion:
>>> df.dtypes; df
dt datetime64[ns]
n1 datetime64[ns]
n2 float64
s1 object
dtype: object
dt n1 n2 s1
0 1965-01-01 08:00:01 2019-07-15 1.3 a
1 1966-01-01 07:00:02 2019-07-15 2.3 b
2 1967-01-01 06:00:03 2019-07-15 3.3
3 1968-01-01 05:00:04 2019-07-15 4.3
4 NaT 2019-07-15 5.0 b
5 NaT NaT NaN b
>>>
So you see here that this is now, actually, a date (datetime) datatype and it will load into SAS as a datetime, unless you use datetimes= to say only the date or time part - which you will.
I think that's all you needed for this, was just to convert that date to a date type, then use the options when loading it into SAS.
BTW, use
print(sas.saslog()) to see the whole log and
SASdataobject.contents() to see what the dataset really is defined as (formats and data types and everything)
See how that works,
Tom
I don't see anything that shows the SAS dataset you do get.
Doesn't df2sd want valid format specifications? All of the ones you posted are missing the period that is required in a format specification. Some of them have widths that are larger than the format can specify. What the heck could possible be stored that would need to be displayed using DOLLAR format with 255 characters?
What happens if you try?
my_sas_dataset_inv = sas.df2sd(my_dataset_inv,table=tablename,libref='trial' ,datetimes={'Col9' : 'DATE9.'} , outfmts={'Col1':'F13.' , 'Col2':'dollar32.' , 'Col3':'dollar32.' ,'Col4':'F26.5' ,'Col5':'dollar20.' ,'Col6':'F6.' , 'Col7':'dollar32.' , 'Col8':'dollar32.'})
Yes, @AshishM you are correct, valid formats certainly need to be specified. I don't manipulate what is specified in any way to try to validate or change or correct what formats you provide. You need to specify valid values. Also, looking at the log to see what was submitted and what errors you get is always the thing to do when something doesn't seem right:
print(sas.saslog())
There's also some extended doc on this specific feature, and it also shows examples of valid values being provided for this and other options options on df2sd: https://sassoftware.github.io/saspy/advanced-topics.html#advanced-sd2df-and-df2sd-techniques
Give the suggestion @AshishM showed a try and see if that works; it looks correct. And look at the log to see what code was submitted and if it worked.
Thanks,
Tom
Hi Tom,
Thanks for your suggestion, I actually referred to the same page to come up with the code posted earlier. I have another very interesting observation which I believe is a bug but probably experts can point out.
With the code I posted earlier, I observed that the data in the dataset was not populating (blank dataset). So I modified the code as below (notice the format keyword with outfmts), which is not how it should be as per the advanced page documentation and I was able to load the dataset with data however the format is still incorrect :
my_sas_dataset_inv = sas.df2sd(my_dataset_inv,table=tablename,libref='MST_PREP',datetimes={'format':{'Col9' : 'date9.'}}, outfmts={'format':{'Col1':'13.', 'Col2':'dollar50.', 'Col3':'dollar35.','Col4':'26.5','Col5':'dollar20.','Col6':'6.', 'Col7':'dollar50.', 'Col8':'dollar50.'}})
The format keywork used above is actually the defined format for sas.sd2df_DISK or sd2df_CSV, instead of dsopts I used outfmts:
df = sas.sd2df_DISK('cars', 'sashelp', dtype={'invoice' : 'int'}, my_fmts=True,
dsopts={'keep' : 'MSRP Invoice', 'format' : {'msrp':'dollar32.2','invoice':'best32.'}})
This is how it should be as per documentation in advanced techniques:
sd = sas.df2sd(df, datetimes={'d' : 'date', 't' : 'time'}, outfmts={'dt' : 'comma32.4', 'd' : 'YYMMDD.', 't' : 'TIMEAMPM.'}, results='text')
Do you know how can I log a defect for the developer?
Also, regarding printing the logs, I tried the code you provided but it didn't show any errors even when the dataset was empty. is there any other way to view the logs of this method?
Hi Tom,
Thanks for the suggestion. You're right regarding DOLLAR 255 amount however reducing the number also didn't work. I tried your code and I still see the same issue.
I also added the period and it still didn't work. Not sure how to check what I am doing wrong, have already spent hours on this issue which looked simple 😞
Hi Tom,
This is the code that I am using, some of the things are sensitive so I have masked them. Basically the input folder from where I am reading the .dat files(comma separated) has 2 different kinds of files, each file has different structure i.e tables/data which is why you'll see the if else statement. I am facing the same issue for both so I had posted just one part of the code.
I am not explicitly defining any data type when I am reading from CSV(I believe it will read it as string), I am thinking of defining dtype in read csv, though it should not have been required if the format worked properly while generating the dataset. I am trying to define the format when the output SAS dataset is created, however it doesn't understand the format and defines the format based on the actual data length of the data in tables.
I am not getting any errors, if there's a way to display the error logs for df2sd method, the kindly suggest.
Dataset looks exactly the same as the data in CSV file, some of the data like column names etc is sensitive so I have masked it.
If I remove the format keyword as mentioned in last post then it results in empty dataset in SAS and since I don't know how to get logs for this method I can't view what error is thrown if it is being thrown. Python does not throw any syntax errors both ways which is surprising as this is not what the documentation shows. I am using the below method for printing errors print(my_sas_dataset_inv.HTML) but currently I don't see any errors for that.
Code:
import pandas as pd
import saspy
import os
saspy.SAScfg
sas = saspy.SASsession(cfgfile='C:\\Program Files\\Anaconda3\\lib\\site-packages\\saspy\\sascfg_personal.py')
Col_party=['Col1',' Col2', 'Col3', …..]
Col_party_inv=['Col4', 'Col5', 'Col6',...]
path='C:\\Users\\dummyuser\\Desktop\\trial'
files = os.listdir(path)
print(files[1])
for file in files:
if ".dat" in file:
print(file)
filename=path+'\\'+file
print('filename: '+filename)
if "trn_inv" in file:
Datekey=file[13:21]
print('Datekey: '+Datekey)
tablename='tbl1'+Datekey
print('tablename: '+tablename)
my_dataset_inv = pd.read_csv(filename,names=Col_party_inv,sep='|' , engine='python')
my_sas_dataset_inv = sas.df2sd(my_dataset_inv,table=tablename,libref='TEMP',datetimes={'format':{'Col9' : 'date9.'}}, outfmts={'format':{'Col1:'13.', 'Col2':'dollar50.', 'Col3':'dollar35.','Col4':'26.5','Col5':'dollar20.','Col6':'6.', 'Col7':'dollar50.', 'Col8':'dollar50.'}})
my_sas_dataset_inv.head
print(my_sas_dataset_inv.HTML)
else:
Datekey=file[9:17]
print('Datekey: '+Datekey)
tablename='tbl2'+Datekey
print('tablename: '+tablename)
my_dataset = pd.read_csv(filename,names=Col_party,sep='|' , engine='python')
my_sas_dataset = sas.df2sd(my_dataset,table=tablename,libref='TEMP',datetimes={'format':{'Col9' : 'date9.'}}, outfmts={'format':{'Col1:'13.', 'Col2':'dollar50.', 'Col3':'dollar35.','Col4':'26.5','Col5':'dollar20.','Col6':'6.', 'Col7':'dollar50.', 'Col8':'dollar50.'}})
Output of above python code:
filename.dat
filename: C:\Users\dummyuser\Desktop\trial\filename.dat
Datekey: XXXXX
tablename: table1
0
{'LOG': "\x0c10 The SAS System 11:48 Sunday, March 22, 2020\n\n169 ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg style=HTMLBlue;\n169 ! ods graphics on / outputfmt=png;\nNOTE: Writing HTML5(SASPY_INTERNAL) Body file: _TOMODS1\n170 \n171 \n172
The SAS dataset created.Properties are still the same as posted earlier.
Note that @sastpw included instructions for how to print the SAS log from your Python code.
I haven't tried to use these things and the documentation seems skimpy to me. The links @sastpw posted mainly talk about how to use it to move data from SAS to Python.
It is not clear to me why you would want or need to attach any formats to any of the SAS variables. SAS only has two types of variables. Floating point numbers and fixed length character strings. A FORMAT in SAS is instructions on how to convert the values into text, for example when printing the value. For most things there is no need to attach any format to the SAS variables. SAS already knows how to print numbers and strings. The except is for DATE, TIME and DATETIME values. SAS stores those as number of days since 1960, seconds since midnight and seconds since 1960, respectively. saspy only knows how to convert datatime64 data type into SAS date, time or datetime values, and the you use the datetime option to tell if if you want any of them converted to date or time instead of staying as datetime.
Hi Tom,
The problem is that SAS attached character format to date variable with value 15-Jul-19. Now since it is character I am unable to change the format to date9. The following code which I am using to convert to dataset already specifies column 9 as date9. but that was ignored just like other column formats specified.
my_sas_dataset_inv = sas.df2sd(my_dataset_inv,table=tablename,libref='Trial',datetimes={'Col9' : 'date9.'}, outfmts={'Col2:'13.', 'Col3':'dollar50.', 'Col4':'dollar35.','Col5':'26.5','Col6':'dollar20.','TIME_KEY':'6.', 'Col7:'dollar50.', 'Col8':'dollar50.'})
Looking through these posts, I still keep seeing the wrong syntax being used, which is certainly part of the problem. This isn't valid:
,datetimes={'format':{'Col9' : 'date9.'}}, outfmts={'format':{'Col1':'13.', 'Col2':'dollar50.', ...
datetimes doesn't take a format keyword.
datetimes – dict with column names as keys and values of ‘date’ or ‘time’ to create SAS date or times instead of datetimes
outfmts doesn't take that either.
outfmts – dict with column names and SAS formats to assign to the new SAS data set
In another post, I see something about using sd2df, but with outfmts, which isn't even valid, and trying to use dsopts, but all of this seems just seems like trying anything where there's something to do with a Format, but not specifying the correct syntax for the one case that you're trying to do.
So, as @AshishM has been trying to find out, we need to know what the columns in your dataframe are (do a dtypes and show us), and then we can help figure out a valid way to get the data imported. The syntax @AshishM posted originally, looked like the correct version of what you had tried to do to begin with. But, not knowing what's in your dataframe, and not seeing what happened when you ran that, doesn't give us anything to go on.
In the latest post I see this:
my_dataset_inv = pd.read_csv(filename,names=Col_party_inv,sep='|' , engine='python')
So then issue my_dataset_inv.dtypes and post that. Then we will at least know what datatypes you have in your dataframe. After that do my_dataset_inv.head() and we can then see what kind of values there are for these columns. Then we can figure out how to import that into SAS and specify appropriate formats for displaying those values.
Thanks,
Tom
Hi Tom,
I understand that the syntax used for datetimes and outfmts (with format keyword) I used is incorrect but surprisingly it works and populates the SAS dataset, using the syntax provided for df2sd i.e without format keyword is resulting in an empty dataset in SAS.
Please find the dtypes and values in sas dataset below:
Col1 int64
Col2 object
Col3 object
Col4 float64
Col5 object
Col6 object
Col7 int64
Col8 object
Col9 object
dtype: object
Col1 Col2 Col3 \
0 201605141 TRN201605141 PJ
1 201605142 TRN201605142 PJ
2 201605143 TRN201605143 PJ
3 201605144 TRN201605144 PJ
4 201605145 TRN201605145 PJ
Col4 Col5 Col6 Col7 \
0 1000 CREDIT 15-Jul-19 102356
1 1000 CREDIT 15-Jul-19 102356
2 1000 CREDIT 15-Jul-19 102356
3 1000 CREDIT 15-Jul-19 102356
4 1000 CREDIT 15-Jul-19 102356
Col8 Col9
0 UNK UNK
1 UNK UNK
2 UNK UNK
3 UNK UNK
4 UNK UNK
Ok, there we go. And, ok, even if it 'seemed' to work (didn't error), that doesn't mean it's right. So, lets get it right! If you left off any of these extra parameters, it would load the table, and it still wouldn't be what you wanted.
Looking at what you've provided, there are only numerics and objects. That's perfectly reasonable. To import those to SAS is easy, numerics become floats and objects become characters (as @Tom mentioned, we only have those two types, and dates/times/datetimes are specific floats).
Clearly, col6 is a string representation of a date. But, it's not a date, it's just a character string (an object). So, to have this actually be a date (specific float value and a date format) in SAS we have to convert this column of the dataframe to a datetime64 type. That's the pandas type for date/time/datetime (which is really only a datetime). This is then where the datetimes= option comes in; when you have a pandas datetime64, but only want either the date or time part of it in SAS. So we will end up using datetimes={'col6' : 'date'} on df2sd, but not until we convert that pandas column to a datetime64 instead of an object. After that, I don't see any other cases that are an issue. Setting formats (correctly) are up to you as the rest are just numbers or strings; just specify valid format names.
So, how to convert col6 to a correct datetime64? In pandas, you can do this different ways, but here's one to try (if your dataframe was named df)
df['Col6'] = pd.to_datetime(df['Col6'])
I created a samlple df column 'n1' is like your Col6:
>>> rows = [[datetime.datetime(1965, 1, 1, 8, 0, 1), '15-Jul-19', 1.30, 'a'],
... [datetime.datetime(1966, 1, 1, 7, 0, 2), '15-Jul-19', 2.30, 'b'],
... [datetime.datetime(1967, 1, 1, 6, 0, 3), '15-Jul-19', 3.30, ' '],
... [datetime.datetime(1968, 1, 1, 5, 0, 4), '15-Jul-19', 4.30, ''],
... [None, '15-Jul-19', 5.0, 'b'],
... [None, None, None, 'b'],
... ]
>>> df = pd.DataFrame.from_records(rows, columns=['dt','n1','n2', 's1'])
>>> df
dt n1 n2 s1
0 1965-01-01 08:00:01 15-Jul-19 1.3 a
1 1966-01-01 07:00:02 15-Jul-19 2.3 b
2 1967-01-01 06:00:03 15-Jul-19 3.3
3 1968-01-01 05:00:04 15-Jul-19 4.3
4 NaT 15-Jul-19 5.0 b
5 NaT None NaN b
>>> df.dtypes
dt datetime64[ns]
n1 object
n2 float64
s1 object
dtype: object
>>>
As you can see n1 is just a string, but has dd-mmm-yyyy date in it that we read as a date.
I ran the code above and it was converted to the dateteime64 type. Now. since it's really only a date, you can use datetimes={'n2' : 'date'} on df2sd when loading this, along with outfmts={'n2' : ‘YYMMDD.’} or any valid SAS date format.
>>> df.dtypes; df; df['n1'] = pd.to_datetime(df['n1'])
dt datetime64[ns]
n1 object
n2 float64
s1 object
dtype: object
dt n1 n2 s1
0 1965-01-01 08:00:01 15-Jul-19 1.3 a
1 1966-01-01 07:00:02 15-Jul-19 2.3 b
2 1967-01-01 06:00:03 15-Jul-19 3.3
3 1968-01-01 05:00:04 15-Jul-19 4.3
4 NaT 15-Jul-19 5.0 b
5 NaT None NaN b
# after the conversion:
>>> df.dtypes; df
dt datetime64[ns]
n1 datetime64[ns]
n2 float64
s1 object
dtype: object
dt n1 n2 s1
0 1965-01-01 08:00:01 2019-07-15 1.3 a
1 1966-01-01 07:00:02 2019-07-15 2.3 b
2 1967-01-01 06:00:03 2019-07-15 3.3
3 1968-01-01 05:00:04 2019-07-15 4.3
4 NaT 2019-07-15 5.0 b
5 NaT NaT NaN b
>>>
So you see here that this is now, actually, a date (datetime) datatype and it will load into SAS as a datetime, unless you use datetimes= to say only the date or time part - which you will.
I think that's all you needed for this, was just to convert that date to a date type, then use the options when loading it into SAS.
BTW, use
print(sas.saslog()) to see the whole log and
SASdataobject.contents() to see what the dataset really is defined as (formats and data types and everything)
See how that works,
Tom
BTW, there are more specific examples of this in https://sassoftware.github.io/saspy/advanced-topics.html#dates-times-and-datetimes-oh-my This explains what I said in the previous post in more detail and other examples.
Thanks,
Tom
Hi Tom,
Thanks a lot, that was really informative, I was able to load that file into SAS. Now I am stuck on the next file which has few columns that are giving issues :
1. Col1 has the value as shown below. When I try converting it to datetime my_dataset['Col1'] = pd.to_datetime(my_dataset['Col1']). I get ValueError: Unknown string format. I have shown the value in this column below.
dtype: object
Col1 object
Col2 int64
2. Col2 was identified as int64. However I want to load it as a character. How can I modify the type from numeric to object?
[Resolved]: I used the following code to cast it to object my_dataset['Col1']=my_dataset.astype({'Col1': 'object'}). Seems to be working fine.
Col1
0 20NOV2007:00:00:00
1 20NOV2007:00:00:00
2 20NOV2007:00:00:00
3 20NOV2007:00:00:00
4 20NOV2007:00:00:00
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.