BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AshishM
Fluorite | Level 6
Hi Experts,
I have tried everything but it doesn't seem to work as expected. I am using saspy df2sd method to create a sasdataset from dataframe and want the columns in sas dataset output to be in the format shown below. I have tried everything in the examples but can't get it to work.
Below is the statement that I am executing, it is executed successfully and dataset is created however the format is not as per the one defined below. I end up getting a dataset as shown below with incorrect types. can someone please suggest what am I doing wrong.
 
my_sas_dataset_inv = sas.df2sd(my_dataset_inv,table=tablename,libref='trial',datetimes={'Col9' : 'DATE9'}, outfmts={'Col1':'13', 'Col2':'dollar50', 'Col3':'dollar35','Col4':'26.5','Col5':'dollar20','Col6':'6', 'Col7':'dollar255', 'Col8':'dollar50'})
 
 

Thanks in Advance
1 ACCEPTED SOLUTION

Accepted Solutions
sastpw
SAS Employee

Ok, there we go. And, ok, even if it 'seemed' to work (didn't error), that doesn't mean it's right. So, lets get it right! If you left off any of these extra parameters, it would load the table, and it still wouldn't be what you wanted.

 

Looking at what you've provided, there are only numerics and objects. That's perfectly reasonable. To import those to SAS is easy, numerics become floats and objects become characters (as @Tom mentioned, we only have those two types, and dates/times/datetimes are specific floats).

 

Clearly, col6 is a string representation of a date. But, it's not a date, it's just a character string (an object). So, to have this actually be a date (specific float value and a date format) in SAS we have to convert this column of the dataframe to a datetime64 type. That's the pandas type for date/time/datetime (which is really only a datetime). This is then where the datetimes= option comes in; when you have a pandas datetime64, but only want either the date or time part of it in SAS. So we will end up using datetimes={'col6' : 'date'} on df2sd, but not until we convert that pandas column to a datetime64 instead of an object. After that, I don't see any other cases that are an issue. Setting formats (correctly) are up to you as the rest are just numbers or strings; just specify valid format names.

 

So, how to convert col6 to a correct datetime64? In pandas, you can do this different ways, but here's one to try (if your dataframe was named df) 

 

df['Col6'] = pd.to_datetime(df['Col6'])

 

I created a samlple df  column 'n1' is like your Col6:

 

>>> rows = [[datetime.datetime(1965, 1, 1, 8, 0, 1), '15-Jul-19', 1.30, 'a'],
... [datetime.datetime(1966, 1, 1, 7, 0, 2), '15-Jul-19', 2.30, 'b'],
... [datetime.datetime(1967, 1, 1, 6, 0, 3), '15-Jul-19', 3.30, ' '],
... [datetime.datetime(1968, 1, 1, 5, 0, 4), '15-Jul-19', 4.30, ''],
... [None, '15-Jul-19', 5.0, 'b'],
... [None, None, None, 'b'],
... ]
>>> df = pd.DataFrame.from_records(rows, columns=['dt','n1','n2', 's1'])
>>> df
dt n1 n2 s1
0 1965-01-01 08:00:01 15-Jul-19 1.3 a
1 1966-01-01 07:00:02 15-Jul-19 2.3 b
2 1967-01-01 06:00:03 15-Jul-19 3.3
3 1968-01-01 05:00:04 15-Jul-19 4.3
4 NaT 15-Jul-19 5.0 b
5 NaT None NaN b
>>> df.dtypes
dt datetime64[ns]
n1 object
n2 float64
s1 object
dtype: object
>>>

 

As you can see n1 is just a string, but has dd-mmm-yyyy date in it that we read as a date.

I ran the code above and it was converted to the dateteime64 type. Now. since it's really only a date, you can use datetimes={'n2' : 'date'} on df2sd when loading this, along with outfmts={'n2' :  ‘YYMMDD.’}  or any valid SAS date format.

 

>>> df.dtypes;    df;   df['n1'] = pd.to_datetime(df['n1'])
dt datetime64[ns]
n1 object
n2 float64
s1 object
dtype: object


dt n1 n2 s1
0 1965-01-01 08:00:01 15-Jul-19 1.3 a
1 1966-01-01 07:00:02 15-Jul-19 2.3 b
2 1967-01-01 06:00:03 15-Jul-19 3.3
3 1968-01-01 05:00:04 15-Jul-19 4.3
4 NaT 15-Jul-19 5.0 b
5 NaT None NaN b



# after the conversion:

>>> df.dtypes; df
dt datetime64[ns]
n1 datetime64[ns]
n2 float64
s1 object
dtype: object


dt n1 n2 s1
0 1965-01-01 08:00:01 2019-07-15 1.3 a
1 1966-01-01 07:00:02 2019-07-15 2.3 b
2 1967-01-01 06:00:03 2019-07-15 3.3
3 1968-01-01 05:00:04 2019-07-15 4.3
4 NaT 2019-07-15 5.0 b
5 NaT NaT NaN b
>>>

 

So you see here that this is now, actually, a date (datetime) datatype and it will load into SAS as a datetime, unless you use datetimes= to say only the date or time part - which you will.

 

I think that's all you needed for this, was just to convert that date to a date type, then use the options when loading it into SAS. 

 

BTW, use 

print(sas.saslog()) to see the whole log and

SASdataobject.contents() to see what the dataset really is defined as (formats and data types and everything)

 

See how that works,

Tom

 

View solution in original post

18 REPLIES 18
Tom
Super User Tom
Super User

I don't see anything that shows the SAS dataset you do get.  

Doesn't df2sd want valid format specifications?  All of the ones you posted are missing the period that is required in a format specification.  Some of them have widths that are larger than the format can specify. What the heck could possible be stored that would need to be displayed using DOLLAR format with 255 characters?

 

What happens if you try?

my_sas_dataset_inv = sas.df2sd(my_dataset_inv,table=tablename,libref='trial'
,datetimes={'Col9' : 'DATE9.'}
, outfmts={'Col1':'F13.'
, 'Col2':'dollar32.'
, 'Col3':'dollar32.'
,'Col4':'F26.5'
,'Col5':'dollar20.'
,'Col6':'F6.'
, 'Col7':'dollar32.'
, 'Col8':'dollar32.'})
sastpw
SAS Employee

Yes, @AshishM you are correct, valid formats certainly need to be specified. I don't manipulate what is specified in any way to try to validate or change or correct what formats you provide. You need to specify valid values. Also, looking at the log to see what was submitted and what errors you get is always the thing to do when something doesn't seem right:

 

print(sas.saslog())

 

There's also some extended doc on this specific feature, and it also shows examples of valid values being provided for this and other options options on df2sd: https://sassoftware.github.io/saspy/advanced-topics.html#advanced-sd2df-and-df2sd-techniques 

 

Give the suggestion @AshishM showed a try and see if that works; it looks correct. And look at the log to see what code was submitted and if it worked. 

 

Thanks,

Tom

AshishM
Fluorite | Level 6

Hi Tom,

Thanks for your suggestion, I actually referred to the same page to come up with the code posted earlier. I have another very interesting observation which I believe is a bug but probably experts can point out.

With the code I posted earlier, I observed that the data in the dataset was not populating (blank dataset). So I modified the code as below (notice the format keyword with outfmts), which is not how it should be as per the advanced page documentation and I was able to load the dataset with data however the format is still incorrect :

 

my_sas_dataset_inv = sas.df2sd(my_dataset_inv,table=tablename,libref='MST_PREP',datetimes={'format':{'Col9' : 'date9.'}}, outfmts={'format':{'Col1':'13.', 'Col2':'dollar50.', 'Col3':'dollar35.','Col4':'26.5','Col5':'dollar20.','Col6':'6.', 'Col7':'dollar50.', 'Col8':'dollar50.'}})

 

The format keywork used above is actually the defined format for sas.sd2df_DISK or sd2df_CSV, instead of dsopts I used outfmts:

df = sas.sd2df_DISK('cars', 'sashelp', dtype={'invoice' : 'int'}, my_fmts=True,
dsopts={'keep' : 'MSRP Invoice', 'format' : {'msrp':'dollar32.2','invoice':'best32.'}})

 

This is how it should be as per documentation in advanced techniques:

sd = sas.df2sd(df, datetimes={'d' : 'date', 't' : 'time'}, outfmts={'dt' : 'comma32.4', 'd' : 'YYMMDD.', 't' : 'TIMEAMPM.'}, results='text')

 

Do you know how can I log a defect for the developer?

Also, regarding printing the logs, I tried the code you provided but it didn't show any errors even when the dataset was empty. is there any other way to view the logs of this method?

AshishM
Fluorite | Level 6

Hi Tom,

Thanks for the suggestion. You're right regarding DOLLAR 255 amount however reducing the number also didn't work. I tried your code and I still see the same issue.

I also added the period and it still didn't work. Not sure how to check what I am doing wrong, have already spent hours on this issue which looked simple 😞

Tom
Super User Tom
Super User
You still haven't shown any details. What is the data frame you are trying to transfer? How are Col1 and Col9 and the other variables defined in python? Are they numbers? Strings? something else? Do you get error messages in python? What does the SAS log from the SAS session that saspy is using show? If it makes a SAS dataset then what does that dataset look like?
AshishM
Fluorite | Level 6

Hi Tom,

This is the code that I am using, some of the things are sensitive so I have masked them. Basically the input folder from where I am reading the .dat files(comma separated) has 2 different kinds of files, each file has different structure i.e tables/data which is why you'll see the if else statement. I am facing the same issue for both so I had posted just one part of the code.

I am not explicitly defining any data type when I am reading from CSV(I believe it will read it as string), I am thinking of defining dtype in read csv, though it should not have been required if the format worked properly while generating the dataset. I am trying to define the format when the output SAS dataset is created, however it doesn't understand the format and defines the format based on the actual data length of the data in tables.

I am not getting any errors, if there's a way to display the error logs for df2sd method, the kindly suggest.

Dataset looks exactly the same as the data in CSV file, some of the data like column names etc is sensitive so I have masked it.

If I remove the format keyword as mentioned in last post then it results in empty dataset in SAS and since I don't know how to get logs for this method I can't view what error is thrown if it is being thrown. Python does not throw any syntax errors both ways which is surprising as this is not what the documentation shows. I am using the below method for printing errors print(my_sas_dataset_inv.HTML) but currently I don't see any errors for that.

 

Code:

import pandas as pd
import saspy
import os

saspy.SAScfg
sas = saspy.SASsession(cfgfile='C:\\Program Files\\Anaconda3\\lib\\site-packages\\saspy\\sascfg_personal.py')

Col_party=['Col1',' Col2', 'Col3', …..]
Col_party_inv=['Col4', 'Col5', 'Col6',...]

path='C:\\Users\\dummyuser\\Desktop\\trial'
files = os.listdir(path)
print(files[1])

for file in files:
    if ".dat" in file:
        print(file)
        filename=path+'\\'+file
        print('filename: '+filename)
        if "trn_inv" in file:
            Datekey=file[13:21]
            print('Datekey: '+Datekey)
            tablename='tbl1'+Datekey
            print('tablename: '+tablename)
            my_dataset_inv = pd.read_csv(filename,names=Col_party_inv,sep='|' , engine='python')
            my_sas_dataset_inv = sas.df2sd(my_dataset_inv,table=tablename,libref='TEMP',datetimes={'format':{'Col9' : 'date9.'}}, outfmts={'format':{'Col1:'13.', 'Col2':'dollar50.', 'Col3':'dollar35.','Col4':'26.5','Col5':'dollar20.','Col6':'6.', 'Col7':'dollar50.', 'Col8':'dollar50.'}})
            my_sas_dataset_inv.head
            print(my_sas_dataset_inv.HTML)
        else:
            Datekey=file[9:17]
            print('Datekey: '+Datekey)
            tablename='tbl2'+Datekey
            print('tablename: '+tablename)
            my_dataset = pd.read_csv(filename,names=Col_party,sep='|' , engine='python')

             my_sas_dataset = sas.df2sd(my_dataset,table=tablename,libref='TEMP',datetimes={'format':{'Col9' : 'date9.'}}, outfmts={'format':{'Col1:'13.', 'Col2':'dollar50.',       'Col3':'dollar35.','Col4':'26.5','Col5':'dollar20.','Col6':'6.', 'Col7':'dollar50.', 'Col8':'dollar50.'}})

 

 

Output of above python code:

filename.dat
filename: C:\Users\dummyuser\Desktop\trial\filename.dat

Datekey: XXXXX
tablename: table1

0

{'LOG': "\x0c10                                                         The SAS System                               11:48 Sunday, March 22, 2020\n\n169        ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg style=HTMLBlue;\n169      ! ods graphics on / outputfmt=png;\nNOTE: Writing HTML5(SASPY_INTERNAL) Body file: _TOMODS1\n170        \n171        \n172

 

The SAS dataset created.Properties are still the same as posted earlier.

Dataset.png

Tom
Super User Tom
Super User

Note that @sastpw included instructions for how to print the SAS log from your Python code.

 

I haven't tried to use these things and the documentation seems skimpy to me.  The links @sastpw posted mainly talk about how to use it to move data from SAS to Python.

 

It is not clear to me why you would want or need to attach any formats to any of the SAS variables.  SAS only has two types of variables.  Floating point numbers and fixed length character strings.  A FORMAT in SAS is instructions on how to convert the values into text, for example when printing the value.  For most things there is no need to attach any format to the SAS variables.  SAS already knows how to print numbers and strings.  The except is for DATE, TIME and DATETIME values.  SAS stores those as number of days since 1960, seconds since midnight and seconds since 1960, respectively.  saspy only knows how to convert datatime64 data type into SAS date, time or datetime values, and the you use the datetime option to tell if if you want any of them converted to date or time instead of staying as datetime.

AshishM
Fluorite | Level 6

Hi Tom,

The problem is that SAS attached character format to date variable with value 15-Jul-19. Now since it is character I am unable to change the format to date9. The following code which I am using to convert to dataset already specifies column 9 as date9. but that was ignored just like other column formats specified.

 

my_sas_dataset_inv = sas.df2sd(my_dataset_inv,table=tablename,libref='Trial',datetimes={'Col9' : 'date9.'}, outfmts={'Col2:'13.', 'Col3':'dollar50.', 'Col4':'dollar35.','Col5':'26.5','Col6':'dollar20.','TIME_KEY':'6.', 'Col7:'dollar50.', 'Col8':'dollar50.'})

sastpw
SAS Employee

Looking through these posts, I still keep seeing the wrong syntax being used, which is certainly part of the problem. This isn't valid:

,datetimes={'format':{'Col9' : 'date9.'}}, outfmts={'format':{'Col1':'13.', 'Col2':'dollar50.', ...

 

datetimes doesn't take a format keyword.

datetimes – dict with column names as keys and values of ‘date’ or ‘time’ to create SAS date or times instead of datetimes

 

outfmts doesn't take that either.

outfmts – dict with column names and SAS formats to assign to the new SAS data set

 

In another post, I see something about using sd2df, but with outfmts, which isn't even valid, and trying to use dsopts, but all of this seems just seems like trying anything where there's something to do with a Format, but not specifying the correct syntax for the one case that you're trying to do.

 

So, as @AshishM has been trying to find out, we need to know what the columns in your dataframe are (do a dtypes and show us), and then we can help figure out a valid way to get the data imported. The syntax @AshishM posted originally, looked like the correct version of what you had tried to do to begin with. But, not knowing what's in your dataframe, and not seeing what happened when you ran that, doesn't give us anything to go on. 

 

In the latest post I see this:

my_dataset_inv = pd.read_csv(filename,names=Col_party_inv,sep='|' , engine='python')

 

So then issue my_dataset_inv.dtypes and post that. Then we will at least know what datatypes you have in your dataframe. After that do my_dataset_inv.head() and we can then see what kind of values there are for these columns. Then we can figure out how to import that into SAS and specify appropriate formats for displaying those values. 

 

Thanks,

Tom

 

 

 

sastpw
SAS Employee

My apologies, I've been using the tag @AshishM when I should be using @Tom . I got those backwards. Sorry about that.

AshishM
Fluorite | Level 6

Hi Tom,

I understand that the syntax used for datetimes and outfmts (with format keyword) I used is incorrect but surprisingly it works and populates the SAS dataset, using the syntax provided for df2sd i.e without format keyword is resulting in an empty dataset in SAS.

 

Please find the dtypes and values in sas dataset below:

 

Col1     int64
Col2     object
Col3     object
Col4     float64
Col5     object
Col6     object
Col7     int64
Col8     object
Col9     object
dtype: object
   Col1 Col2 Col3  \
0        201605141                 TRN201605141                     PJ  
1        201605142                 TRN201605142                     PJ  
2        201605143                 TRN201605143                     PJ  
3        201605144                 TRN201605144                     PJ  
4        201605145                 TRN201605145                     PJ  

   Col4 Col5 Col6  Col7  \
0                        1000               CREDIT       15-Jul-19    102356  
1                        1000               CREDIT       15-Jul-19    102356  
2                        1000               CREDIT       15-Jul-19    102356  
3                        1000               CREDIT       15-Jul-19    102356  
4                        1000               CREDIT       15-Jul-19    102356  

  Col8 Col9 
0               UNK                   UNK 
1               UNK                   UNK 
2               UNK                   UNK 
3               UNK                   UNK 
4               UNK                   UNK 

sastpw
SAS Employee

Ok, there we go. And, ok, even if it 'seemed' to work (didn't error), that doesn't mean it's right. So, lets get it right! If you left off any of these extra parameters, it would load the table, and it still wouldn't be what you wanted.

 

Looking at what you've provided, there are only numerics and objects. That's perfectly reasonable. To import those to SAS is easy, numerics become floats and objects become characters (as @Tom mentioned, we only have those two types, and dates/times/datetimes are specific floats).

 

Clearly, col6 is a string representation of a date. But, it's not a date, it's just a character string (an object). So, to have this actually be a date (specific float value and a date format) in SAS we have to convert this column of the dataframe to a datetime64 type. That's the pandas type for date/time/datetime (which is really only a datetime). This is then where the datetimes= option comes in; when you have a pandas datetime64, but only want either the date or time part of it in SAS. So we will end up using datetimes={'col6' : 'date'} on df2sd, but not until we convert that pandas column to a datetime64 instead of an object. After that, I don't see any other cases that are an issue. Setting formats (correctly) are up to you as the rest are just numbers or strings; just specify valid format names.

 

So, how to convert col6 to a correct datetime64? In pandas, you can do this different ways, but here's one to try (if your dataframe was named df) 

 

df['Col6'] = pd.to_datetime(df['Col6'])

 

I created a samlple df  column 'n1' is like your Col6:

 

>>> rows = [[datetime.datetime(1965, 1, 1, 8, 0, 1), '15-Jul-19', 1.30, 'a'],
... [datetime.datetime(1966, 1, 1, 7, 0, 2), '15-Jul-19', 2.30, 'b'],
... [datetime.datetime(1967, 1, 1, 6, 0, 3), '15-Jul-19', 3.30, ' '],
... [datetime.datetime(1968, 1, 1, 5, 0, 4), '15-Jul-19', 4.30, ''],
... [None, '15-Jul-19', 5.0, 'b'],
... [None, None, None, 'b'],
... ]
>>> df = pd.DataFrame.from_records(rows, columns=['dt','n1','n2', 's1'])
>>> df
dt n1 n2 s1
0 1965-01-01 08:00:01 15-Jul-19 1.3 a
1 1966-01-01 07:00:02 15-Jul-19 2.3 b
2 1967-01-01 06:00:03 15-Jul-19 3.3
3 1968-01-01 05:00:04 15-Jul-19 4.3
4 NaT 15-Jul-19 5.0 b
5 NaT None NaN b
>>> df.dtypes
dt datetime64[ns]
n1 object
n2 float64
s1 object
dtype: object
>>>

 

As you can see n1 is just a string, but has dd-mmm-yyyy date in it that we read as a date.

I ran the code above and it was converted to the dateteime64 type. Now. since it's really only a date, you can use datetimes={'n2' : 'date'} on df2sd when loading this, along with outfmts={'n2' :  ‘YYMMDD.’}  or any valid SAS date format.

 

>>> df.dtypes;    df;   df['n1'] = pd.to_datetime(df['n1'])
dt datetime64[ns]
n1 object
n2 float64
s1 object
dtype: object


dt n1 n2 s1
0 1965-01-01 08:00:01 15-Jul-19 1.3 a
1 1966-01-01 07:00:02 15-Jul-19 2.3 b
2 1967-01-01 06:00:03 15-Jul-19 3.3
3 1968-01-01 05:00:04 15-Jul-19 4.3
4 NaT 15-Jul-19 5.0 b
5 NaT None NaN b



# after the conversion:

>>> df.dtypes; df
dt datetime64[ns]
n1 datetime64[ns]
n2 float64
s1 object
dtype: object


dt n1 n2 s1
0 1965-01-01 08:00:01 2019-07-15 1.3 a
1 1966-01-01 07:00:02 2019-07-15 2.3 b
2 1967-01-01 06:00:03 2019-07-15 3.3
3 1968-01-01 05:00:04 2019-07-15 4.3
4 NaT 2019-07-15 5.0 b
5 NaT NaT NaN b
>>>

 

So you see here that this is now, actually, a date (datetime) datatype and it will load into SAS as a datetime, unless you use datetimes= to say only the date or time part - which you will.

 

I think that's all you needed for this, was just to convert that date to a date type, then use the options when loading it into SAS. 

 

BTW, use 

print(sas.saslog()) to see the whole log and

SASdataobject.contents() to see what the dataset really is defined as (formats and data types and everything)

 

See how that works,

Tom

 

sastpw
SAS Employee

BTW, there are more specific examples of this in https://sassoftware.github.io/saspy/advanced-topics.html#dates-times-and-datetimes-oh-my This explains what I said in the previous post in more detail and other examples.

 

Thanks,

Tom

AshishM
Fluorite | Level 6

Hi Tom,

Thanks a lot, that was really informative, I was able to load that file into SAS. Now I am stuck on the next file which has few columns that are giving issues :

1. Col1 has the value as shown below. When I try converting it to datetime my_dataset['Col1'] = pd.to_datetime(my_dataset['Col1']). I get ValueError: Unknown string format. I have shown the value in this column below.

dtype: object

Col1                object
Col2                int64

2. Col2 was identified as int64. However I want to load it as a character. How can I modify the type from numeric to object?

[Resolved]: I used the following code to cast it to object my_dataset['Col1']=my_dataset.astype({'Col1': 'object'}). Seems to be working fine.

 

Col1

0 20NOV2007:00:00:00                    
1  20NOV2007:00:00:00                      
2  20NOV2007:00:00:00                      
3  20NOV2007:00:00:00                      
4  20NOV2007:00:00:00

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 7040 views
  • 6 likes
  • 3 in conversation