Library updates

This section presents the latest releases of the library, focusing on the most important releases that introduced new features. Versions in between fix bugs and implement improvements suggested by user feedback.

1.5.0 - September 25, 2022 - Custom cache folder in pycof.sql.remote_execute_sql() and write supports json files

PYCOF can now write dictionaries to JSON with the pycof.misc.write() function. Users can save dictionaries with a single line of code.

The function pycof.sql.remote_execute_sql() now supports local cache folders for saving query outputs, using the argument cache_folder. The default remains the temporary folder created by PYCOF, but users can also save queries in a folder of their choice. This has the advantage that cached queries will not be removed after a laptop/server reboot, but users will then have to clean the folder manually and regularly to avoid wasting disk space.
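A minimal pruning sketch for such a folder, assuming a persistent cache path and an arbitrary 7-day retention (both illustrative, not PYCOF defaults):

import os
import time

# Illustrative cleanup, not a PYCOF feature: delete cached files older than 7 days.
cache_folder = '/path/to/cached/folder'
max_age = 7 * 24 * 3600  # 7 days, in seconds

for filename in os.listdir(cache_folder):
    filepath = os.path.join(cache_folder, filename)
    if os.path.isfile(filepath) and time.time() - os.path.getmtime(filepath) > max_age:
        os.remove(filepath)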

The function pycof.data.read() now replaces the former f_read function. This change is purely a rename; usage and arguments remain exactly the same.

Warning

Note that from version 1.6.0, f_read will be fully deprecated and replaced by pycof.data.read().

How to use it?

import pycof as pc

# Write a dictionary as json file
pc.write({"file": "test json"}, '/path/to/file.json')

# Run a query and cache the output in a selected local folder
df2 = pc.remote_execute_sql('/path/to/query.sql', cache_folder='/path/to/cached/folder')

# Read a local parquet file
data = pc.read('/path/to/file.parquet')

How to install it?

pip3 install pycof==1.5.0

See more details: pycof.misc.write() / pycof.sql.remote_execute_sql() / pycof.data.read()


1.3.0 - May 23, 2021 - AWS credentials profile prioritized over config file

PYCOF now prioritizes AWS CLI profiles created through the aws configure command over the config.json file. For the functions pycof.data.f_read() and pycof.misc.write(), users no longer need to create the config.json file. The only requirement is to run the command aws configure and register the IAM access and secret keys. For the function pycof.sql.remote_execute_sql(), the config.json file may remain required when connection='IAM' is used to connect to a Redshift cluster. The fallback solution of a config.json file containing the fields AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY remains available but may be deprecated in later versions.
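You can check that your profile resolves with plain boto3 (a standard boto3 call, not a PYCOF API), since boto3 sessions automatically pick up profiles registered with aws configure:

import boto3

# Plain boto3, not a PYCOF API: verify that the profile registered
# with `aws configure` resolves to usable credentials.
session = boto3.Session(profile_name='default')
print(session.get_credentials() is not None)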

This change allows a faster setup, and even no setup at all on AWS environments (e.g. EC2, SageMaker).

How to use it?

import pycof as pc

# Load a parquet file from Amazon S3
df = pc.f_read('s3://bucket/path/to/file.parquet', profile_name='default')

# Write a file on Amazon S3
pc.write(df, 's3://bucket/path/to/file2.parquet', profile_name='default')

# Run a query on a Redshift cluster
df2 = pc.remote_execute_sql('/path/to/query.sql', connection='IAM', profile_name='default')

How to install it?

pip3 install pycof==1.3.0

See more details: pycof.data.f_read() / pycof.misc.write() / pycof.sql.remote_execute_sql()


1.2.5 - February 2, 2021 - Emails and Google Calendar are now supported

A new module pycof.format.GoogleCalendar() allows users to retrieve events from Google Calendar. The module contains a function pycof.format.GoogleCalendar.today_events() to get all events of the current day. Users can also use pycof.format.GoogleCalendar.next_events() to find the next events (the number of events is passed as an argument).

Another module, pycof.format.GetEmails(), allows users to retrieve the most recent emails from a selected address. Users can retrieve a fixed number of emails along with their attachments.

An additional namespace is available in the output of pycof.sql.remote_execute_sql(). Metadata are now attached when the cache is used, giving users information about the cache in place: the last run date of the query, the file age, etc.

How to use it?

import pycof as pc

calendar = pc.GoogleCalendar()
# Get today's events
todays = calendar.today_events(calendar='primary')
# Get 10 next events
next10 = calendar.next_events(calendar='primary', maxResults=10)

# Retrieve the last 10 emails
pc.GetEmails(10)

# Check file age of an SQL output
df = pc.remote_execute_sql(sql, cache='2h')
df.meta.cache.age()

How to install it?

pip3 install pycof==1.2.5

See more details: pycof.format.GoogleCalendar() / pycof.format.GetEmails() / pycof.sql.remote_execute_sql()


1.2.0 - December 13, 2020 - SSH tunnels supported in pycof.sql.remote_execute_sql()

The module pycof.sql.remote_execute_sql() now supports remote connections with SSH tunneling, thanks to the argument connection='SSH'. Supported for both MySQL and SQLite databases, this lets users access databases on servers that only expose port 22, allowing more secure connections. If connection='SSH' is passed but the config file has a value for neither SSH_KEY nor SSH_PASSWORD, the function will look in the default SSH location (/home/user/.ssh/id_rsa on Linux/MacOS or C:\Users\<username>\.ssh\id_rsa on Windows).
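As a rough sketch of that fallback logic (illustrative only, not PYCOF's actual implementation):

import os

# Illustrative sketch of the fallback described above, not PYCOF's actual code.
config = {"SSH_USER": "user"}  # parsed config.json, here without SSH_KEY/SSH_PASSWORD

# os.path.expanduser('~') resolves to /home/<user> on Linux/MacOS
# and C:\Users\<username> on Windows.
default_key = os.path.join(os.path.expanduser('~'), '.ssh', 'id_rsa')
ssh_key = config.get('SSH_KEY') or default_key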

Also, both pycof.sql.remote_execute_sql() and pycof.data.f_read() can now consume the credentials argument without the '.json' extension. See SQL FAQ 6 for more details.

Warning

Note that from version 1.2.0, the PYCOF credentials folder for Linux and MacOS needs to be /etc/.pycof. You can move your config file with the command: sudo mv /etc/config.json /etc/.pycof/config.json.

The adapted config.json structure is:

{
    "DB_USER": "",
    "DB_PASSWORD": "",
    "DB_HOST": "",
    "DB_PORT": "3306",
    "DB_DATABASE": "",
    "SSH_USER": ""
}

Other fields such as SSH_KEY and SSH_PASSWORD are optional, provided that the SSH key is stored in the default folder.
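For reference, a config providing the key explicitly would add the optional fields (the values below are illustrative):

{
    "DB_USER": "",
    "DB_PASSWORD": "",
    "DB_HOST": "",
    "DB_PORT": "3306",
    "DB_DATABASE": "",
    "SSH_USER": "",
    "SSH_KEY": "/path/to/id_rsa",
    "SSH_PASSWORD": ""
}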

How to use it?

import pycof as pc

pc.remote_execute_sql('my_example.sql', connection='SSH')

How to install it?

pip3 install pycof==1.2.0

See more details: pycof.sql.remote_execute_sql()


1.1.37 - September 30, 2020 - SQLite database on pycof.sql.remote_execute_sql()

The module pycof.sql.remote_execute_sql() now supports local SQLite connections. Extending from MySQL and AWS Redshift databases, users can now work with local databases thanks to SQLite. This will allow users to play with infrastructure running on their local machine (overcoming the problem of remote servers and potential cost infrastructure).

The adapted config.json structure is:

{
    "DB_USER": "",
    "DB_PASSWORD": "",
    "DB_HOST": "/path/to/sqlite.db",
    "DB_PORT": "sqlite3",
    "DB_DATABASE": ""
}

The module automatically detects the connection if the keyword sqlite appears in the path to the database. Users can also define the port as sqlite if the path does not contain the keyword. A final option is to force the connection with the argument engine='sqlite3'.
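A sketch of that detection order (illustrative only, not PYCOF's actual code):

# Illustrative detection order, not PYCOF's actual code.
def is_sqlite(host, port, engine=''):
    return engine == 'sqlite3' or port == 'sqlite' or 'sqlite' in host

print(is_sqlite('/path/to/sqlite.db', '3306'))  # True: keyword found in path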

The module will offer the same functionality as the first two connectors.

How to use it?

import pycof as pc

pc.remote_execute_sql('my_example.sql', engine='sqlite3')

How to install it?

pip3 install pycof==1.1.37

See more details: pycof.sql.remote_execute_sql()


1.1.35 - September 13, 2020 - Connector engine added to pycof.sql.remote_execute_sql()

The module pycof.sql.remote_execute_sql() automatically detects a Redshift cluster by checking whether the keyword redshift is contained in the hostname of the AWS Redshift cluster.

The module now includes an argument engine which allows users to force the Redshift connector. If you need another engine (neither Redshift nor MySQL), please submit an issue.

Warning

The module datamngt, which contained OneHotEncoding() and create_dataset(), is now deprecated. To use these functions, please refer to statinf.

How to use it?

import pycof as pc

pc.remote_execute_sql('my_example.sql', engine='redshift')

How to install it?

pip3 install pycof==1.1.35

See more details: pycof.sql.remote_execute_sql()


1.1.33 - May 17, 2020 - Improved query experience with pycof.sql.remote_execute_sql()

We improved the querying experience in pycof.sql.remote_execute_sql() by simplifying the argument cache_time and by allowing sql_query to be a path.

The argument cache_time now accepts a string with units (e.g. 24h, 1.3mins). Users still have the option of providing an integer representing the file age in seconds.
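For instance, both calls below should behave identically, since 24h corresponds to 24 * 3600 = 86400 seconds:

import pycof as pc

sql = "SELECT * FROM schema.table"

# A unit string and its equivalent in seconds: 24h = 24 * 3600 = 86400s.
df_a = pc.remote_execute_sql(sql, cache=True, cache_time='24h')
df_b = pc.remote_execute_sql(sql, cache=True, cache_time=86400)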

pycof.sql.remote_execute_sql() now also accepts a path for sql_query. The extension needs to be sql. The path is then passed to pycof.data.f_read() to recover the SQL query.

Warning

The module datamngt, which contains OneHotEncoding() and create_dataset(), will be moved to statinf.

How to use it?

import pycof as pc

pc.remote_execute_sql('my_example.sql', cache=True, cache_time='2.3wk')

How to install it?

pip3 install pycof==1.1.33

See more details: pycof.sql.remote_execute_sql()


1.1.26 - Mar 20, 2020 - pycof.data.f_read() now supports json and parquet

We extended the formats pycof.data.f_read() can handle to include json and parquet. The function aims at loading files to be used as DataFrames or SQL queries. The formats now accepted are: csv, txt, xlsx, sql, json, parquet, js, html.

Warning

The recommended engine is pyarrow since fastparquet has stability and installation issues. The dependency on fastparquet will be removed in version 1.1.30.
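If you need to pin the parquet engine yourself, plain pandas exposes the choice directly (this is a standard pandas call, not a PYCOF-specific API):

import pandas as pd

# Standard pandas call, not a PYCOF API: select the parquet engine explicitly.
df = pd.read_parquet('example_df.parquet', engine='pyarrow')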

How to use it?

import pycof as pc

pc.f_read('example_df.json')

How to install it?

pip3 install pycof==1.1.26

See more details: pycof.data.f_read()


1.1.21 - Feb 21, 2020 - New function pycof.data.f_read()

PYCOF now provides a function to load files without having to worry about the extension. It aims at loading files to be used as DataFrames or SQL queries. The formats accepted are: csv, txt, xlsx, sql. It will soon be extended to json, parquet, js, html.

How to use it?

import pycof as pc

pc.f_read('example_df.csv')

How to install it?

pip3 install pycof==1.1.21

See more details: pycof.data.f_read()


1.1.13 - Dec 21, 2019 - New function pycof.send_email()

PYCOF now allows you to send emails from a script with a single, easy function. There is no need to handle the SMTP connector; PYCOF does it for you. The only requirement is to set up the config.json file once. See more setup details.

How to use it?

import pycof as pc

pc.send_email(to="test@domain.com", body="Hello world!", subject="Test")

How to install it?

pip3 install pycof==1.1.13

See more details: pycof.send_email()


1.1.11 - Dec 10, 2019 - pycof.sql.remote_execute_sql() now supports caching

pycof.sql.remote_execute_sql() can now cache your SELECT results. This avoids querying the database several times when executing the same command repeatedly. The function saves the output in a temporary file, named by hashing your SQL query. See more details.
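As a rough illustration of the mechanism (not PYCOF's actual implementation), hashing the query yields a stable file name, so the same query always maps to the same cached file:

import hashlib
import os
import tempfile

# Illustrative sketch, not PYCOF's actual code: derive a stable cache
# file name by hashing the SQL query text.
sql = "SELECT * FROM schema.table"
query_hash = hashlib.sha256(sql.encode('utf-8')).hexdigest()
cache_file = os.path.join(tempfile.gettempdir(), query_hash)
print(cache_file)  # same query -> same path, so results can be reused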

How to use it?

import pycof as pc

sql = """
SELECT *
FROM schema.table
"""

pc.remote_execute_sql(sql, cache=True, cache_time=3600)

How to install it?

pip3 install pycof==1.1.11

See more details: pycof.sql.remote_execute_sql()


1.1.9 - Nov 23, 2019 - pycof.sql.remote_execute_sql() now supports COPY

pycof.sql.remote_execute_sql() can now execute COPY commands on top of SELECT, INSERT, and DELETE. The only requirement is the config.json file, to be set up once. See more setup details.

How to use it?

import pycof as pc

sql_copy = """
COPY FROM schema.table -
CREATE SCIENTISTS (EMPLOYEE_ID, EMAIL) -
USING SELECT EMPLOYEE_ID, EMAIL FROM EMPLOYEES -
WHERE JOB_ID='SCIENTIST';
"""

pc.remote_execute_sql(sql_copy, useIAM=True)

How to install it?

pip3 install pycof==1.1.9

See more details: pycof.sql.remote_execute_sql()


1.1.5 - Nov 15, 2019 - pycof.sql.remote_execute_sql() now supports IAM credentials

You can now connect to your database through IAM. The only requirement is the config.json file, to be set up once. See more setup details and more information on this feature.

How to use it?

import pycof as pc

sql = """
SELECT *
FROM schema.table
"""

pc.remote_execute_sql(sql, useIAM=True)

How to install it?

pip3 install pycof==1.1.5

See more details: pycof.sql.remote_execute_sql()