libera_utils.io.smart_open

Module providing open and copy functions that work with local files or files in an S3 bucket.

Functions

is_gzip(path)

Determine if a string points to a gzip file.

is_s3(path)

Determine if a string points to an s3 location or not.

smart_copy_file(source_path, dest_path[, delete])

Copy function that can handle local files or files in an S3 bucket.

smart_open(path[, mode, enable_gzip])

Open function that can handle local files or files in an S3 bucket.

libera_utils.io.smart_open._copy_local_to_local(source_path: str, dest_path: str, delete: bool)

Copy a local source file to a local destination.

Parameters:
  • source_path (str or pathlib.Path) – Path to the source file to be copied.

  • dest_path (str or pathlib.Path) – Path to the destination for the copied file.

  • delete (bool) – If True, deletes the source file after it has been copied (default = False).

Returns:

The path to the newly created file

Return type:

pathlib.Path
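
The behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the name copy_local_sketch is hypothetical, and it assumes that delete means "remove the source once the copy exists".

```python
import shutil
from pathlib import Path


def copy_local_sketch(source_path, dest_path, delete=False):
    """Hypothetical stand-in for a local-to-local copy (not libera_utils code)."""
    source = Path(source_path)
    dest = Path(dest_path)
    # shutil.copy returns the path of the newly created file.
    new_file = Path(shutil.copy(source, dest))
    if delete:
        # Assumed meaning of `delete`: remove the source after a successful copy.
        source.unlink()
    return new_file
```

With delete=True the sketch behaves like a move: the returned path exists and the source path no longer does.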

libera_utils.io.smart_open._copy_local_to_s3(source_path: str, dest_path: str, delete: bool)

Copy a local file to an S3 object.

Parameters:
  • source_path (str or pathlib.Path) – Path to the source file to be copied.

  • dest_path (str or cloudpathlib.s3.s3path.S3Path) – Path to the destination for the copied file. Files residing in an s3 bucket must begin with “s3://”.

  • delete (bool) – If True, deletes the source file after it has been copied (default = False).

Returns:

The path to the newly created file

Return type:

cloudpathlib.s3.s3path.S3Path

libera_utils.io.smart_open._copy_s3_to_local(source_path: str, dest_path: str, delete: bool)

Copy an S3 object to a local file.

Parameters:
  • source_path (str or cloudpathlib.s3.s3path.S3Path) – Path to the source file to be copied. Files residing in an s3 bucket must begin with “s3://”.

  • dest_path (str or pathlib.Path) – Path to the destination for the copied file.

  • delete (bool) – If True, deletes the source object after it has been copied (default = False).

Returns:

The path to the newly created file

Return type:

pathlib.Path

libera_utils.io.smart_open._copy_s3_to_s3(source_path: str, dest_path: str, delete: bool)

Copy an S3 object to a different S3 object.

Parameters:
  • source_path (str or cloudpathlib.s3.s3path.S3Path) – Path to the source file to be copied. Files residing in an s3 bucket must begin with “s3://”.

  • dest_path (str or cloudpathlib.s3.s3path.S3Path) – Path to the destination for the copied file. Files residing in an s3 bucket must begin with “s3://”.

  • delete (bool) – If True, deletes the source object after it has been copied (default = False).

Returns:

The path to the newly created file

Return type:

cloudpathlib.s3.s3path.S3Path

libera_utils.io.smart_open.is_gzip(path: str)

Determine if a string points to a gzip file.

Parameters:

path (str or pathlib.Path or cloudpathlib.s3.s3path.S3Path) – Path to check.

Return type:

bool
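
The smart_open entry on this page states that gzip files are identified by a *.gz extension. A minimal sketch of such a check follows; the name looks_like_gzip is hypothetical, and whether is_gzip is implemented this way is an assumption.

```python
def looks_like_gzip(path):
    """Hypothetical extension check (not libera_utils code)."""
    # str() lets the same check accept str, pathlib.Path, or S3Path-like objects.
    return str(path).endswith(".gz")
```

Because the check looks only at the name, a file that is gzip-compressed but lacks the .gz suffix would not be detected by this sketch.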

libera_utils.io.smart_open.is_s3(path: str)

Determine if a string points to an s3 location or not.

Parameters:

path (str or pathlib.Path or cloudpathlib.s3.s3path.S3Path) – Path to check for being an s3 location.

Return type:

bool
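
This page notes that files residing in an s3 bucket must begin with “s3://”. A prefix test along those lines might look like the sketch below; looks_like_s3 is a hypothetical name, and the actual logic of is_s3 is not shown in this reference.

```python
def looks_like_s3(path):
    """Hypothetical prefix check (not libera_utils code)."""
    # Converting to str first allows str and path-like inputs alike.
    return str(path).startswith("s3://")
```

One consequence of a plain prefix test: pathlib collapses repeated slashes, so a pathlib path built from "s3://bucket/key" stringifies as "s3:/bucket/key" and would not match. Passing S3 locations as strings or S3Path objects avoids that in this sketch.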

libera_utils.io.smart_open.smart_copy_file(source_path: str, dest_path: str, delete=False)

Copy function that can handle local files or files in an S3 bucket. Returns the path to the newly created file as a Path or an S3Path, depending on the destination.

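Parameters:
  • source_path (str or pathlib.Path or cloudpathlib.s3.s3path.S3Path) – Path to the source file to be copied. Files residing in an s3 bucket must begin with “s3://”.

  • dest_path (str or pathlib.Path or cloudpathlib.s3.s3path.S3Path) – Path to the destination for the copied file. Files residing in an s3 bucket must begin with “s3://”.

  • delete (bool, Optional) – If True, deletes the source file after it has been copied. Defaults to False.

Returns: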

The path to the newly created file

Return type:

pathlib.Path or cloudpathlib.s3.s3path.S3Path
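
The four private helpers documented on this page cover every combination of local and S3 source and destination. The sketch below shows one way a caller could map a pair of paths onto those helper names. The function pick_copy_helper is hypothetical, it returns only the helper's name rather than calling it, and whether smart_copy_file dispatches this way is an assumption.

```python
def pick_copy_helper(source_path, dest_path):
    """Return the name of the helper matching the source/destination pair.

    Hypothetical illustration only; it performs no copying.
    """
    # Assumed rule: a location is on S3 when its string form starts with "s3://".
    source_is_s3 = str(source_path).startswith("s3://")
    dest_is_s3 = str(dest_path).startswith("s3://")
    helpers = {
        (False, False): "_copy_local_to_local",
        (False, True): "_copy_local_to_s3",
        (True, False): "_copy_s3_to_local",
        (True, True): "_copy_s3_to_s3",
    }
    return helpers[(source_is_s3, dest_is_s3)]
```

The destination side of the pair also determines the documented return type: pathlib.Path for a local destination and cloudpathlib.s3.s3path.S3Path for an S3 destination.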

libera_utils.io.smart_open.smart_open(path: str, mode: str = 'rb', enable_gzip: bool = True)

Open function that can handle local files or files in an S3 bucket. It also transparently handles gzip files, which are identified by a *.gz extension.

Parameters:
  • path (str or pathlib.Path or cloudpathlib.s3.s3path.S3Path) – Path to the file to be opened. Files residing in an s3 bucket must begin with “s3://”.

  • mode (str, Optional) – Optional string specifying the mode in which the file is opened. Defaults to ‘rb’.

  • enable_gzip (bool, Optional) – Flag to specify that *.gz files should be opened as a GzipFile object. Setting this to False is useful when creating the md5sum of a *.gz file. Defaults to True.

Return type:

IO or gzip.GzipFile
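
To illustrate the enable_gzip flag, here is a local-only sketch built on the standard library. The names open_local_sketch and md5_of_file are hypothetical, S3 handling is deliberately left out, and the real function may differ in its details.

```python
import gzip
import hashlib


def open_local_sketch(path, mode="rb", enable_gzip=True):
    """Hypothetical local-only stand-in for smart_open (not libera_utils code)."""
    if enable_gzip and str(path).endswith(".gz"):
        # In binary modes gzip.open returns a GzipFile, so reads are decompressed.
        return gzip.open(path, mode)
    # Otherwise return the ordinary file object, giving the bytes as stored on disk.
    return open(path, mode)


def md5_of_file(path):
    """Checksum the stored bytes, so a *.gz file is hashed without decompressing."""
    with open_local_sketch(path, enable_gzip=False) as fh:
        return hashlib.md5(fh.read()).hexdigest()
```

Reading with the default enable_gzip=True yields the decompressed payload, while enable_gzip=False yields the compressed bytes as stored, which is what a checksum of the *.gz file itself needs.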