Title: | A client for the dolt database |
---|---|
Description: | Creates a DBI-compliant interface to dolt databases (<https://www.dolthub.com>). Also manages local dolt server processes, provides convenience functions for dolt versioning, and an RStudio connection pane interface. |
Authors: | Noam Ross [aut, cre], EcoHealth Alliance [cph] |
Maintainer: | Noam Ross <[email protected]> |
License: | AGPL (>= 3) |
Version: | 0.0.0.9000 |
Built: | 2024-10-16 03:31:10 UTC |
Source: | https://github.com/ecohealthalliance/doltr |
dbDataType matches R data types to dolt data types. For text and blob data types, it automatically chooses amongst options (e.g., VARCHAR(N), TEXT, LONGTEXT, etc.) based on maximum field length. An attribute giving the maximum size of these fields is returned to support operations where fields need to be recast in revision.
## S4 method for signature 'DoltConnection'
dbDataType(
  dbObj,
  obj,
  min_varchar = Sys.getenv("DOLT_MINVARCAHR", 255L),
  max_varchar = Sys.getenv("DOLT_MAXVARCHAR", 16383L),
  ...
)

dolt_type_sizes(types)
dbObj | the database connection |
obj | the data type (vector or data frame) |
min_varchar | the minimum VARCHAR size |
max_varchar | the maximum VARCHAR size |
... | further arguments to methods |
types | a character vector of dolt types, e.g., |
dolt_type_sizes() takes a vector of SQL types and returns the maximum field size, if applicable.
A character vector of classes, with attributes of the maximum size for text and blob classes
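As a rough illustration of how the two functions above fit together, the sketch below looks up the dolt types chosen for a small data frame and the maximum size of each type. It assumes doltr and a dolt binary are installed and that a database directory already exists at the default location. describe_dolt_types() and the example data frame are hypothetical names introduced here, not part of doltr.

```r
# Hypothetical helper (not part of doltr): report the dolt column types chosen
# for a data frame, plus the maximum size each of those types can hold.
describe_dolt_types <- function(conn, df) {
  types <- DBI::dbDataType(conn, df)                     # one SQL type per column
  sizes <- doltr::dolt_type_sizes(as.character(types))   # maximum size, if applicable
  list(types = types, sizes = sizes)
}

# Only run when doltr, a dolt binary, and an existing database directory are found.
db_dir <- Sys.getenv("DOLT_DIR", "doltdb")
if (requireNamespace("doltr", quietly = TRUE) &&
    doltr::is_dolt_installed() && dir.exists(db_dir)) {
  conn <- doltr::dolt(dir = db_dir)
  example_df <- data.frame(id = 1:3, note = c("a", "bb", "ccc"))
  print(describe_dolt_types(conn, example_df))
  DBI::dbDisconnect(conn)
}
```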
The Dolt dbGetInfo() method returns standard information about a database connection according to the DBI specification, as well as information about the version-control status of the repository such as current branch, last commit, and modified tables. This information is also displayed in the print method for a dolt connection object and in the RStudio connection pane.
## S4 method for signature 'DoltConnection'
dbGetInfo(dbObj, ...)

## S4 method for signature 'DoltConnection'
show(object)
dbObj | the database connection |
... | Other arguments to methods |
object | the database connection |
See also: dolt_state, dolt_status, dolt_last_commit, dolt_pane
These methods largely wrap RMariaDB methods with small tweaks to work with Dolt databases.
## S4 method for signature 'DoltConnection,character'
dbSendQuery(conn, statement, params = NULL, ...)

## S4 method for signature 'DoltConnection,character'
dbSendStatement(conn, statement, params = NULL, ...)

## S4 method for signature 'DoltResult'
dbClearResult(res, ...)

## S4 method for signature 'DoltConnection'
dbDisconnect(conn, ...)
conn | a DoltConnection object. |
statement | a character vector of length one specifying the SQL statement that should be executed. Only a single SQL statement should be provided. |
params | A list of query parameters to be substituted into a parameterized query. |
... | Unused. Needed for compatibility with generic. |
res | A DoltResult object. |
This method uses dbx::dbxInsert(), as that implementation is much more performant than the standard method from RMariaDB::dbWriteTable(), due to the way Dolt handles repeat INSERT statements.
## S4 method for signature 'DoltConnection,character,data.frame'
dbWriteTable(
  conn,
  name,
  value,
  field.types = NULL,
  row.names = FALSE,
  overwrite = FALSE,
  append = FALSE,
  temporary = FALSE,
  batch_size = NULL
)
conn | a database connection |
name | the table name |
value | A data frame. |
field.types | Optional, overrides default choices of field types, derived from the classes of the columns in the data frame. See dbDataType() |
row.names | Either If A string is equivalent to For backward compatibility, |
overwrite | a logical specifying whether to overwrite an existing table or not. Its default is |
append | a logical specifying whether to append to an existing table in the database. If appending, then the table (or temporary table) must exist, otherwise an error is reported. Its default is |
temporary | If |
batch_size | The number of records to insert in a single statement (defaults to all) |
The dependency on dbx may be removed if the underlying issue is resolved: https://github.com/dolthub/dolt/issues/2091.
See also: dolt-read
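A minimal sketch of a batched write, assuming an open DoltConnection is passed in as conn. write_in_batches() and its default batch size of 500 are hypothetical choices made for illustration. Only the arguments shown in the usage above are used.

```r
# Hypothetical helper (not part of doltr): write a data frame in batches of
# `batch_size` rows per INSERT statement, then confirm the table exists.
write_in_batches <- function(conn, name, df, batch_size = 500L) {
  DBI::dbWriteTable(
    conn, name, df,
    overwrite = TRUE,        # replaces any existing table of the same name
    batch_size = batch_size  # NULL would insert all rows in one statement
  )
  DBI::dbExistsTable(conn, name)
}
```

With a connection from dolt(), a call such as write_in_batches(conn, "mtcars", mtcars) would be expected to replace the mtcars table. Smaller batches may help when a single very large statement is a problem.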
dolt() returns a connection to a default database. It is a convenience wrapper around dbConnect(dolt_local/remote(), ...) that also caches connections for faster loading.
dolt(
  dir = Sys.getenv("DOLT_DIR", "doltdb"),
  dbname = NULL,
  username = Sys.getenv("DOLT_USERNAME", "root"),
  password = Sys.getenv("DOLT_PASSWORD", ""),
  port = Sys.getenv("DOLT_PORT", 3306L),
  host = Sys.getenv("DOLT_HOST", "127.0.0.1"),
  cache_connection = TRUE,
  ...
)
dir | The directory from which to serve a |
dbname | for remote connections, the database name |
username | The username. Defaults to "root" |
password | The login password. Defaults to empty. |
port | The TCP port for connections. Defaults to 3306. |
host | The IP of the host. Defaults to the local machine, |
cache_connection | Should a cache of the connection be preserved? Caching allows faster load times and prevents the connection from being garbage-collected. |
... | further arguments passed to |
Other connections: dolt_local(), dolt_remote()
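The following sketch opens the default connection, lists its tables, and disconnects. It assumes doltr and a dolt binary are installed and that the directory named by DOLT_DIR (or "doltdb") already holds a dolt database. list_default_tables() is a hypothetical helper, not part of doltr.

```r
# Hypothetical helper: open a connection with dolt(), list tables, and close it.
list_default_tables <- function(dir = Sys.getenv("DOLT_DIR", "doltdb")) {
  # cache_connection = FALSE so the connection is not kept after this call
  conn <- doltr::dolt(dir = dir, cache_connection = FALSE)
  on.exit(DBI::dbDisconnect(conn), add = TRUE)
  DBI::dbListTables(conn)
}

# Only attempt a connection when doltr, a dolt binary, and the directory exist.
db_dir <- Sys.getenv("DOLT_DIR", "doltdb")
if (requireNamespace("doltr", quietly = TRUE) &&
    doltr::is_dolt_installed() && dir.exists(db_dir)) {
  print(list_default_tables(db_dir))
}
```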
Add, commit, and reset tables in a dolt database
dolt_add(tables = NULL, conn = dolt(), collect = NULL, show_sql = NULL)

dolt_commit(
  all = TRUE,
  message = NULL,
  author = NULL,
  date = NULL,
  allow_empty = FALSE,
  conn = dolt(),
  collect = NULL,
  show_sql = NULL
)

dolt_reset(
  hard = FALSE,
  tables = NULL,
  conn = dolt(),
  collect = NULL,
  show_sql = NULL
)
tables | Which tables to add or reset? Defaults to all tables if NULL. |
conn | the database connection |
collect | whether to collect the result into R or return a |
show_sql | Whether to print the SQL statements used internally to fetch the data. Useful for learning how Dolt works internally. Defaults to |
all | stage all tables before committing? |
message | A commit message. If NULL in an interactive session, the user will be prompted. Otherwise will error if empty. |
author, date | Author and date. If NULL, uses the ones set in dolt-config. Author should be in the format |
allow_empty | Allow recording a commit that has the exact same data as its sole parent. This is usually a mistake, so it is FALSE by default. |
hard | Reset working and staged tables? If FALSE (default), a "soft" reset will be performed, only unstaging staged tables. If TRUE, all working and staged changes will be discarded. |
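A small sketch of how these three functions might be combined, assuming an open connection conn and a table that has uncommitted changes. commit_table() and undo_changes() are hypothetical wrappers introduced here. They use only the arguments documented above.

```r
# Hypothetical helper: stage one table, commit it, and return the new commit.
commit_table <- function(conn, table, message) {
  doltr::dolt_add(tables = table, conn = conn)
  # all = FALSE so that only the table staged above is committed
  doltr::dolt_commit(all = FALSE, message = message, conn = conn)
  doltr::dolt_last_commit(conn = conn)
}

# Hypothetical helper: undo uncommitted work.
undo_changes <- function(conn, discard = FALSE) {
  # discard = FALSE performs a soft reset (unstage only);
  # discard = TRUE also throws away working changes
  doltr::dolt_reset(hard = discard, conn = conn)
}
```

Passing an explicit message avoids the interactive prompt, which matters in scripts where an empty message is an error.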
These functions query the dolt database for system tables that describe the database version history and structure.
dolt_branches(conn = dolt(), collect = NULL, show_sql = NULL)

dolt_remotes(conn = dolt(), collect = NULL, show_sql = NULL)

dolt_docs(conn = dolt(), collect = NULL, show_sql = NULL)

dolt_log(conn = dolt(), collect = NULL, show_sql = NULL)
conn | the database connection |
collect | whether to collect the result into R or return a |
show_sql | Whether to print the SQL statements used internally to fetch the data. Useful for learning how Dolt works internally. Defaults to |
Examine information about dolt tables and diffs
dolt_diffs(table, to, from, conn = dolt(), collect = NULL, show_sql = NULL)

dolt_table_history(table, conn = dolt(), collect = NULL, show_sql = NULL)
table | character; the name of a table in the database |
to | commit to compare to |
from | commit to compare from |
conn | the database connection |
collect | whether to collect the result into R or return a |
show_sql | Whether to print the SQL statements used internally to fetch the data. Useful for learning how Dolt works internally. Defaults to |
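As an illustration, the sketch below gathers a table's history and the diff between two commits in one call. It assumes an open connection and two valid commit references supplied by the caller. table_changes() is a hypothetical helper, and collect = TRUE is assumed to return the results fully collected in R.

```r
# Hypothetical helper: fetch the history of a table and the row-level
# differences between two commits.
table_changes <- function(conn, table, from, to) {
  list(
    history = doltr::dolt_table_history(table, conn = conn, collect = TRUE),
    diffs   = doltr::dolt_diffs(table, to = to, from = from,
                                conn = conn, collect = TRUE)
  )
}
```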
Export data from a dolt database
dolt_dump(
  format = c("sql", "csv", "json", "parquet"),
  out = NULL,
  overwrite = FALSE,
  dir = Sys.getenv("DOLT_DIR", "doltdb")
)
format | the export data format. One of |
out | the location on-disk for export. In the case of |
overwrite | whether to overwrite existing files/directories. |
dir | path to dolt database on-disk |
the path(s) of exported files
Initiate a dolt database directory
dolt_init(dir = Sys.getenv("DOLT_DIR", "doltdb"))
dir | path to the directory. Will be created if it does not exist |
dolt_local() creates a DoltLocalDriver, which can generate a DoltLocalConnection. Unlike dolt_remote() and DoltDriver, local connections are for dolt databases stored in directories on-disk, and take a directory name as an argument. The local connection type starts and manages a dolt SQL server in the background serving that directory, connects to it, and returns the connection. Parameters govern both the server and the connection.
Local dolt connection objects contain additional slots including the database path on-disk and an external pointer to the server process, and these are returned via dbGetInfo and displayed in the connection print method. The dbDisconnect method kills the background server if no other processes are connected to it.
Multi-user or other, more complicated networking set-ups should use dolt_server() and dolt_remote() directly.
dolt_local()

## S4 method for signature 'DoltLocalDriver'
dbUnloadDriver(drv, ...)

## S4 method for signature 'DoltLocalDriver'
show(object)

## S4 method for signature 'DoltLocalDriver'
dbConnect(
  drv,
  dir = Sys.getenv("DOLT_DIR", "doltdb"),
  username = Sys.getenv("DOLT_USERNAME", "root"),
  password = Sys.getenv("DOLT_PASSWORD", ""),
  port = Sys.getenv("DOLT_PORT", 3306L),
  host = Sys.getenv("DOLT_HOST", "127.0.0.1"),
  find_port = TRUE,
  find_server = TRUE,
  autocommit = TRUE,
  server_args = list(),
  ...
)

## S4 method for signature 'DoltLocalConnection'
dbGetInfo(dbObj, ...)

## S4 method for signature 'DoltLocalConnection'
show(object)

## S4 method for signature 'DoltLocalConnection'
dbDisconnect(conn, ...)

## S4 method for signature 'DoltLocalConnection'
dbIsValid(dbObj, ...)
drv | an object of class |
... | additional arguments to pass to |
object | a connection object |
dir | The dolt directory to serve and connect to |
username | The username. Defaults to "root" |
password | The login password. Defaults to empty. |
port | The TCP port for connections. Defaults to 3306. |
host | The IP of the host. Defaults to the local machine, |
find_port | whether to find an open port if the default is used by another process |
find_server | whether to look for another server process serving the same directory before creating a new one |
autocommit | Whether to autocommit changes in the SQL sense. That is, to flush pending changes to disk and update the working set. |
server_args | a list of additional arguments to pass to |
dbObj | the database connection |
conn | the database connection |
Other connections: dolt_remote(), dolt()
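A brief sketch of the local connection lifecycle described above: connect (which starts or finds a background server), do some work, and disconnect. It assumes dir points to an initialized dolt database directory. with_local_dolt() is a hypothetical helper, not part of doltr.

```r
# Hypothetical helper: run a function against a local dolt connection and
# always disconnect afterwards, even if the function fails.
with_local_dolt <- function(dir, fn) {
  conn <- DBI::dbConnect(
    doltr::dolt_local(),
    dir = dir,
    find_port = TRUE,    # move to an open port if the default is taken
    find_server = TRUE   # reuse a server already serving this directory
  )
  # Per the description above, disconnecting stops the background server
  # when no other process is connected to it.
  on.exit(DBI::dbDisconnect(conn), add = TRUE)
  fn(conn)
}
```

For example, with_local_dolt("doltdb", DBI::dbGetInfo) would be expected to return the connection details, including the version-control state.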
This function launches the RStudio "Connection" pane to interactively explore the database. The pane will show the database versioning state, tables stored in the database, and dolt system tables showing history.
dolt_pane(conn = dolt())

update_dolt_pane(conn = dolt())

close_dolt_pane(conn = dolt())
conn | a dolt connection. If a path is provided instead, a connection will be created to the path using |
When running dolt interactively, the connection pane will automatically update in response to most queries that modify the database state. You can stop this behavior by setting the DOLT_WATCH environment variable to 0 or false. See dolt_vars for more configuration variables.
The connection object (invisibly)
Work with dolt repository remotes
dolt_push(
  remote = NULL,
  remote_branch = NULL,
  ref = NULL,
  set_upstream = FALSE,
  force = FALSE,
  conn = dolt(),
  collect = NULL,
  show_sql = NULL
)

dolt_pull(
  remote = NULL,
  squash = FALSE,
  conn = dolt(),
  collect = NULL,
  show_sql = NULL
)

dolt_fetch(
  remote = NULL,
  ref = FALSE,
  force = FALSE,
  conn = dolt(),
  collect = NULL,
  show_sql = NULL
)

dolt_clone(
  remote_url,
  remote = "origin",
  new_dir = basename(remote_url),
  branch = NULL
)
remote | the name of the remote. "origin" is used by default |
remote_branch | the name of the remote branch to use with set_upstream. Current local branch is used by default |
ref | the branch reference |
set_upstream | whether to set the remote branch reference to track |
force | whether to overwrite any conflicting history the current branch |
conn | the database connection |
collect | whether to collect the result into R or return a |
show_sql | Whether to print the SQL statements used internally to fetch the data. Useful for learning how Dolt works internally. Defaults to |
squash | whether to merge changes to the working set without updating the commit history |
remote_url | the remote URL to clone |
new_dir | the directory to clone into |
branch | the branch to clone. If NULL, clones all branches |
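The sketch below shows one way these functions might be chained, assuming the caller supplies a valid remote URL and has credentials configured for pushing. clone_and_connect() and publish_changes() are hypothetical helpers. The accepted form of remote_url is not specified here, so it is left entirely to the caller.

```r
# Hypothetical helper: clone a remote repository and connect to the clone.
clone_and_connect <- function(remote_url, new_dir = basename(remote_url)) {
  doltr::dolt_clone(remote_url, new_dir = new_dir)
  doltr::dolt(dir = new_dir)
}

# Hypothetical helper: bring in remote changes, then push local commits.
publish_changes <- function(conn, remote = "origin") {
  doltr::dolt_pull(remote = remote, conn = conn)  # merge remote changes first
  doltr::dolt_push(remote = remote, conn = conn)  # then publish local commits
}
```

Pulling before pushing is a design choice in this sketch that reduces the chance of a rejected push. It avoids force = TRUE, which overwrites conflicting history.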
dolt_remote() is a DBI Driver to connect to a remote dolt server via a port. It and the DoltDriver and DoltConnection classes are wrappers around classes and methods from the RMariaDB package.
Most parameters can be specified with package configuration environment variables.
dolt_remote()

## S4 method for signature 'DoltDriver'
dbUnloadDriver(drv, ...)

## S4 method for signature 'DoltDriver'
show(object)

## S4 method for signature 'DoltDriver'
dbConnect(
  drv = dolt_remote(),
  dbname = Sys.getenv("DOLT_DIR", "doltdb"),
  username = Sys.getenv("DOLT_USERNAME", "root"),
  password = Sys.getenv("DOLT_PASSWORD", ""),
  host = Sys.getenv("DOLT_HOST", "127.0.0.1"),
  port = Sys.getenv("DOLT_PORT", 3306L),
  autocommit = TRUE,
  ...
)
drv | an object of class |
... | other arguments passed to RMariaDB::MariaDB |
object | a connection object |
dbname | The database name |
username | The username. Defaults to "root" |
password | The login password. Defaults to empty. |
host | The IP of the host. Defaults to the local machine, |
port | The TCP port for connections. Defaults to 3306. |
autocommit | Whether to autocommit changes in the SQL sense. That is, to flush pending changes to disk and update the working set. |
Most methods fall back to those for RMariaDB.
Other connections: dolt_local(), dolt()
Start up a dolt SQL server and return the server process handle
dolt_server(
  dir = Sys.getenv("DOLT_DIR", "doltdb"),
  username = Sys.getenv("DOLT_USERNAME", "root"),
  password = Sys.getenv("DOLT_PASSWORD", ""),
  port = Sys.getenv("DOLT_PORT", 3306L),
  host = Sys.getenv("DOLT_HOST", "127.0.0.1"),
  find_port = TRUE,
  find_server = TRUE,
  multi_db = FALSE,
  autocommit = TRUE,
  read_only = FALSE,
  log_level = "info",
  log_out = NULL,
  timeout = 28800000,
  query_parallelism = 2,
  max_connections = 100,
  config_file = Sys.getenv("DOLT_CONFIG_FILE", "")
)
dir | The dolt directory to serve |
username | The username. Defaults to "root" |
password | The login password. Defaults to empty. |
port | The TCP port for connections. Defaults to 3306. |
host | The IP of the host. Defaults to the local machine, |
find_port | if TRUE, switch to a different port if |
find_server | if TRUE, find a server process serving the same directory rather than starting a new one. Note that other server options will be ignored. This allows the server to be used across R sessions. Note that to make best use of this you may want to turn off the "Quit child processes on exit" option in RStudio project options. |
multi_db | Serve multiple databases? If |
autocommit | Automatically commit database changes to the working set? If |
read_only | should the database only allow read-only connections? |
log_level | Defines the level of logging provided. Options are "trace", "debug", "info", "warning", "error", and "fatal" (default "info"). |
log_out | Where logging output should be directed. If |
timeout | Defines the timeout, in seconds, used for connections (default |
query_parallelism | Set the number of go routines spawned to handle each query (default |
max_connections | Set the number of connections handled by the server (default |
config_file | The path to a YAML config file to set these and additional server configuration values. See options in the dolt documentation. |
A dolt_server object that is also a ps::ps_handle()
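For set-ups that manage the server separately from the connection, a sketch along these lines may be a starting point. serve_and_connect() and stop_server() are hypothetical helpers, and port 3307 is an arbitrary example. The sketch assumes the database name matches the directory name, which is an assumption rather than documented behavior, and that the server is ready to accept connections once dolt_server() returns.

```r
# Hypothetical helper: start a server for a directory, then connect to it
# through the remote driver.
serve_and_connect <- function(dir, port = 3307L) {
  server <- doltr::dolt_server(dir = dir, port = port)
  conn <- DBI::dbConnect(
    doltr::dolt_remote(),
    dbname = basename(dir),  # assumption: database is named after its directory
    port = port
  )
  list(server = server, conn = conn)
}

# Hypothetical helper: disconnect and stop the server process. The server
# object is also a ps handle, so ps::ps_kill() can be used on it.
stop_server <- function(handles) {
  DBI::dbDisconnect(handles$conn)
  ps::ps_kill(handles$server)
}
```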
These functions yield information about the current state of a dolt database. dolt_state() provides information on the current branch or headless commit. dolt_status() summarizes changes to the database in working or staged tables (from the dolt_status table). dolt_last_commit() pulls the most recent value from the dolt_log table. All have pretty-print methods for the objects returned but can be interrogated for more detail.
dolt_state(conn = dolt())

dolt_status(conn = dolt())

dolt_last_commit(conn = dolt())
conn | the database connection |
Values from each of these functions are returned as part of the dbGetInfo() method and are part of the information shown in the DoltConnection print method and in the RStudio Connection pane for a Dolt Database.
A data frame of class "dolt_status" and tibble::tbl_df. It pretty-prints as an abbreviated summary of status.
Get and set Dolt configuration variables
dolt_config_get(
  params = NULL,
  global = TRUE,
  local_dir = Sys.getenv("DOLT_DIR")
)

dolt_config_set(params, global = TRUE, local_dir = Sys.getenv("DOLT_DIR"))
params | What parameters to get or set. Can include |
global | Set global or database-specific credentials |
local_dir | if not |
See also: dolt_vars
These methods are extensions of standard DBI functions such as DBI::dbReadTable. They differ in that they can take an as_of argument, reading historical data from the database that was written as of a certain date or commit hash, or from a different branch.
## S4 method for signature 'DoltConnection,character'
dbReadTable(
  conn,
  name,
  as_of = NULL,
  ...,
  row.names = FALSE,
  check.names = TRUE
)

## S4 method for signature 'DoltConnection'
dbListTables(conn, as_of = NULL, ...)

## S4 method for signature 'DoltConnection'
dbListObjects(conn, prefix = NULL, as_of = NULL, ...)

## S4 method for signature 'DoltConnection,character'
dbExistsTable(conn, name, as_of = NULL, ...)
conn | a dolt connection object, produced by |
name | a character string specifying a table name. |
as_of | A dolt commit hash, branch name, or object coercible to POSIXct |
... | Unused, needed for compatibility with generic. |
row.names | Either If A string is equivalent to For backward compatibility, |
check.names | If |
prefix | A fully qualified path in the database's namespace, or |
A data.frame in the case of dbReadTable(); a character vector of names for dbListTables() and dbListObjects(); and a logical result for dbExistsTable().
See also: Querying Historical Data with AS OF Queries on the DoltHub blog, and the RMariaDB methods upon which these are built.
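To illustrate the as_of argument, the sketch below reads a table as it stands now and as it stood at an earlier reference. It assumes an open DoltConnection. compare_versions() is a hypothetical helper, and as_of may be a commit hash, a branch name, or a date-time, as described above.

```r
# Hypothetical helper: compare a table's current rows with an earlier version.
compare_versions <- function(conn, table, as_of) {
  current <- DBI::dbReadTable(conn, table)
  past <- if (DBI::dbExistsTable(conn, table, as_of = as_of)) {
    DBI::dbReadTable(conn, table, as_of = as_of)
  } else {
    NULL  # the table did not exist at that point in history
  }
  list(current = current, past = past)
}
```

A call such as compare_versions(conn, "mytable", Sys.time() - 24 * 60 * 60) would compare the table (here a made-up name) with its state one day earlier. Checking dbExistsTable() first avoids an error for tables created after the reference point.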
The doltr package's behavior can be modified by setting these environment variables:
DOLT_DIR sets the default directory to look for a dolt database and run a server when using dolt_local() and dolt(). Defaults to "doltdb".
DOLT_PORT sets the port to connect to or to run the server on. Defaults to 3306.
DOLT_HOST sets the host IP to connect to or to run the server on. Defaults to 127.0.0.1.
DOLT_CONFIG_FILE is the path to a file with additional configuration options for the dolt sql server. See https://docs.dolthub.com/interfaces/cli#dolt-sql-server for options.
DOLT_PATH specifies the path to the dolt binary if running locally. Defaults to the one found in the system path.
DOLT_COLLECT specifies whether dolt convenience functions returning data should return fully collected tibbles or lazy tibbles for further processing. Set it to 0 or false to disable collection, for example when large databases with long histories yield very large responses to commands like dolt_log() or dolt_diffs().
DOLT_VERBOSE prints the SQL or command-line statements executed when running functions that wrap database or system calls. Useful for understanding how dolt commands work. Set it to 1 or true to enable this behavior.
DOLT_WATCH determines whether the RStudio Connection pane automatically updates in response to changes in the database. Set it to 0 or false to disable this behavior.
DOLT_ROOT_DIR is the directory where Dolt global configuration and credential data are stored (~/.dolt by default). Note this can also be set in your shell to configure command-line dolt.
See also: dolt-config
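The variables above can be set for the current R session with base R, as in this minimal sketch. The directory name "mydb" and port 3307 are made-up example values. Note that Sys.getenv() always returns a character string, so numeric settings such as the port need converting before arithmetic or comparison.

```r
# Configure doltr defaults for this session ("mydb" and 3307 are examples).
Sys.setenv(DOLT_DIR = "mydb", DOLT_PORT = "3307")

# Read the values back the same way the function defaults shown above do.
dir <- Sys.getenv("DOLT_DIR", "doltdb")
port <- as.integer(Sys.getenv("DOLT_PORT", 3306L))  # character -> integer

# An unset variable falls back to the default supplied as the second argument.
Sys.unsetenv("DOLT_HOST")
host <- Sys.getenv("DOLT_HOST", "127.0.0.1")
```

To make such settings persistent across sessions, the same name and value pairs can be placed in an .Renviron file.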
Find and check for the presence of a dolt binary
is_dolt_installed()

dolt_version()

dolt_path()