Dynamic Catalog
This section introduces the dynamic catalog feature of openLooKeng. Normally, openLooKeng admins add a data source to the engine by putting a catalog profile (e.g. hive.properties) in the connector directory (etc/catalog). Whenever a catalog needs to be added, updated, or deleted, all coordinators and workers must be restarted.
To change catalogs on the fly without a restart, openLooKeng introduced the dynamic catalog feature. To enable this feature:
First, enable it in etc/config.properties:
catalog.dynamic-enabled=true
Second, configure the filesystem used to store dynamic catalog information in hdfs-config-default.properties. You can change this file name with the catalog.share.filesystem.profile property in etc/node.properties; the default value is hdfs-config-default. See the filesystem documentation for more information.
Add an hdfs-config-default.properties file to the etc/filesystem/ directory. If this directory does not exist, create it.
fs.client.type=hdfs
hdfs.config.resources=/opt/openlookeng/config/core-site.xml, /opt/openlookeng/config/hdfs-site.xml
hdfs.authentication.type=NONE
fs.hdfs.impl.disable.cache=true
If Kerberos is enabled on HDFS, use the following configuration instead:
fs.client.type=hdfs
hdfs.config.resources=/opt/openlookeng/config/core-site.xml, /opt/openlookeng/config/hdfs-site.xml
hdfs.authentication.type=KERBEROS
hdfs.krb5.conf.path=/opt/openlookeng/config/krb5.conf
hdfs.krb5.keytab.path=/opt/openlookeng/config/user.keytab
# replace openlookeng@HADOOP.COM with your principal
hdfs.krb5.principal=openlookeng@HADOOP.COM
fs.hdfs.impl.disable.cache=true
Finally, configure the local and shared catalog directories in etc/node.properties:
catalog.config-dir=/opt/openlookeng/catalog
catalog.share.config-dir=/opt/openlookeng/catalog/share
Usage
Catalog operations are performed through a RESTful API on the openLooKeng coordinator. An HTTP request has the following shape (using the Hive connector as an example). The body of a POST or PUT request is multipart/form-data:
request: POST/DELETE/PUT
header: `X-Presto-User: admin`
form: 'catalogInformation={
"catalogName" : "hive",
"connectorName" : "hive-hadoop2",
"properties" : {
"hive.hdfs.impersonation.enabled" : "false",
"hive.hdfs.authentication.type" : "KERBEROS",
"hive.collect-column-statistics-on-write" : "true",
"hive.metastore.service.principal" : "hive/hadoop.hadoop.com@HADOOP.COM",
"hive.metastore.authentication.type" : "KERBEROS",
"hive.metastore.uri" : "thrift://xx.xx.xx.xx:21088",
"hive.allow-drop-table" : "true",
"hive.config.resources" : "core-site.xml,hdfs-site.xml",
"hive.hdfs.presto.keytab" : "user.keytab",
"hive.metastore.krb5.conf.path" : "krb5.conf",
"hive.metastore.client.keytab" : "user.keytab",
"hive.metastore.client.principal" : "test@HADOOP.COM",
"hive.hdfs.wire-encryption.enabled" : "true",
"hive.hdfs.presto.principal" : "test@HADOOP.COM"
}
}',
'catalogConfigurationFiles=path/to/core-site.xml',
'catalogConfigurationFiles=path/to/hdfs-site.xml',
'catalogConfigurationFiles=path/to/user.keytab',
'globalConfigurationFiles=path/to/krb5.conf'
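The request shape above can be assembled with the Python standard library alone. The following is a minimal sketch, not the project's client: the function name `build_catalog_form` and its parameters are hypothetical, and it assumes the configuration files are sent as ordinary file parts under the form field names shown above.

```python
import json
import uuid


def build_catalog_form(catalog_info, catalog_files=(), global_files=(), boundary=None):
    """Encode a catalog request body as multipart/form-data.

    catalog_info  -- dict serialized into the 'catalogInformation' field
    catalog_files -- iterable of (filename, bytes) for 'catalogConfigurationFiles'
    global_files  -- iterable of (filename, bytes) for 'globalConfigurationFiles'
    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    parts = []

    def add_part(headers, payload):
        # One part: delimiter line, header lines, blank line, payload.
        parts.append(delimiter + b"\r\n" + headers.encode("utf-8")
                     + b"\r\n\r\n" + payload + b"\r\n")

    add_part('Content-Disposition: form-data; name="catalogInformation"',
             json.dumps(catalog_info).encode("utf-8"))

    # Assumption: files are uploaded as file parts, one part per file.
    for field, files in (("catalogConfigurationFiles", catalog_files),
                         ("globalConfigurationFiles", global_files)):
        for filename, content in files:
            add_part(
                f'Content-Disposition: form-data; name="{field}"; '
                f'filename="{filename}"\r\n'
                "Content-Type: application/octet-stream",
                content,
            )

    body = b"".join(parts) + delimiter + b"--\r\n"
    return body, f"multipart/form-data; boundary={boundary}"
```

The returned body and content type could then be passed to `urllib.request.Request` together with the `X-Presto-User` header and the desired HTTP method.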
Add catalog
To add a new catalog, send a POST request to a coordinator. The coordinator first rewrites the file path properties, saves the files to local disk, and verifies the operation by loading the newly added catalog. If the catalog loads successfully, the coordinator saves the files to the shared file system (e.g. HDFS).
Other coordinators and workers periodically check the catalog properties files in the shared file system. When a new catalog is discovered, they pull the related config files to the local disk and then load the catalog into memory.
Delete catalog
As with adding a catalog, to delete one, send a DELETE request to a coordinator. The coordinator that receives the request deletes the related catalog profiles from the local disk, unloads the catalog from the server, and deletes it from the shared file system.
Other coordinators and workers periodically check the catalog properties files in the shared file system. When a catalog has been deleted, they also delete the related config files from the local disk and then unload the catalog from memory.
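The periodic check that other nodes perform for added and deleted catalogs can be pictured as a set comparison between the catalogs present in the shared file system and those loaded locally. This is an illustrative sketch only; `plan_catalog_sync` is a hypothetical name and the actual scanner in openLooKeng may work differently.

```python
def plan_catalog_sync(shared_catalogs, local_catalogs):
    """Compare catalog names in the shared file system with those loaded locally.

    Returns (to_load, to_unload) as sorted lists of catalog names.
    """
    shared = set(shared_catalogs)
    local = set(local_catalogs)
    to_load = sorted(shared - local)    # present in shared storage, not yet local
    to_unload = sorted(local - shared)  # removed from shared storage
    return to_load, to_unload


# Example: "mysql" was added elsewhere, "tpcds" was deleted elsewhere.
to_load, to_unload = plan_catalog_sync(["hive", "mysql"], ["hive", "tpcds"])
```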
Update catalog
An UPDATE operation is a combination of DELETE and ADD operations. First, the admin sends a PUT request to a coordinator. On receipt, the coordinator deletes and re-adds the catalog locally to verify the change. If this succeeds, the coordinator deletes the catalog from the shared file system and waits until all other nodes have deleted the catalog from their local file systems. It then saves the new configuration files to the shared file system.
Other coordinators and workers periodically check the catalog properties file in the shared filesystem and perform changes accordingly on the local file system.
Catalog properties, including connector-name and properties, can be modified. However, the catalog name cannot be changed.
API information
HTTP request
Add: POST host/v1/catalog
Update: PUT host/v1/catalog
Delete: DELETE host/v1/catalog/{catalogName}
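As a small illustration of the three endpoints listed above, the sketch below maps an operation to its HTTP method and URL. The helper name `catalog_endpoint` and the example host are hypothetical; only the methods and the `/v1/catalog` paths come from this section.

```python
from urllib.parse import quote

# HTTP method for each catalog operation, as listed above.
METHODS = {"add": "POST", "update": "PUT", "delete": "DELETE"}


def catalog_endpoint(operation, host, catalog_name=None):
    """Return (http_method, url) for a dynamic catalog operation."""
    method = METHODS[operation]
    url = host.rstrip("/") + "/v1/catalog"
    if operation == "delete":
        # Only DELETE carries the catalog name in the URL path.
        if not catalog_name:
            raise ValueError("delete requires a catalog name")
        url += "/" + quote(catalog_name, safe="")
    return method, url


# Example with a hypothetical coordinator address.
method, url = catalog_endpoint("delete", "http://coordinator:8080", "hive")
```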
HTTP Return code
HTTP Return code | POST | PUT | DELETE |
---|---|---|---|
401 UNAUTHORIZED | No permission to add a catalog | No permission to change a catalog | Same as PUT |
302 FOUND | The catalog already exists | - | - |
404 NOT_FOUND | Dynamic catalog is disabled | The catalog does not exist or dynamic catalog is disabled | Same as PUT |
400 BAD_REQUEST | The request is not correct | Same as POST | Same as PUT |
409 CONFLICT | Another session is operating the catalog | Same as POST | Same as POST |
500 INTERNAL_SERVER_ERROR | Internal error occurred in the coordinator | Same as POST | Same as POST |
201 CREATED | SUCCESS | SUCCESS | - |
204 NO_CONTENT | - | - | SUCCESS |
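A client might translate these return codes into messages with a simple lookup. The sketch below transcribes the table, expanding the "Same as ..." cells; the names `STATUS_MEANINGS`, `explain_status`, and `is_success` are hypothetical and are not part of openLooKeng.

```python
# Codes whose meaning is the same for POST, PUT, and DELETE.
COMMON = {
    400: "The request is not correct",
    409: "Another session is operating the catalog",
    500: "Internal error occurred in the coordinator",
}

STATUS_MEANINGS = {}
for _method in ("POST", "PUT", "DELETE"):
    for _code, _text in COMMON.items():
        STATUS_MEANINGS[(_method, _code)] = _text

# Codes whose meaning depends on the operation.
_MISSING = "The catalog does not exist or dynamic catalog is disabled"
STATUS_MEANINGS.update({
    ("POST", 401): "No permission to add a catalog",
    ("PUT", 401): "No permission to change a catalog",
    ("DELETE", 401): "No permission to change a catalog",
    ("POST", 302): "The catalog already exists",
    ("POST", 404): "Dynamic catalog is disabled",
    ("PUT", 404): _MISSING,
    ("DELETE", 404): _MISSING,
    ("POST", 201): "Success",
    ("PUT", 201): "Success",
    ("DELETE", 204): "Success",
})


def explain_status(method, status):
    """Return the documented meaning of a status code for the given method."""
    return STATUS_MEANINGS.get((method.upper(), status),
                               "Status not listed for this operation")


def is_success(method, status):
    """True only for the success code documented for this method."""
    return explain_status(method, status) == "Success"
```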
Configuration properties
In etc/config.properties:
Property Name | Mandatory | Description | Default Value |
---|---|---|---|
catalog.dynamic-enabled | NO | Whether to enable dynamic catalog | false |
catalog.scanner-interval | NO | Interval for scanning catalogs in the shared file system. | 5s |
catalog.max-file-size | NO | Maximum catalog file size | 128k |
catalog.valid-file-suffixes | NO | Valid suffixes of catalog configuration files, separated by commas if there are several. All file suffixes are allowed when this is empty. | |
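To make the interaction of catalog.max-file-size and catalog.valid-file-suffixes concrete, here is a hedged sketch of how an uploaded file might be checked. The function `is_acceptable_file` is hypothetical, and the exact matching rules (case-insensitive suffixes, a size limit passed in bytes, with 128k taken as 128 * 1024) are assumptions rather than documented behavior.

```python
def is_acceptable_file(filename, size_bytes, max_size_bytes, valid_suffixes=""):
    """Check a catalog configuration file against size and suffix limits.

    valid_suffixes is a comma-separated string; an empty string allows
    every suffix, mirroring the description in the table above.
    """
    if size_bytes > max_size_bytes:
        return False
    suffixes = [s.strip().lower().lstrip(".")
                for s in valid_suffixes.split(",") if s.strip()]
    if not suffixes:
        return True  # empty list: all suffixes are allowed
    name = filename.lower()
    return any(name.endswith("." + suffix) for suffix in suffixes)


# Assumption: the default "128k" is interpreted as 128 * 1024 bytes.
DEFAULT_MAX_SIZE = 128 * 1024
ok = is_acceptable_file("core-site.xml", 2048, DEFAULT_MAX_SIZE, "xml,keytab,conf")
```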
In etc/node.properties:
Path whitelist: ["/tmp", "/opt/hetu", "/opt/openlookeng", "/etc/hetu", "/etc/openlookeng", current workspace]
Notice: Avoid choosing the root directory. Paths cannot contain ../. If node.data_dir is configured, the current workspace is the parent of node.data_dir; otherwise, the current workspace is the openLooKeng server's directory.
Property Name | Mandatory | Description | Default Value |
---|---|---|---|
catalog.config-dir | YES | Root directory for storing configuration files in local disk. | |
catalog.share.config-dir | NO | Root directory for storing configuration files in the shared file system. | |
catalog.share.filesystem.profile | NO | The profile name of the shared file system. | hdfs-config-default |
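The path whitelist rules above can be checked before editing etc/node.properties. The sketch below is one possible reading of those rules, assuming a path is allowed when it equals or lies under a whitelisted directory; `is_allowed_dir` and the example workspace are hypothetical.

```python
import posixpath

# Fixed entries of the path whitelist; the current workspace is added per call.
WHITELIST = ["/tmp", "/opt/hetu", "/opt/openlookeng", "/etc/hetu", "/etc/openlookeng"]


def is_allowed_dir(path, workspace):
    """Return True if path satisfies the whitelist rules described above."""
    if ".." in path.split("/"):
        return False  # "../" must not appear in the path
    normalized = posixpath.normpath(path)
    if normalized == "/":
        return False  # the root directory is not allowed
    roots = WHITELIST + [posixpath.normpath(workspace)]
    # Assumption: allowed means equal to, or nested under, a whitelisted root.
    return any(normalized == root or normalized.startswith(root + "/")
               for root in roots)


# Example with a hypothetical workspace directory.
allowed = is_allowed_dir("/opt/openlookeng/catalog", "/home/olk")
```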
Impact on queries
- After a catalog is deleted, queries that are being executed may fail.
- Queries in progress may fail when the catalog is being updated.