Dynamic Catalog

This section introduces the dynamic catalog feature of openLooKeng. Normally, openLooKeng admins add a data source to the engine by placing a catalog profile (e.g. hive.properties) in the connector directory (etc/catalog). Whenever a catalog is added, updated or deleted this way, all coordinators and workers must be restarted.

To change catalogs on the fly without a restart, openLooKeng provides the dynamic catalog feature. To enable it:

  • First, enable it in etc/config.properties:

    catalog.dynamic-enabled=true
    
  • Second, configure the filesystem used to store dynamic catalog information in hdfs-config-default.properties. You can change this file name through the catalog.share.filesystem.profile property in etc/node.properties; the default value is hdfs-config-default. See the filesystem documentation for more information.

    Add an hdfs-config-default.properties file to the etc/filesystem/ directory. If this directory does not exist, create it.

    fs.client.type=hdfs
    hdfs.config.resources=/opt/openlookeng/config/core-site.xml, /opt/openlookeng/config/hdfs-site.xml
    hdfs.authentication.type=NONE
    fs.hdfs.impl.disable.cache=true
    

    If Kerberos is enabled on HDFS, use the following instead:

    fs.client.type=hdfs
    hdfs.config.resources=/opt/openlookeng/config/core-site.xml, /opt/openlookeng/config/hdfs-site.xml
    hdfs.authentication.type=KERBEROS
    hdfs.krb5.conf.path=/opt/openlookeng/config/krb5.conf
    hdfs.krb5.keytab.path=/opt/openlookeng/config/user.keytab
    # replace openlookeng@HADOOP.COM with your principal
    hdfs.krb5.principal=openlookeng@HADOOP.COM
    fs.hdfs.impl.disable.cache=true
    
  • Finally, configure the local and shared catalog directories in etc/node.properties.

    catalog.config-dir=/opt/openlookeng/catalog
    catalog.share.config-dir=/opt/openlookeng/catalog/share
    

Usage

Catalog operations are performed through a RESTful API on the openLooKeng coordinator. An HTTP request has the following shape (using the hive connector as an example). The body of a POST or PUT request is multipart/form-data:

request: POST/DELETE/PUT

header: `X-Presto-User: admin`

form: 'catalogInformation={
        "catalogName" : "hive",
        "connectorName" : "hive-hadoop2",
        "properties" : {
              "hive.hdfs.impersonation.enabled" : "false",
              "hive.hdfs.authentication.type" : "KERBEROS",
              "hive.collect-column-statistics-on-write" : "true",
              "hive.metastore.service.principal" : "hive/hadoop.hadoop.com@HADOOP.COM",
              "hive.metastore.authentication.type" : "KERBEROS",
              "hive.metastore.uri" : "thrift://xx.xx.xx.xx:21088",
              "hive.allow-drop-table" : "true",
              "hive.config.resources" : "core-site.xml,hdfs-site.xml",
              "hive.hdfs.presto.keytab" : "user.keytab",
              "hive.metastore.krb5.conf.path" : "krb5.conf",
              "hive.metastore.client.keytab" : "user.keytab",
              "hive.metastore.client.principal" : "test@HADOOP.COM",
              "hive.hdfs.wire-encryption.enabled" : "true",
              "hive.hdfs.presto.principal" : "test@HADOOP.COM"
              }
          }',
          'catalogConfigurationFiles=path/to/core-site.xml',
          'catalogConfigurationFiles=path/to/hdfs-site.xml',
          'catalogConfigurationFiles=path/to/user.keytab',
          'globalConfigurationFiles=path/to/krb5.conf'
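
As a rough illustration of the request shape above, the following Python sketch builds the catalogInformation JSON and a multipart/form-data body using only the standard library. The helper names build_catalog_information and encode_multipart are hypothetical, and sending each configuration file as a file part is an assumption about what the coordinator expects, not something this page confirms.

```python
import json
import uuid


def build_catalog_information(catalog_name, connector_name, properties):
    """Serialize the catalogInformation form field as a JSON string."""
    return json.dumps({
        "catalogName": catalog_name,
        "connectorName": connector_name,
        "properties": properties,
    })


def encode_multipart(fields, files, boundary=None):
    """Encode text fields and file parts as a multipart/form-data body.

    fields: list of (name, text) pairs.
    files:  list of (name, filename, content_bytes) triples; a name may repeat.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields:
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")
        lines.append(value.encode("utf-8"))
    for name, filename, content in files:
        lines.append(f"--{boundary}".encode())
        lines.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode()
        )
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)
    # The closing delimiter ends the body.
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


info = build_catalog_information(
    "hive",
    "hive-hadoop2",
    {"hive.metastore.uri": "thrift://xx.xx.xx.xx:21088"},
)
body, content_type = encode_multipart(
    fields=[("catalogInformation", info)],
    files=[("catalogConfigurationFiles", "core-site.xml", b"<configuration/>")],
    boundary="example-boundary",  # fixed here only to make the output predictable
)
```

The resulting body and content_type could then be passed to urllib.request.Request with method="POST" and the X-Presto-User header shown above; the file content here is a placeholder.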

Add Catalog

To add a catalog, send a POST request to a coordinator. The coordinator first rewrites the file path properties, saves the files to its local disk, and verifies the operation by loading the new catalog. If the catalog loads successfully, the coordinator saves the files to the shared file system (e.g. HDFS).

Other coordinators and workers periodically check the catalog properties files in the shared file system. When they discover a new catalog, they pull the related config files to local disk and load the catalog into memory.

Delete catalog

As with adding a catalog, to delete a catalog, send a DELETE request to a coordinator. The coordinator that receives the request deletes the related catalog profiles from its local disk, unloads the catalog from the server, and deletes it from the shared file system.

Other coordinators and workers periodically check the catalog properties files in the shared file system. When a catalog has been deleted, they also delete the related config files from local disk and unload the catalog from memory.

Update catalog

An update is a combination of a delete and an add. First, the admin sends a PUT request to a coordinator. On receipt, the coordinator deletes and re-adds the catalog locally to verify the change. If this succeeds, the coordinator deletes the catalog from the shared file system and waits until all other nodes have deleted the catalog from their local file systems. It then saves the new configuration files to the shared file system.

Other coordinators and workers periodically check the catalog properties file in the shared filesystem and perform changes accordingly on the local file system.

Catalog properties, including connector-name and properties, can be modified. However, the catalog name cannot be changed.
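
The periodic check that other coordinators and workers perform in the add, delete and update flows can be pictured as a set comparison between locally loaded catalogs and those present in the shared file system. The Python sketch below is only an illustration of that idea; plan_catalog_sync is a hypothetical name, and the real scanner's internals are not described on this page.

```python
def plan_catalog_sync(local_catalogs, shared_catalogs):
    """Decide which catalogs a node should load or unload after one scan.

    local_catalogs:  names of catalogs currently loaded on this node.
    shared_catalogs: names of catalogs found in the shared file system.
    Returns (to_load, to_unload) as sorted lists of catalog names.
    """
    local, shared = set(local_catalogs), set(shared_catalogs)
    to_load = sorted(shared - local)    # newly discovered catalogs
    to_unload = sorted(local - shared)  # catalogs removed from the shared file system
    return to_load, to_unload


# An update reaches other nodes as a delete followed by an add.
# Scan 1: the coordinator has removed "hive" from the shared file system.
first_scan = plan_catalog_sync(["hive", "mysql"], ["mysql"])
# Scan 2: the coordinator has saved the new "hive" configuration.
second_scan = plan_catalog_sync(["mysql"], ["hive", "mysql"])
```

In this model a node never compares file contents: the updating coordinator waits for every node to unload the old catalog before it publishes the new files, so a plain presence check is enough.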

API information

HTTP request

Add: POST host/v1/catalog

Update: PUT host/v1/catalog

Delete: DELETE host/v1/catalog/{catalogName}

HTTP Return code

| HTTP Return code | POST | PUT | DELETE |
|---|---|---|---|
| 401 UNAUTHORIZED | No permission to add a catalog | No permission to change a catalog | Same as PUT |
| 302 FOUND | The catalog already exists | - | - |
| 404 NOT_FOUND | Dynamic catalog is disabled | The catalog does not exist or dynamic catalog is disabled | Same as PUT |
| 400 BAD_REQUEST | The request is not correct | Same as POST | Same as PUT |
| 409 CONFLICT | Another session is operating the catalog | Same as POST | Same as POST |
| 500 INTERNAL_SERVER_ERROR | Internal error occurred in the coordinator | Same as POST | Same as POST |
| 201 CREATED | SUCCESS | SUCCESS | - |
| 204 NO_CONTENT | - | - | SUCCESS |
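
A client script might translate these return codes into messages. The Python sketch below encodes the return code table as a lookup, with the "Same as" entries expanded; the names RETURN_CODES and describe_response are hypothetical and not part of any openLooKeng client library.

```python
# Meanings shared by POST, PUT and DELETE according to the return code table.
_COMMON = {
    400: "The request is not correct",
    409: "Another session is operating the catalog",
    500: "Internal error occurred in the coordinator",
}

RETURN_CODES = {
    "POST": {
        **_COMMON,
        201: "Success",
        302: "The catalog already exists",
        401: "No permission to add a catalog",
        404: "Dynamic catalog is disabled",
    },
    "PUT": {
        **_COMMON,
        201: "Success",
        401: "No permission to change a catalog",
        404: "The catalog does not exist or dynamic catalog is disabled",
    },
    "DELETE": {
        **_COMMON,
        204: "Success",
        401: "No permission to change a catalog",
        404: "The catalog does not exist or dynamic catalog is disabled",
    },
}


def describe_response(method, status):
    """Return the documented meaning of a status code for a catalog request."""
    return RETURN_CODES.get(method.upper(), {}).get(status, "Unexpected status code")


message = describe_response("post", 302)
```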

Configuration properties

In etc/config.properties:

| Property Name | Mandatory | Description | Default Value |
|---|---|---|---|
| catalog.dynamic-enabled | NO | Whether to enable dynamic catalog | false |
| catalog.scanner-interval | NO | Interval for scanning catalogs in the shared file system. | 5s |
| catalog.max-file-size | NO | Maximum catalog file size | 128k |
| catalog.valid-file-suffixes | NO | The valid suffixes of catalog config files, separated by commas if there are several. All file suffixes are allowed when it is empty. | |
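
To see how catalog.max-file-size and catalog.valid-file-suffixes might interact, the Python sketch below checks an uploaded file against both. It is a guess at the semantics, not the server's code: it assumes that "128k" means 128 × 1024 bytes and that suffixes are compared against the file extension. The names parse_size and is_acceptable_file are hypothetical.

```python
def parse_size(text):
    """Parse a size such as '128k' into bytes (assumes k = 1024, m = 1024 * 1024)."""
    units = {"k": 1024, "m": 1024 ** 2}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def is_acceptable_file(filename, size_bytes, max_file_size="128k", valid_suffixes=""):
    """Return True if the file passes the size limit and the suffix filter."""
    if size_bytes > parse_size(max_file_size):
        return False
    suffixes = [s.strip() for s in valid_suffixes.split(",") if s.strip()]
    if not suffixes:
        # An empty setting allows every suffix.
        return True
    return any(filename.endswith("." + s.lstrip(".")) for s in suffixes)


small_xml_ok = is_acceptable_file("core-site.xml", 4096)
keytab_rejected = is_acceptable_file("user.keytab", 100, valid_suffixes="xml,conf")
```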

In etc/node.properties:

Path whitelist: ["/tmp", "/opt/hetu", "/opt/openlookeng", "/etc/hetu", "/etc/openlookeng", current workspace]

Notice: avoid choosing the root directory; paths must not contain ../; if node.data_dir is configured, the current workspace is the parent of node.data_dir; otherwise, the current workspace is the openLooKeng server's directory.

| Property Name | Mandatory | Description | Default Value |
|---|---|---|---|
| catalog.config-dir | YES | Root directory for storing configuration files in local disk. | |
| catalog.share.config-dir | NO | Root directory for storing configuration files in the shared file system. | |
| catalog.share.filesystem.profile | NO | The profile name of the shared file system. | hdfs-config-default |
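
Before editing etc/node.properties, it can help to check a candidate directory against the path whitelist and the notice above. The Python sketch below is one possible reading of those rules; is_allowed_path is a hypothetical helper, and treating a whitelisted directory itself as valid is an assumption.

```python
import posixpath

WHITELIST = ["/tmp", "/opt/hetu", "/opt/openlookeng", "/etc/hetu", "/etc/openlookeng"]


def is_allowed_path(path, workspace=None):
    """Return True if path lies under a whitelisted root and has no '..' parts.

    workspace: the current workspace directory, if known (see the notice above).
    """
    if ".." in path.split("/"):
        return False
    normalized = posixpath.normpath(path)
    if normalized == "/":
        # The root directory is never acceptable.
        return False
    roots = WHITELIST + ([workspace] if workspace else [])
    return any(
        normalized == root or normalized.startswith(root.rstrip("/") + "/")
        for root in roots
    )


config_dir_ok = is_allowed_path("/opt/openlookeng/catalog")
traversal_ok = is_allowed_path("/opt/openlookeng/../secret")
```

The prefix test appends a trailing slash to each root so that a path such as /tmpfoo is not mistaken for a directory under /tmp.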

Impact on queries

  • After a catalog is deleted, queries that are being executed may fail.
  • Queries in progress may fail when the catalog is being updated.