Apache Atlas: Quick Start (Part I — REST & UI)

Alexey Artemov
5 min read · Oct 6, 2020

This article shows the basic steps for working with Apache Atlas. It covers the following points:

1. Local installation for development;
2. Data model overview;
3. How to add/delete/update types via the REST API;
4. How to add/delete/update entities via the UI;
5. References

The second article, where I describe how to work with the Java API, is located here.

My environment

OS: Mac OS X, 10.15.6
Java: build 1.8.0_202-b08
Atlas: 2.1.0
Python: 2.7.9

1. Local installation for development

Here you can find the whole description of how to build & install.

  • Download the necessary version from this link or from GitHub
  • Unarchive & build (it will take a few minutes)
tar xvfz apache-atlas-2.1.0-sources.tar.gz
cd apache-atlas-sources-2.1.0/
export MAVEN_OPTS="-Xms2g -Xmx2g"
mvn clean -DskipTests package -Pdist,embedded-hbase-solr
cd ./distro/target/apache-atlas-2.1.0-bin/apache-atlas-2.1.0/
  • check env variables at: conf/atlas-env.sh and add/edit/uncomment them if needed:
export JAVA_HOME=YOUR_PATH_TO_JAVA_HOME
export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps"
export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"
export MANAGE_LOCAL_HBASE=true
export MANAGE_LOCAL_SOLR=true
export HBASE_CONF_DIR=apache-atlas-sources-2.1.0/distro/target/apache-atlas-2.1.0-bin/apache-atlas-2.1.0/conf/hbase
  • make conf/atlas-env.sh executable & source it so the variables are set in your shell:
chmod +x ./conf/atlas-env.sh
source ./conf/atlas-env.sh
  • run & check the Atlas server (the port defaults to 21000):
./bin/atlas_start.py [-port <port>]

After that, you should see:

configured for local hbase.
hbase started.
configured for local solr.
solr started.
setting up solr collections…
starting atlas on host localhost
starting atlas on port 21000
………………………………
Apache Atlas Server started!!!

Wait a few seconds before checking:

curl -u admin:admin http://localhost:21000/api/atlas/admin/version

{"Description":"Metadata Management and Data Governance Platform over Hadoop","Revision":"release","Version":"2.1.0","Name":"apache-atlas"}

You can also open the web UI: http://localhost:21000/login.jsp

To stop the server, use:

./bin/atlas_stop.py

2. Data model overview

Basic information about the data model can be found here.

  • Type — in Atlas, a type is a definition of how a particular kind of metadata object is stored and accessed. A type represents one or a collection of attributes that define the properties of the metadata object. Users with a development background will recognize the similarity of a type to a ‘Class’ definition in object-oriented programming languages, or a ‘table schema’ in relational databases.
  • Entity — an entity in Atlas is a specific value or instance of a ‘type’ and thus represents a specific metadata object in the real world. Referring back to our analogy with object-oriented programming languages, an ‘instance’ is an ‘Object’ of a certain ‘Class’.

There are two main types: DataSet & Process.

  • DataSet — this type extends Referenceable. Conceptually, it can be used to represent a type that stores data. In Atlas, hive_table, hbase_table, etc. are all types that extend DataSet. Types that extend DataSet can be expected to have a schema, in the sense that they have an attribute that defines the attributes of that dataset, e.g. the columns attribute of a hive_table. Also, entities of types that extend DataSet participate in data transformations, and these transformations can be captured by Atlas via lineage (or provenance) graphs.
  • Process — this type extends Asset. Conceptually, it can be used to represent any data transformation operation. For example, an ETL process that transforms a hive table with raw data to another hive table that stores some aggregate can be a specific type that extends the Process type. A Process type has two specific attributes, inputs and outputs. Both inputs and outputs are arrays of DataSet entities. Thus an instance of a Process type can use these inputs and outputs to capture how the lineage of a DataSet evolves.
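The inputs/outputs mechanics described above can be sketched as a minimal Process entity payload. This is only an illustration: the process name, the two table names, and the qualified names below are hypothetical, and a real spark_process may require further attributes.

```shell
# A minimal sketch of a Process entity capturing lineage between two
# hive_table entities (all names and qualified names are hypothetical):
cat > process_entity.json <<'EOF'
{
  "entity": {
    "typeName": "spark_process",
    "attributes": {
      "qualifiedName": "etl.raw_to_agg@cluster1",
      "name": "raw_to_agg",
      "inputs":  [ { "typeName": "hive_table",
                     "uniqueAttributes": { "qualifiedName": "db.raw_events@cluster1" } } ],
      "outputs": [ { "typeName": "hive_table",
                     "uniqueAttributes": { "qualifiedName": "db.daily_agg@cluster1" } } ]
    }
  }
}
EOF
```

When such an entity is registered, Atlas can draw a lineage edge from the input table to the output table through the process node.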

You can build your own types; depending on their behaviour, they should extend DataSet or Process. Many types already exist in Atlas, and additional type definitions can be found here or in:

cd ./models/

0000-Area0
1000-Hadoop
2000-RDBMS
3000-Cloud
4000-MachineLearning

3. How to add/delete/update types via the REST API

Documentation for the REST API can be found at this link. I use Postman for working with the REST API; curl also works.

Before we continue, please check if all of those types exist in your Atlas:

  • spark_process;
  • aws_s3_bucket;
  • aws_s3_pseudo_dir;
  • aws_s3_object;
  • hive_table.

If they don't exist, use the models from this link. Example URL for checking: http://localhost:21000/api/atlas/v2/types/entitydef/name/spark_process
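The check above can be run for all five types at once with a small loop (this assumes a local Atlas on the default port with the default admin/admin credentials):

```shell
# Check that each required type is registered; HTTP 200 means the type
# exists, 404 means it is missing (000 means the server is unreachable):
for t in spark_process aws_s3_bucket aws_s3_pseudo_dir aws_s3_object hive_table; do
  code=$(curl -s -o /dev/null -w '%{http_code}' -m 5 -u admin:admin \
    "http://localhost:21000/api/atlas/v2/types/entitydef/name/$t")
  echo "$t: HTTP $code"
done
```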

Let’s create our custom type (custom_type); it will extend DataSet and have two additional fields: custom_name and lastAccessTime. The type definition can be found here. Creation parameters:
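The creation request can be sketched with curl. The two attribute names and the DataSet supertype come from the description above; the attribute flags (isOptional, cardinality, etc.) are assumptions, shown here only to make the payload complete:

```shell
# Write the typedef payload: custom_type extends DataSet and adds the
# custom_name and lastAccessTime attributes (flag values are assumptions):
cat > custom_type.json <<'EOF'
{
  "entityDefs": [
    {
      "name": "custom_type",
      "superTypes": ["DataSet"],
      "attributeDefs": [
        { "name": "custom_name", "typeName": "string",
          "isOptional": true, "cardinality": "SINGLE",
          "isUnique": false, "isIndexable": true },
        { "name": "lastAccessTime", "typeName": "date",
          "isOptional": true, "cardinality": "SINGLE",
          "isUnique": false, "isIndexable": false }
      ]
    }
  ]
}
EOF

# POST it to the v2 typedefs endpoint (Basic Auth, default admin/admin):
curl -s -u admin:admin -X POST -H 'Content-Type: application/json' \
  -d @custom_type.json \
  "http://localhost:21000/api/atlas/v2/types/typedefs"
```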

As a result, you will get the whole type definition with its GUID.

Let’s delete our custom type:

  • Request Type: DELETE
    URL: http://localhost:21000/api/atlas/v2/types/typedef/name/custom_type
    Authorization Type: Basic Auth
    Username/Password: admin/admin
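The same request as a curl one-liner (default admin/admin credentials; -w prints the HTTP status so you can confirm the result):

```shell
# DELETE the custom type by name; prints the HTTP status code
# (404 means the type does not exist, 000 means the server is unreachable):
code=$(curl -s -o /dev/null -w '%{http_code}' -m 5 -u admin:admin -X DELETE \
  "http://localhost:21000/api/atlas/v2/types/typedef/name/custom_type")
echo "HTTP $code"
```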

4. How to add/delete/update entities via the UI

First of all, we have to allow entity creation from the UI in the server properties. We can do this by modifying the conf/atlas-application.properties file and adding the following parameter:

atlas.ui.editable.entity.types=*

or we can specify certain types:

atlas.ui.editable.entity.types=type1,type2,type3,...

Then we have to stop & start the server again:

./bin/atlas_stop.py
./bin/atlas_start.py

Let’s open the UI: http://localhost:21000/. You should now see the “Create new entity” link.

If we set atlas.ui.editable.entity.types=*, we will see the whole list of entity types that can be created from the UI (if you see only hdfs_path, you may have edited the wrong conf/atlas-application.properties file).

5. References

Apache Atlas project page
Apache Atlas GitHub
What is hard & soft deletion
Apache Atlas — Using the v2 Rest API


Alexey Artemov

Staff Data Engineer | MLOps | Data Architect | AWS | Databricks | Data Governance. https://www.linkedin.com/in/aartemov/