Apache Atlas: Quick Start (part I — REST & UI)
This article shows the basic steps for working with Apache Atlas. The following points are covered:
1. Local installation for development;
2. Data model overview;
3. How to add/delete/update types via the REST API;
4. How to add/delete/update entities via the UI;
5. References
The second article, which describes how to work with the Java API, is located here.
My environment
OS: Mac OS X, 10.15.6
Java: build 1.8.0_202-b08
Atlas: 2.1.0
Python: 2.7.9
1. Local installation for development
The full description of how to build and install Atlas can be found here.
- Download the necessary version from this link or from GitHub
- Unarchive & build (it will take a few minutes):
tar xvfz apache-atlas-2.1.0-sources.tar.gz
cd apache-atlas-sources-2.1.0/
export MAVEN_OPTS="-Xms2g -Xmx2g"
mvn clean -DskipTests package -Pdist,embedded-hbase-solr
cd ./distro/target/apache-atlas-2.1.0-bin/apache-atlas-2.1.0/
- check the environment variables in conf/atlas-env.sh and add/edit/uncomment them if needed:
export JAVA_HOME=YOUR_PATH_TO_JAVA_HOME
export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps"
export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"
export MANAGE_LOCAL_HBASE=true
export MANAGE_LOCAL_SOLR=true
export HBASE_CONF_DIR=apache-atlas-sources-2.1.0/distro/target/apache-atlas-2.1.0-bin/apache-atlas-2.1.0/conf/hbase
- make conf/atlas-env.sh executable & source it, so the exported variables reach your current shell:
chmod +x conf/atlas-env.sh
source conf/atlas-env.sh
- run & check atlas server:
./bin/atlas_start.py [-port <port>] (port 21000 by default)
As a result you should see:
configured for local hbase.
hbase started.
configured for local solr.
solr started.
setting up solr collections…
starting atlas on host localhost
starting atlas on port 21000
………………………………
Apache Atlas Server started!!!
Wait a few seconds before checking:
curl -u admin:admin http://localhost:21000/api/atlas/admin/version
{"Description":"Metadata Management and Data Governance Platform over Hadoop","Revision":"release","Version":"2.1.0","Name":"apache-atlas"}
You can also open the web UI: http://localhost:21000/login.jsp
To stop the server, use the command:
./bin/atlas_stop.py
2. Data model overview
Basic information about the data model can be found here.
- Type — in Atlas, a type is a definition of how a particular kind of metadata object is stored and accessed. A type represents one attribute or a collection of attributes that define the properties of the metadata object. Users with a development background will recognize the similarity of a type to a 'Class' definition in object-oriented programming languages, or a 'table schema' in relational databases.
- Entity — in Atlas, an entity is a specific value or instance of a type, and thus represents a specific metadata object in the real world. Referring back to our analogy with object-oriented programming languages, an 'instance' is an 'Object' of a certain 'Class'.
There are two main types: DataSet & Process.
- DataSet — this type extends Referenceable. Conceptually, it can be used to represent a type that stores data. In Atlas, hive_table, hbase_table, etc. are all types that extend DataSet. Types that extend DataSet can be expected to have a schema, in the sense that they have an attribute that defines the attributes of that dataset; for example, the columns attribute of a hive_table. Entities of types that extend DataSet also participate in data transformations, and these transformations can be captured by Atlas via lineage (or provenance) graphs.
- Process — this type extends Asset. Conceptually, it can be used to represent any data transformation operation. For example, an ETL process that transforms a hive table with raw data to another hive table that stores some aggregate can be a specific type that extends the Process type. A Process type has two specific attributes, inputs and outputs. Both inputs and outputs are arrays of DataSet entities. Thus an instance of a Process type can use these inputs and outputs to capture how the lineage of a DataSet evolves.
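For illustration only, here is a minimal hand-written sketch (not a guaranteed-complete payload; concrete process types such as spark_process may require further attributes) of how a Process entity captures lineage through inputs and outputs, each an array of references to DataSet entities:
{
  "entity": {
    "typeName": "spark_process",
    "attributes": {
      "qualifiedName": "raw_to_aggregate@dev",
      "name": "raw_to_aggregate",
      "inputs": [ { "typeName": "hive_table", "guid": "<guid-of-raw-table>" } ],
      "outputs": [ { "typeName": "hive_table", "guid": "<guid-of-aggregate-table>" } ]
    }
  }
}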
You can build your own types; depending on their behaviour, they should extend DataSet or Process. Many types already exist in Atlas, and additional type definitions can be found here or locally:
cd ./models/
0000-Area0
1000-Hadoop
2000-RDBMS
3000-Cloud
4000-MachineLearning
3. How to add/delete/update types via the REST API
Documentation for the REST API can be found at this link. For working with the REST API I use Postman; curl also works.
- Get all type definitions:
Request Type: GET
URL: http://localhost:21000/api/atlas/v2/types/typedefs
Authorization Type: Basic Auth
Username/Password: admin/admin
- Get the definition of a particular type (for instance, DataSet):
Request Type: GET
URL: http://localhost:21000/api/atlas/v2/types/entitydef/name/DataSet
Authorization Type: Basic Auth
Username/Password: admin/admin
- Create/update types:
Request Type: POST
URL: http://localhost:21000/api/atlas/v2/types/typedefs
Authorization Type: Basic Auth
Username/Password: admin/admin
Headers: Content-Type = application/json
Body: JSON type definition
- Delete types:
Request Type: DELETE
URL: http://localhost:21000/api/atlas/v2/types/typedef/name/type_name
Authorization Type: Basic Auth
Username/Password: admin/admin
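The same four operations with curl (a sketch using the endpoints above; typedefs.json is a hypothetical file holding your type definitions):
# get all type definitions
curl -u admin:admin http://localhost:21000/api/atlas/v2/types/typedefs
# get the definition of a particular type
curl -u admin:admin http://localhost:21000/api/atlas/v2/types/entitydef/name/DataSet
# create or update types
curl -u admin:admin -X POST -H 'Content-Type: application/json' -d @typedefs.json http://localhost:21000/api/atlas/v2/types/typedefs
# delete a type by name
curl -u admin:admin -X DELETE http://localhost:21000/api/atlas/v2/types/typedef/name/type_name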
Before we continue, please check that all of these types exist in your Atlas:
- spark_process;
- aws_s3_bucket;
- aws_s3_pseudo_dir;
- aws_s3_object;
- hive_table.
If they don't, use the models from this link. An example URL for checking: http://localhost:21000/api/atlas/v2/types/entitydef/name/spark_process
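A small shell sketch to check them all at once (200 means the type exists, 404 means it is missing):
for t in spark_process aws_s3_bucket aws_s3_pseudo_dir aws_s3_object hive_table; do
  # print the HTTP status code for each type-definition lookup
  curl -s -o /dev/null -w "%{http_code} $t\n" -u admin:admin http://localhost:21000/api/atlas/v2/types/entitydef/name/$t
done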
Let's create our custom type (custom_type); it will extend DataSet and have two additional fields: custom_name and lastAccessTime. The type definition can be found here; a minimal sketch of the request body is shown after the parameters below. Creation parameters:
- Request Type: POST
URL:http://localhost:21000/api/atlas/v2/types/typedefs
Authorization Type: Basic Auth
Username/Password: admin/admin
Headers: Content-Type = application/json
Body: JSON type definition
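The exact definition is behind the link above; a minimal sketch matching the description (extends DataSet, adds custom_name and lastAccessTime; the attribute flags are illustrative assumptions) could look like this:
{
  "entityDefs": [
    {
      "name": "custom_type",
      "superTypes": ["DataSet"],
      "attributeDefs": [
        { "name": "custom_name", "typeName": "string", "cardinality": "SINGLE", "isOptional": true, "isUnique": false, "isIndexable": true },
        { "name": "lastAccessTime", "typeName": "date", "cardinality": "SINGLE", "isOptional": true, "isUnique": false, "isIndexable": false }
      ]
    }
  ]
}
The same request with curl, assuming the definition is saved as custom_type.json (hypothetical file name):
curl -u admin:admin -X POST -H 'Content-Type: application/json' -d @custom_type.json http://localhost:21000/api/atlas/v2/types/typedefs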
As a result, you will get the whole type definition back, including its GUID.
Let’s delete our custom type:
- Request Type: DELETE
URL:http://localhost:21000/api/atlas/v2/types/typedef/name/custom_type
Authorization Type: Basic Auth
Username/Password: admin/admin
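The same with curl:
curl -u admin:admin -X DELETE http://localhost:21000/api/atlas/v2/types/typedef/name/custom_type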
4. How to add/delete/update entities via the UI
First of all, we have to allow entity creation from the UI in the server properties. We can do this by modifying the conf/atlas-application.properties file and adding the following parameter:
atlas.ui.editable.entity.types=*
or we can specify certain types:
atlas.ui.editable.entity.types=type1,type2,type3,...
Then we have to stop & start the server again:
./bin/atlas_stop.py
./bin/atlas_start.py
Let's open the UI: http://localhost:21000/. We should now see a create new entity link.
If we set atlas.ui.editable.entity.types=*, we will see the whole list of entity types that can be created from the UI (if you see just hdfs_path, you probably made your changes in the wrong conf/atlas-application.properties file).
5. References
Apache Atlas project page
Apache Atlas GitHub
What is hard & soft deletion
Apache Atlas — Using the v2 Rest API