please write 2 pages of the below topic (1) Apache Sentry ( authorization) Source: wikipedia and...

Question

Question

please write 2 pages of the below topic (1) Apache Sentry ( authorization) Source: wikipedia and...

please write 2 pages of the below topic

(1) Apache Sentry ( authorization) Source: wikipedia and Apache.org
(2) MIT Kerberos (authentication system) (https://web.mit.edu/kerberos/)

engineering Computer-Science

Add a comment Improve this question Transcribed image text

Answer 1

Answer #1

1)

Apache Sentry is a granular, role-based authorization module for Hadoop. Sentry provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications on a Hadoop cluster. Sentry currently works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala, and HDFS (limited to Hive table data).

Sentry is designed to be a pluggable authorization engine for Hadoop components. It allows you to define authorization rules to validate a user or application’s access requests for Hadoop resources. Sentry is highly modular and can support authorization for a wide variety of data models in Hadoop.

Architecture Overview

Sentry Components

Data Engine Sentry server Sentry Plugin Policy Metadata

There are three components involved in the authorization process:

Sentry Server
The Sentry RPC server manages the authorization metadata. It supports interfaces to securely retrieve and manipulate the metadata.
Data Engine
This is a data processing application, such as Hive or Impala, that needs to authorize access to data or metadata resources. The data engine loads the Sentry plugin and all client requests for accessing resources are intercepted and routed to the Sentry plugin for validation.
Sentry Plugin
The Sentry plugin runs in the data engine. It offers interfaces to manipulate authorization metadata stored in the Sentry Server, and includes the authorization policy engine that evaluates access requests using the authorization metadata retrieved from the server.

Key Concepts

Authentication - Verifying credentials to reliably identify a user
Authorization - Limiting the user’s access to a given resource
User - Individual identified by underlying authentication system
Group - A set of users, maintained by the authentication system
Privilege - An instruction or rule that allows access to an object
Role - A set of privileges; a template to combine multiple access rules
Authorization models - Defines the objects to be subject to authorization rules and the granularity of actions allowed. For example, in the SQL model, the objects can be databases or tables, and the actions are SELECT, INSERT, and CREATE. For the Search model, the objects are indexes, configs, collections, documents; the access modes include query and update.

User Identity and Group Mapping

Sentry relies on underlying authentication systems, such as Kerberos or LDAP, to identify the user. It also uses the group mapping mechanism configured in Hadoop to ensure that Sentry sees the same group mapping as other components of the Hadoop ecosystem.

Consider a sample organization with users Alice and Bob who belong to an Active Directory (AD) group called finance-department. Bob also belongs to a group called finance-managers. In Sentry, you first create roles and then grant privileges to these roles. For example, you can create a role called Analyst and grant SELECT on tables Customer and Sales to this role.

The next step is to join these authentication entities (users and groups) to authorization entities (roles). This can be done by granting the Analyst role to the finance-department group. Now Bob and Alice who are members of the finance-department group get SELECT privilege to the Customer and Sales tables.

Role-Based Access Control

Role-based access control (RBAC) is a powerful mechanism to manage authorization for a large set of users and data objects in a typical enterprise. New data objects get added or removed, users join, move, or leave organisations all the time. RBAC makes managing this a lot easier. Hence, as an extension of the sample organization discussed previously, if a new employee Carol joins the Finance Department, all you need to do is add her to the finance-department group in AD. This will give Carol access to data from the Sales and Customer tables.

Unified Authorization

Another important aspect of Sentry is the unified authorization. The access control rules once defined, work across multiple data access tools. For example, being granted the Analyst role in the previous example will allow Bob, Alice, and others in the finance-department group to access table data from SQL engines such as Hive and Impala, as well as using MapReduce, Pig applications or metadata access using HCatalog.

Sentry Integration with the Hadoop Ecosystem

As illustrated above, Apache Sentry works with multiple Hadoop components. At the heart, you have the Sentry Server which stores authorization metadata and provides APIs for tools to retrieve and modify this metadata securely.

Note that the Sentry Server only facilitates the metadata. The actual authorization decision is made by a policy engine that runs in data processing applications such as Hive or Impala. Each component loads the Sentry plugin, which includes the service client for dealing with the Sentry service and the policy engine to validate the authorization request.

Hive and Sentry

Consider an example where Hive gets a request to access an object in a certain mode by a client. If Bob submits the following Hive query:

select * from production.sales

Hive will identify that user Bob is requesting SELECT access to the Sales table. At this point Hive will ask the Sentry plugin to validate Bob’s access request. The plugin will retrieve Bob’s privileges related to the Sales table and the policy engine will determine if the request is valid.

Hive works with both the Sentry service and policy files. Cloudera recommends you use the Sentry service, which makes it easier to manage user privileges.

Impala and Sentry

Authorization processing in Impala is similar to that in Hive. The main difference is caching of privileges. Impala’s Catalog server manages caching schema metadata and propagating it to all Impala server nodes. This Catalog server caches Sentry metadata as well. As a result, authorization validation in Impala happens locally and much faster.

Sentry-HDFS Synchronization

Sentry-HDFS authorization is focused on Hive warehouse data - that is, any data that is part of a table in Hive or Impala. The real objective of this integration is to expand the same authorization checks to Hive warehouse data being accessed from any other components such as Pig, MapReduce or Spark. At this point, this feature does not replace HDFS ACLs. Tables that are not associated with Sentry will retain their old ACLs.

The mapping of Sentry privileges to HDFS ACL permissions is as follows:

SELECT privilege -> Read access on the file.
INSERT privilege -> Write access on the file.
ALL privilege -> Read and Write access on the file.

The NameNode loads a Sentry plugin that caches Sentry privileges as well Hive metadata. This helps HDFS to keep file permissions and Hive tables privileges in sync. The Sentry plugin periodically polls Sentry to keep the metadata changes in sync.

For example, if Bob runs a Pig job that is reading from the Sales table data files, Pig will try to get the file handle from HDFS. At that point the Sentry plugin on the NameNode will figure out that the file is part of Hive data and overlay Sentry privileges on top of the file ACLs. As a result, HDFS will enforce the same privileges for this Pig client that Hive would apply for a SQL query.

Search and Sentry

Sentry can apply restrictions to various Search tasks including accessing data and creating collections. These restrictions are consistently applied, regardless of the way users attempt to complete actions. For example, restricting access to data in a collection restricts that access whether queries come from the command line, from a browser, or through the admin console.

With Search, Sentry restrictions can be stored in the database-backed Sentry service or in a policy file (for example, sentry-provider.ini) which is stored in an HDFS location such as hdfs://ha-nn-uri/user/solr/sentry/sentry-provider.ini.

Sentry with Search does not support multiple policy files for multiple databases. If you choose to use policy files rather than database-backed Sentry service, you must use a separate policy file for each Sentry-enabled service. For example, if Hive and Search were using policy file authorization, using a combined Hive and Search policy file would result in an invalid configuration and failed authorization on both services.

2)

What is Kerberos?

Kerberos is a network authentication protocol. It is designed to provide strong authentication for client/server applications by using secret-key cryptography. A free implementation of this protocol is available from the Massachusetts Institute of Technology. Kerberos is available in many commercial products as well.

The Internet is an insecure place. Many of the protocols used in the Internet do not provide any security. Tools to "sniff" passwords off of the network are in common use by malicious hackers. Thus, applications which send an unencrypted password over the network are extremely vulnerable. Worse yet, other client/server applications rely on the client program to be "honest" about the identity of the user who is using it. Other applications rely on the client to restrict its activities to those which it is allowed to do, with no other enforcement by the server.

Some sites attempt to use firewalls to solve their network security problems. Unfortunately, firewalls assume that "the bad guys" are on the outside, which is often a very bad assumption. Most of the really damaging incidents of computer crime are carried out by insiders. Firewalls also have a significant disadvantage in that they restrict how your users can use the Internet. (After all, firewalls are simply a less extreme example of the dictum that there is nothing more secure than a computer which is not connected to the network --- and powered off!) In many places, these restrictions are simply unrealistic and unacceptable.

Kerberos was created by MIT as a solution to these network security problems. The Kerberos protocol uses strong cryptography so that a client can prove its identity to a server (and vice versa) across an insecure network connection. After a client and server has used Kerberos to prove their identity, they can also encrypt all of their communications to assure privacy and data integrity as they go about their business.

Kerberos is freely available from MIT, under copyright permissions very similar those used for the BSD operating system and the X Window System. MIT provides Kerberos in source form so that anyone who wishes to use it may look over the code for themselves and assure themselves that the code is trustworthy. In addition, for those who prefer to rely on a professionally supported product, Kerberos is available as a product from many different vendors.

In summary, Kerberos is a solution to your network security problems. It provides the tools of authentication and strong cryptography over the network to help you secure your information systems across your entire enterprise. We hope you find Kerberos as useful as it has been to us. At MIT, Kerberos has been invaluable to our Information/Technology architecture.

Add a comment

Answer 2

please write 2 pages of the below topic (1) Apache Sentry ( authorization) Source: wikipedia and...

Homework Answers

Add Answer to:
please write 2 pages of the below topic (1) Apache Sentry ( authorization) Source: wikipedia and...

Post as a guest

Earn Coins

Can you please provide 5-10 scholarly or academic sources (No Wikipedia Or Unverified Source) on the...

Need 15 to 20 pages of the report on the topic below: Topic: Cryptography Standard: Message...

Kindly write a 2-pages critique paper on this topic below. organelles positioning in muscles requires cooperation...

write a short essay (1-2 pages) about malaria and take an overview of the topic in...

Write an essay of not less than 3 pages on any topic from any of the...

Write a paper consisting of two (2) to three (3) pages - summarizing the topic of...

For this assignment, you will write a 2 page minimum paper, the topic will be "Nursing...

can you write 3 to 5 pages on the following topic please: MSK (musculoskeletal) ultrasound. This...

I need 5 pages 1. Report Assessment Topic (20 marks) Write a report about the types...

Submit your Term Project Paper with required content below. Write on any topic related to technology...

please write 2 pages of the below topic (1) Apache Sentry ( authorization) Source: wikipedia and...

Homework Answers

Add Answer to: please write 2 pages of the below topic (1) Apache Sentry ( authorization) Source: wikipedia and...

Post as a guest

Earn Coins

Add Answer to:
please write 2 pages of the below topic (1) Apache Sentry ( authorization) Source: wikipedia and...