How To Install Apache Kafka on Ubuntu 18.04

Introduction

Apache Kafka is a popular distributed message broker designed to efficiently handle large volumes of real-time data. A Kafka cluster is not only highly scalable and fault-tolerant, but it also has a much higher throughput compared to other message brokers such as ActiveMQ and RabbitMQ. Though it is generally used as a publish/subscribe messaging system, a lot of organizations also use it for log aggregation because it offers persistent storage for published messages.

A publish/subscribe messaging system allows one or more producers to publish messages without considering the number of consumers or how they will process the messages. Subscribed clients are notified automatically about updates and the creation of new messages. This system is more efficient and scalable than systems where clients poll periodically to determine if new messages are available.

In this tutorial, you will install and use Apache Kafka 2.1.1 on Ubuntu 18.04.

Prerequisites

To follow along, you will need:

  • One Ubuntu 18.04 server and a non-root user with sudo privileges. Follow the steps specified in this guide if you do not have a non-root user set up.

  • At least 4GB of RAM on the server. Installations without this amount of RAM may cause the Kafka service to fail, with the Java Virtual Machine (JVM) throwing an “Out Of Memory” exception during startup (see the quick memory check after this list).

  • OpenJDK 8 installed on your server. To install this version, follow these instructions on installing specific versions of OpenJDK. Kafka is written in Java, so it requires a JVM; however, its startup shell script has a version detection bug that causes it to fail to start with JVM versions above 8.
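
If you are not sure how much memory your server has, a quick check with free (not part of the original tutorial, just a convenient verification) will show the totals in human-readable units:

  • free -h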

Step 1 — Creating a User for Kafka

Since Kafka can handle requests over a network, you should create a dedicated user for it. This minimizes damage to your Ubuntu machine should the Kafka server be compromised. We will create a dedicated kafka user in this step, but you should create a different non-root user to perform other tasks on this server once you have finished setting up Kafka.

Logged in as your non-root sudo user, create a user called kafka with the useradd command:

  • sudo useradd kafka -m

The -m flag ensures that a home directory will be created for the user. This home directory, /home/kafka, will act as our workspace directory for executing commands in the sections below.

Set the password using passwd:

  • sudo passwd kafka

Add the kafka user to the sudo group with the adduser command, so that it has the privileges required to install Kafka’s dependencies:

  • sudo adduser kafka sudo

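To confirm that the group change took effect, you can list the kafka user's group memberships; this optional check should show sudo in the output:

  • groups kafka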

Your kafka user is now ready. Log into this account using su:

  • su -l kafka

Now that we’ve created the Kafka-specific user, we can move on to downloading and extracting the Kafka binaries.

Step 2 — Downloading and Extracting the Kafka Binaries

Let’s download and extract the Kafka binaries into dedicated folders in our kafka user’s home directory.

To start, create a directory in /home/kafka called Downloads to store your downloads:

  • mkdir ~/Downloads

Use curl to download the Kafka binaries:

  • curl "https://www.apache.org/dist/kafka/2.1.1/kafka_2.11-2.1.1.tgz" -o ~/Downloads/kafka.tgz

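If you would like to verify the download before extracting it, you can compute its SHA-512 checksum and compare it with the .sha512 file published alongside the archive on the Apache servers (an optional step, not part of the original instructions):

  • sha512sum ~/Downloads/kafka.tgz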

Create a directory called kafka and change to this directory. This will be the base directory of the Kafka installation:

  • mkdir ~/kafka && cd ~/kafka

Extract the archive you downloaded using the tar command:

  • tar -xvzf ~/Downloads/kafka.tgz --strip 1

We specify the --strip 1 flag to ensure that the archive’s contents are extracted in ~/kafka/ itself and not in another directory (such as ~/kafka/kafka_2.11-2.1.1/) inside of it.

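As a quick sanity check (optional, not in the original steps), listing the base directory should now show Kafka's bin, config, and libs folders at the top level:

  • ls ~/kafka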

Now that we’ve downloaded and extracted the binaries successfully, we can move on to configuring Kafka to allow for topic deletion.

Step 3 — Configuring the Kafka Server

Kafka’s default behavior will not allow us to delete a topic, the category, group, or feed name to which messages can be published. To modify this, let’s edit the configuration file.

Kafka’s configuration options are specified in server.properties. Open this file with nano or your favorite editor:

  • nano ~/kafka/config/server.properties

Let’s add a setting that will allow us to delete Kafka topics. Add the following to the bottom of the file:

~/kafka/config/server.properties
delete.topic.enable = true

Save the file, and exit nano. Now that we’ve configured Kafka, we can move on to creating systemd unit files for running and enabling it on startup.

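Before moving on, you can confirm that the new line was actually written by printing the end of the file (an optional check, not part of the original steps):

  • tail -n 1 ~/kafka/config/server.properties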

Step 4 — Creating Systemd Unit Files and Starting the Kafka Server

In this section, we will create systemd unit files for the Kafka service. This will help us perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.

Zookeeper is a service that Kafka uses to manage its cluster state and configurations. It is commonly used in many distributed systems as an integral component. If you would like to know more about it, visit the official Zookeeper documentation.

Create the unit file for zookeeper:

  • sudo nano /etc/systemd/system/zookeeper.service

Enter the following unit definition into the file:

/etc/systemd/system/zookeeper.service
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

The [Unit] section specifies that Zookeeper requires networking and the filesystem to be ready before it can start.

The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell files for starting and stopping the service. It also specifies that Zookeeper should be restarted automatically if it exits abnormally.

Next, create the systemd service file for kafka:

  • sudo nano /etc/systemd/system/kafka.service

Enter the following unit definition into the file:

/etc/systemd/system/kafka.service
[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

The [Unit] section specifies that this unit file depends on zookeeper.service. This will ensure that zookeeper gets started automatically when the kafka service starts.

The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell files for starting and stopping the service. It also specifies that Kafka should be restarted automatically if it exits abnormally.

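If systemd does not immediately pick up the newly created unit files, you can ask it to re-read its configuration; running this is harmless either way (an optional step, not in the original tutorial):

  • sudo systemctl daemon-reload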

Now that the units have been defined, start Kafka with the following command:

  • sudo systemctl start kafka

To ensure that the server has started successfully, check the journal logs for the kafka unit:

  • sudo journalctl -u kafka

You should see output similar to the following:

Output   
Jul 17 18:38:59 kafka-ubuntu systemd[1]: Started kafka.service.

You now have a Kafka server listening on port 9092.

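If you want to double-check that the broker is really listening on that port, a quick socket query (optional, not part of the original steps) should print a LISTEN line for port 9092:

  • ss -ltn | grep 9092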

While we have started the kafka service, if we were to reboot our server, it would not be started automatically. To enable kafka on server boot, run:

  • sudo systemctl enable kafka

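To confirm that the unit is now enabled for boot, you can ask systemd directly; this optional check should simply print enabled:

  • systemctl is-enabled kafka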

Now that we’ve started and enabled the services, let’s check the installation.

Step 5 — Testing the Installation

Let’s publish and consume a “Hello World” message to make sure the Kafka server is behaving correctly. Publishing messages in Kafka requires:

  • A producer, which enables the publication of records and data to topics.

  • A consumer, which reads messages and data from topics.

First, create a topic named TutorialTopic by typing:

  • ~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TutorialTopic

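If you would like to confirm that the topic was created, you can list the topics the broker knows about using the same script (an optional check; TutorialTopic should appear in the output):

  • ~/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181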

You can create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server’s hostname, port, and a topic name as arguments.

Publish the string "Hello, World" to the TutorialTopic topic by typing:

  • echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

Next, you can create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server’s hostname and port, along with a topic name, as arguments.

The following command consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:

  • ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning

If there are no configuration issues, you should see Hello, World in your terminal:

Output   
Hello, World

The script will continue to run, waiting for more messages to be published to the topic. Feel free to open a new terminal and start a producer to publish a few more messages. You should be able to see them all in the consumer’s output.

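For example, in that second terminal you could run the producer interactively (the same script as before, just without the piped echo) and type messages one per line; each line you enter should show up in the consumer window:

  • ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic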

When you are done testing, press CTRL+C to stop the consumer script. Now that we have tested the installation, let’s move on to installing KafkaT.

Step 6 — Install KafkaT (Optional)

KafkaT is a tool from Airbnb that makes it easier for you to view details about your Kafka cluster and perform certain administrative tasks from the command line. Because it is a Ruby gem, you will need Ruby to use it. You will also need the build-essential package to be able to build the other gems it depends on. Install them using apt:

  • sudo apt install ruby ruby-dev build-essential

You can now install KafkaT using the gem command:

  • sudo gem install kafkat

KafkaT uses .kafkatcfg as the configuration file to determine the installation and log directories of your Kafka server. It should also have an entry pointing KafkaT to your ZooKeeper instance.

Create a new file called .kafkatcfg:

  • nano ~/.kafkatcfg

Add the following lines to specify the required information about your Kafka server and Zookeeper instance:

~/.kafkatcfg
{
  "kafka_path": "~/kafka",
  "log_path": "/tmp/kafka-logs",
  "zk_path": "localhost:2181"
}

You are now ready to use KafkaT. For a start, here’s how you would use it to view details about all Kafka partitions:

  • kafkat partitions

You will see the following output:

Output
Topic                 Partition   Leader      Replicas        ISRs
TutorialTopic         0           0           [0]             [0]
__consumer_offsets    0           0           [0]             [0]
...

You will see TutorialTopic, as well as __consumer_offsets, an internal topic used by Kafka for storing client-related information. You can safely ignore lines starting with __consumer_offsets.

To learn more about KafkaT, refer to its documentation.

Step 7 — Setting Up a Multi-Node Cluster (Optional)

If you want to create a multi-broker cluster using more Ubuntu 18.04 machines, you should repeat Step 1, Step 4, and Step 5 on each of the new machines. Additionally, you should make the following changes in the server.properties file for each:

  • The value of the broker.id property should be changed such that it is unique throughout the cluster. This property uniquely identifies each server in the cluster and can have any string as its value. For example, "server1", "server2", etc.

  • The value of the zookeeper.connect property should be changed such that all nodes point to the same ZooKeeper instance. This property specifies the Zookeeper instance’s address and follows the <HOSTNAME/IP_ADDRESS>:<PORT> format. For example, "203.0.113.0:2181", "203.0.113.1:2181" etc.

If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.

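As an illustration only, the relevant lines in a second broker's server.properties might look like the following; the broker ID and ZooKeeper addresses here are hypothetical values reused from the examples above:

broker.id=1
zookeeper.connect=203.0.113.0:2181,203.0.113.1:2181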

Step 8 — Restricting the Kafka User

Now that all of the installations are done, you can remove the kafka user’s admin privileges. Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session you started this tutorial with, simply type exit.

Remove the kafka user from the sudo group:

  • sudo deluser kafka sudo

To further improve your Kafka server’s security, lock the kafka user’s password using the passwd command. This makes sure that nobody can directly log into the server using this account:

  • sudo passwd kafka -l

At this point, only root or a sudo user can log in as kafka by typing in the following command:

  • sudo su - kafka

In the future, if you want to unlock it, use passwd with the -u option:

  • sudo passwd kafka -u

You have now successfully restricted the kafka user’s admin privileges.

Conclusion

You now have Apache Kafka running securely on your Ubuntu server. You can make use of it in your projects by creating Kafka producers and consumers using Kafka client libraries, which are available for most programming languages. To learn more about Kafka, you can also consult its documentation.
