PostgreSQL

patroni 설치

남용2 2019. 10. 14. 16:03

patroni 시나리오(로컬 환경에서 2개의 agensgraph instance로 진행)

  1. etcd 설치
  2. etcd 설치 확인
  3. etcd 시작
  4. patroni 설치
  5. patroni 설치 확인
  6. agensgraph 1 설정 파일 작성
  7. patroni 명령어로 agensgraph 1 instance 실행
  8. agensgraph 2 설정 파일 작성
  9. patroni 명령어로 agensgraph 2 instance 실행
  10. partronictl 명령어로 agensgraph instance 확인
  11. failback 테스트
  12. 자동 failover 테스트

다음은 patroni 시나리오를 실행한 명령어이다.

//etcd 설치
[bylee@localhost ha]$ wget https://github.com/etcd-io/etcd/releases/download/v3.3.9/etcd-v3.3.9-linux-amd64.tar.gz
[bylee@localhost ha]$ tar -vxf etcd-v3.3.9-linux-amd64.tar.gz
[bylee@localhost ha]$ mv etcd-v3.3.9-linux-amd64 etcd
 
//etcd 설치 확인
[bylee@localhost ha]$ cd etcd/
[bylee@localhost etcd]$ ./etcd --version
etcd Version: 3.3.9
Git SHA: fca8add78
Go Version: go1.10.3
Go OS/Arch: linux/amd64
 
//etcd 시작
[bylee@localhost etcd]$ ./etcd
2018-08-30 20:19:35.605206 I | etcdmain: etcd Version: 3.3.9
2018-08-30 20:19:35.613752 I | etcdmain: Git SHA: fca8add78
2018-08-30 20:19:35.613824 I | etcdmain: Go Version: go1.10.3
2018-08-30 20:19:35.613882 I | etcdmain: Go OS/Arch: linux/amd64
2018-08-30 20:19:35.614040 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
2018-08-30 20:19:35.614115 W | etcdmain: no data-dir provided, using default data-dir ./default.etcd
2018-08-30 20:19:35.614605 I | embed: listening for peers on http://localhost:2380
2018-08-30 20:19:35.614825 I | embed: listening for client requests on localhost:2379
2018-08-30 20:19:35.618661 I | etcdserver: name = default
2018-08-30 20:19:35.618681 I | etcdserver: data dir = default.etcd
2018-08-30 20:19:35.618687 I | etcdserver: member dir = default.etcd/member
2018-08-30 20:19:35.618691 I | etcdserver: heartbeat = 100ms
2018-08-30 20:19:35.618694 I | etcdserver: election = 1000ms
2018-08-30 20:19:35.618698 I | etcdserver: snapshot count = 100000
2018-08-30 20:19:35.618707 I | etcdserver: advertise client URLs = http://localhost:2379
2018-08-30 20:19:35.618712 I | etcdserver: initial advertise peer URLs = http://localhost:2380
2018-08-30 20:19:35.618720 I | etcdserver: initial cluster = default=http://localhost:2380
2018-08-30 20:19:35.621179 I | etcdserver: starting member 8e9e05c52164694d in cluster cdf818194e3a8c32
2018-08-30 20:19:35.621208 I | raft: 8e9e05c52164694d became follower at term 0
2018-08-30 20:19:35.621217 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2018-08-30 20:19:35.621222 I | raft: 8e9e05c52164694d became follower at term 1
2018-08-30 20:19:35.624582 W | auth: simple token is not cryptographically signed
2018-08-30 20:19:35.626257 I | etcdserver: starting server... [version: 3.3.9, cluster version: to_be_decided]
2018-08-30 20:19:35.633506 I | etcdserver: 8e9e05c52164694d as single-node; fast-forwarding 9 ticks (election ticks 10)
2018-08-30 20:19:35.634004 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2018-08-30 20:19:35.821606 I | raft: 8e9e05c52164694d is starting a new election at term 1
2018-08-30 20:19:35.821644 I | raft: 8e9e05c52164694d became candidate at term 2
2018-08-30 20:19:35.821666 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 2
2018-08-30 20:19:35.821680 I | raft: 8e9e05c52164694d became leader at term 2
2018-08-30 20:19:35.821688 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 2
2018-08-30 20:19:35.822210 I | etcdserver: setting up the initial cluster version to 3.3
2018-08-30 20:19:35.824298 N | etcdserver/membership: set the initial cluster version to 3.3
2018-08-30 20:19:35.824839 I | etcdserver/api: enabled capabilities for version 3.3
2018-08-30 20:19:35.824944 I | etcdserver: published {Name:default ClientURLs:[http://localhost:2379]} to cluster cdf818194e3a8c32
2018-08-30 20:19:35.825011 E | etcdmain: forgot to set Type=notify in systemd service file?
2018-08-30 20:19:35.825070 I | embed: ready to serve client requests
2018-08-30 20:19:35.825664 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
 
 
 
//patroni 설치
//patroni dependency는 etcd이다. 이와 같이 안 하면 위에서 설치한 etcd를 사용할 수 없다.
[bylee@localhost patroni]$ sudo pip install patroni[etcd]
(...)
Requirement already satisfied: cdiff in /usr/lib/python3.5/site-packages (from patroni[etcd]) (1.0)
Requirement already satisfied: click>=4.1 in /usr/lib/python3.5/site-packages (from patroni[etcd]) (6.7)
Requirement already satisfied: urllib3!=1.21,>=1.19.1 in /usr/lib/python3.5/site-packages (from patroni[etcd]) (1.23)
Requirement already satisfied: six>=1.7 in /usr/lib/python3.5/site-packages (from patroni[etcd]) (1.11.0)
Collecting python-etcd<0.5,>=0.4.3; extra == "etcd" (from patroni[etcd])
Requirement already satisfied: idna<2.8,>=2.5 in /usr/lib/python3.5/site-packages (from requests->patroni[etcd]) (2.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3.5/site-packages (from requests->patroni[etcd]) (2018.8.24)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/lib/python3.5/site-packages (from requests->patroni[etcd]) (3.0.4)
Requirement already satisfied: pytz in /usr/lib/python3.5/site-packages (from tzlocal->patroni[etcd]) (2018.5)
Requirement already satisfied: dnspython>=1.13.0 in /usr/lib/python3.5/site-packages (from python-etcd<0.5,>=0.4.3; extra == "etcd"->patroni[etcd]) (1.15.0)
Installing collected packages: python-etcd, patroni
Successfully installed patroni-1.4.6 python-etcd-0.4.5
 
//patroni 설치 확인
[bylee@localhost patroni]$ patronictl version
patronictl version 1.4.6
 
 
//patroni 시작
//agensgraph 1 설정 파일 작성(=agens0)
[bylee@localhost patroni]$ cat agens0.yml
scope: default
#namespace: /service/
name: agens0
 
restapi:
  listen: 127.0.0.1:8008
  connect_address: 127.0.0.1:8008
#  certfile: /etc/ssl/certs/ssl-cert-snakeoil.pem
#  keyfile: /etc/ssl/private/ssl-cert-snakeoil.key
#  authentication:
#    username: username
#    password: password
 
# ctl:
#   insecure: false # Allow connections to SSL sites without certs
#   certfile: /etc/ssl/certs/ssl-cert-snakeoil.pem
#   cacert: /etc/ssl/certs/ssl-cacert-snakeoil.pem
 
etcd:
  host: 127.0.0.1:2379
 
bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
#    master_start_timeout: 300
#    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
#      use_slots: true
      parameters:
#        wal_level: hot_standby
#        hot_standby: "on"
#        wal_keep_segments: 8
#        max_wal_senders: 10
#        max_replication_slots: 10
#        wal_log_hints: "on"
#        archive_mode: "on"
#        archive_timeout: 1800s
#        archive_command: mkdir -p ../wal_archive && test ! -f ../wal_archive/%f && cp %p ../wal_archive/%f
#      recovery_conf:
#        restore_command: cp ../wal_archive/%f %p
 
  # some desired options for 'initdb'
  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums
 
  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  - host replication replicator 127.0.0.1/32 md5
  - host all all 0.0.0.0/0 md5
#  - hostssl all all 0.0.0.0/0 md5
 
  # Additional script to be launched after initial cluster creation (will be passed the connection URL as parameter)
# post_init: /usr/local/bin/setup_cluster.sh
 
  # Some additional users users which needs to be created after initializing new cluster
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb
 
postgresql:
  listen: 127.0.0.1:5432
  connect_address: 127.0.0.1:5432
  data_dir: /home/bylee/ha/patroni/agens0
  bin_dir: /home/bylee/ha/agensgraph/bin
#  config_dir:
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: bylee
      password: bylee
  parameters:
    unix_socket_directories: '.'
 
#watchdog:
#  mode: automatic # Allowed values: off, automatic, required
#  device: /dev/watchdog
#  safety_margin: 5
 
tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
 
//patroni 명령어로 agensgraph 1 instance 실행
[bylee@localhost patroni]$ patroni agens0.yml
2018-08-30 20:19:43,555 INFO: Failed to import patroni.dcs.consul
2018-08-30 20:19:43,569 INFO: Selected new etcd server http://127.0.0.1:2379
2018-08-30 20:19:43,588 INFO: Lock owner: None; I am agens0
2018-08-30 20:19:43,604 INFO: trying to bootstrap a new cluster
The files belonging to this database system will be owned by user "bylee".
This user must also own the server process.
 
The database cluster will be initialized with locale "ko_KR.UTF-8".
initdb: could not find suitable text search configuration for locale "ko_KR.UTF-8"
The default text search configuration will be set to "simple".
 
Data page checksums are enabled.
 
fixing permissions on existing directory /home/bylee/ha/patroni/agens0 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
 
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
 
Success. You can now start the database server using:
 
    /home/bylee/ha/agensgraph/bin/ag_ctl -D /home/bylee/ha/patroni/agens0 -l logfile start
 
2018-08-30 20:19:44.653 KST [16879] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2018-08-30 20:19:44.657 KST [16879] LOG:  listening on Unix socket "./.s.PGSQL.5432"
2018-08-30 20:19:44,674 INFO: postmaster pid=16879
2018-08-30 20:19:44.684 KST [16881] LOG:  database system was shut down at 2018-08-30 20:19:44 KST
2018-08-30 20:19:44.686 KST [16879] LOG:  database system is ready to accept connections
2018-08-30 20:19:44.689 KST [16882] FATAL:  the database system is starting up
localhost:5432 - rejecting connections
localhost:5432 - accepting connections
2018-08-30 20:19:44,709 INFO: establishing a new patroni connection to the postgres cluster
2018-08-30 20:19:44,729 INFO: running post_bootstrap
2018-08-30 20:19:44,745 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2018-08-30 20:19:44,751 INFO: initialized a new cluster
2018-08-30 20:19:54,749 INFO: Lock owner: agens0; I am agens0
2018-08-30 20:19:54,774 INFO: Lock owner: agens0; I am agens0
2018-08-30 20:19:54,780 INFO: no action.  i am the leader with the lock
 
 
 
//agensgraph 2 설정 파일 작성(=agens1)
[bylee@localhost patroni]$ cat agens1.yml
scope: default
#namespace: /service/
name: agens1
 
restapi:
  listen: 127.0.0.1:8009
  connect_address: 127.0.0.1:8009
#  certfile: /etc/ssl/certs/ssl-cert-snakeoil.pem
#  keyfile: /etc/ssl/private/ssl-cert-snakeoil.key
#  authentication:
#    username: username
#    password: password
 
# ctl:
#   insecure: false # Allow connections to SSL sites without certs
#   certfile: /etc/ssl/certs/ssl-cert-snakeoil.pem
#   cacert: /etc/ssl/certs/ssl-cacert-snakeoil.pem
 
etcd:
  host: 127.0.0.1:2379
 
bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
#    master_start_timeout: 300
#    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
#      use_slots: true
      parameters:
#        wal_level: hot_standby
#        hot_standby: "on"
#        wal_keep_segments: 8
#        max_wal_senders: 10
#        max_replication_slots: 10
#        wal_log_hints: "on"
#        archive_mode: "on"
#        archive_timeout: 1800s
#        archive_command: mkdir -p ../wal_archive && test ! -f ../wal_archive/%f && cp %p ../wal_archive/%f
#      recovery_conf:
#        restore_command: cp ../wal_archive/%f %p
 
  # some desired options for 'initdb'
  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums
 
  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  - host replication replicator 127.0.0.1/32 md5
  - host all all 0.0.0.0/0 md5
#  - hostssl all all 0.0.0.0/0 md5
 
  # Additional script to be launched after initial cluster creation (will be passed the connection URL as parameter)
# post_init: /usr/local/bin/setup_cluster.sh
 
  # Some additional users users which needs to be created after initializing new cluster
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb
 
postgresql:
  listen: 127.0.0.1:5433
  connect_address: 127.0.0.1:5433
  data_dir: /home/bylee/ha/patroni/agens1
  bin_dir: /home/bylee/ha/agensgraph/bin
#  config_dir:
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: bylee
      password: bylee
  parameters:
    unix_socket_directories: '.'
  basebackup:
      - verbose
      - max-rate: 100M
 
tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
 
 
 
 
//patroni 명령어로 agensgraph 2 instance 실행
[bylee@localhost patroni]$ patroni agens1.yml
2018-08-30 20:29:43,684 INFO: Failed to import patroni.dcs.consul
2018-08-30 20:29:43,698 INFO: Selected new etcd server http://127.0.0.1:2379
2018-08-30 20:29:43,733 INFO: Lock owner: agens0; I am agens1
2018-08-30 20:29:43,736 INFO: trying to bootstrap from leader 'agens0'
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
WARNING:  skipping special file "./.s.PGSQL.5432"
pg_basebackup: write-ahead log end point: 0/20000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
2018-08-30 20:29:44,373 INFO: replica has been created using basebackup
2018-08-30 20:29:44,374 INFO: bootstrapped from leader 'agens0'
2018-08-30 20:29:44.471 KST [17108] LOG:  listening on IPv4 address "127.0.0.1", port 5433
2018-08-30 20:29:44.475 KST [17108] LOG:  listening on Unix socket "./.s.PGSQL.5433"
2018-08-30 20:29:44,476 INFO: postmaster pid=17108
2018-08-30 20:29:44.504 KST [17111] FATAL:  the database system is starting up
2018-08-30 20:29:44.504 KST [17110] LOG:  database system was interrupted; last known up at 2018-08-30 20:29:43 KST
localhost:5433 - rejecting connections
2018-08-30 20:29:44.511 KST [17113] FATAL:  the database system is starting up
localhost:5433 - rejecting connections
2018-08-30 20:29:44.674 KST [17110] LOG:  entering standby mode
2018-08-30 20:29:44.677 KST [17110] LOG:  redo starts at 0/2000028
2018-08-30 20:29:44.678 KST [17110] LOG:  consistent recovery state reached at 0/20000F8
2018-08-30 20:29:44.678 KST [17108] LOG:  database system is ready to accept read only connections
2018-08-30 20:29:44.683 KST [17117] FATAL:  could not start WAL streaming: ERROR:  replication slot "agens1" does not exist
2018-08-30 20:29:44.685 KST [17119] FATAL:  could not start WAL streaming: ERROR:  replication slot "agens1" does not exist
localhost:5433 - accepting connections
2018-08-30 20:29:45,530 INFO: Lock owner: agens0; I am agens1
2018-08-30 20:29:45,530 INFO: does not have lock
2018-08-30 20:29:45,530 INFO: establishing a new patroni connection to the postgres cluster
2018-08-30 20:29:45,547 INFO: no action.  i am a secondary and i am following a leader
2018-08-30 20:29:49.691 KST [17127] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
2018-08-30 20:29:54,751 INFO: Lock owner: agens0; I am agens1
2018-08-30 20:29:54,752 INFO: does not have lock
2018-08-30 20:29:54,757 INFO: no action.  i am a secondary and i am following a leader
2018-08-30 20:30:04,758 INFO: Lock owner: agens0; I am agens1
 
 
//partronictl 명령어로 agensgraph instance 확인
[bylee@localhost patroni]$ patronictl -d etcd://127.0.01:2379 list default
+---------+--------+-----------+--------+---------+-----------+
| Cluster | Member |    Host   |  Role  |  State  | Lag in MB |
+---------+--------+-----------+--------+---------+-----------+
| default | agens0 | 127.0.0.1 | Leader | running |         0 |
| default | agens1 | 127.0.0.1 |        | running |         0 |
+---------+--------+-----------+--------+---------+-----------+
 
 
//failback 테스트
[bylee@localhost patroni]$ patronictl -d etcd://127.0.01:2379 failover default
Candidate ['agens1'] []: agens1
Current cluster topology
+---------+--------+-----------+--------+---------+-----------+
| Cluster | Member |    Host   |  Role  |  State  | Lag in MB |
+---------+--------+-----------+--------+---------+-----------+
| default | agens0 | 127.0.0.1 | Leader | running |         0 |
| default | agens1 | 127.0.0.1 |        | running |         0 |
+---------+--------+-----------+--------+---------+-----------+
Are you sure you want to failover cluster default, demoting current master agens0? [y/N]: y
2018-08-30 20:34:40.84056 Successfully failed over to "agens1"
+---------+--------+-----------+--------+---------+-----------+
| Cluster | Member |    Host   |  Role  |  State  | Lag in MB |
+---------+--------+-----------+--------+---------+-----------+
| default | agens0 | 127.0.0.1 |        | stopped |   unknown |
| default | agens1 | 127.0.0.1 | Leader | running |           |
+---------+--------+-----------+--------+---------+-----------+
 
//접속 테스트
[bylee@localhost bin]$ ./agens -h 127.0.0.1 -d postgres
agens (AgensGraph 1.4devel, based on PostgreSQL 10.4)
Type "help" for help.
 
postgres=# create graph d;
ERROR:  cannot execute CREATE GRAPH in a read-only transaction
postgres=# \q
 
[bylee@localhost bin]$ ./agens -h 127.0.0.1 -p 5433-d postgres
agens (AgensGraph 1.4devel, based on PostgreSQL 10.4)
Type "help" for help.
 
postgres=# create graph d;
CREATE GRAPH
postgres=# create (:v{name:'patroni'});
GRAPH WRITE (INSERT VERTEX 1, INSERT EDGE 0)
postgres=# match (n) return n;
             n            
---------------------------
 v[3.1]{"name": "patroni"}
(1 row)
 
//자동 failover 테스트
//ctrl + c로 leader 종료
2018-08-30 20:41:21,181 INFO: Lock owner: agens1; I am agens1
2018-08-30 20:41:21,208 INFO: no action.  i am the leader with the lock
2018-08-30 20:41:31,177 INFO: Lock owner: agens1; I am agens1
2018-08-30 20:41:31,187 INFO: no action.  i am the leader with the lock
^C2018-08-30 20:41:39.204 KST [17108] LOG:  received fast shutdown request
2018-08-30 20:41:39.208 KST [17108] LOG:  aborting any active transactions
2018-08-30 20:41:39.209 KST [17108] LOG:  worker process: logical replication launcher (PID 17215) exited with exit code 1
2018-08-30 20:41:39.209 KST [17125] FATAL:  terminating connection due to administrator command
2018-08-30 20:41:39.211 KST [17114] LOG:  shutting down
2018-08-30 20:41:39.224 KST [17114] LOG:  checkpoint starting: shutdown immediate
2018-08-30 20:41:39.267 KST [17114] LOG:  checkpoint complete: wrote 96 buffers (0.6%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.032 s, total=0.045 s; sync files=61, longest=0.015 s, average=0.000 s; distance=299 kB, estimate=14775 kB
2018-08-30 20:41:39.271 KST [17108] LOG:  database system is shut down
2018-08-30 20:41:39,303 INFO: Lock owner: agens1; I am agens1
[bylee@localhost patroni]$
 
[bylee@localhost patroni]$ patronictl -d etcd://127.0.01:2379 list default
+---------+--------+-----------+--------+---------+-----------+
| Cluster | Member |    Host   |  Role  |  State  | Lag in MB |
+---------+--------+-----------+--------+---------+-----------+
| default | agens0 | 127.0.0.1 | Leader | running |         0 |
| default | agens1 | 127.0.0.1 |        | stopped |   unknown |
+---------+--------+-----------+--------+---------+-----------+