|
|
|
|
---
|
|
|
|
|
layout: post
|
|
|
|
|
title: 论备份的重要性
|
|
|
|
|
tags: [备份]
|
|
|
|
|
---
|
|
|
|
|
只有事情发生到自己头上才想到要解决<!--more-->
|
|
|
|
|
|
|
|
|
|
# 起因
|
|
|
|
|
今天早上发生了一件很糟糕的事情,一打开聊天软件就发现有人在说我维护的花火学园挂掉了,错误信息是无法访问源站。我觉得挺奇怪,服务器又没啥负载,我也有一段时间没登进去了,怎么服务器就挂掉了?
|
|
|
|
|
我试着用SSH连接,同样无法连接,我发现事情不太对劲,然后就登到了Vultr里看了看。结果发现我的服务器在00:00之后就像一个死人一样,CPU负载被拉成了一条直线,就那样保持0%的位置。我以为是因为莫名其妙的原因服务器关机了,然而我重启以后仍然没有解决问题。
|
|
|
|
|
登录到终端一看,`No bootable device`就这样显示在屏幕上,硬盘直接读不出来了,这下可不得了了,我赶紧去快照里看了一下,发现最后一次快照的时间在5月30日,也就是说如果没能恢复数据这十几天的所有信息都将消失!
|
|
|
|
|
|
|
|
|
|
# 难以想象的垃圾服务商:Vultr
|
|
|
|
|
首先我要做的事情当然是想办法先恢复服务,虽然那个快照有点早,但是先顶上再说吧……
|
|
|
|
|
想一想我的防护应该做的也没啥问题,而且一般成功入侵服务器的人也应该是删库然后留一条信息的那种,直接干死硬盘的我还真没见过。于是我开始发Ticket给Vultr,看看到底是怎么回事。
|
|
|
|
|
Vultr在我问完的4个小时后给出了最终的解决方案,把我的服务器直接重置,在上面安了新的操作系统然后给我赔了两个月的服务器费用……原文如下:
|
|
|
|
|
> Hello,
|
|
|
|
|
>
|
|
|
|
|
> In the past 24 hours, we sent notification of a node failure impacting your cloud server listed above.
|
|
|
|
|
>
|
|
|
|
|
> Despite extensive efforts, our attempts to manually recover your cloud server were unsuccessful.
|
|
|
|
|
>
|
|
|
|
|
> Our engineering team is currently deploying new instances with the same operating system and IP and you will receive login details in a separate message. You may also deploy a backup or snapshot on a new instance with a new IP if you prefer.
|
|
|
|
|
>
|
|
|
|
|
> Our staff will be applying a two month account credit for the affected services shortly.
|
|
|
|
|
>
|
|
|
|
|
> Regards,
|
|
|
|
|
> Bryan M.
|
|
|
|
|
> Systems Administrator
|
|
|
|
|
|
|
|
|
|
哇,这真是太糟糕了,作为一家云服务器商就直接把客户的数据搞没了,然后就赔2个月的费用?要知道数据无价啊,就这么不负责任的吗?简直是不可思议啊!
|
|
|
|
|
不过我也没什么好办法了,也许他们不重装我还想着试试SystemRescueCD试试看能不能把整个磁盘复制出来,但是他们既然已经直接重装那就彻底没救了……QAQ
|
|
|
|
|
|
|
|
|
|
# 亡羊补牢
|
|
|
|
|
既然数据已经救不回来了,那我们也只能向前看,得想办法避免以后再出现这样的问题。因为我最近在期末阶段,比较忙,所以也不经常去打快照。虽然以前也出现过服务出问题的情况,像[MySQL挂了](/2020/01/05/devops.html)、CDN挂了、还有一次好像是交换机出问题了,但是无论如何数据从来没有丢失过。这一次数据都能丢了也真的是太糟糕了,要不是有快照,那就真成删库跑路了……
|
|
|
|
|
既然没时间打快照,我得想个办法搞一个自动打快照的东西。在网上搜了搜,还真有这样的脚本,于是我拿来改了改就装上去用了。
|
|
|
|
|
## 自动快照的脚本
|
|
|
|
|
```python
|
|
|
|
|
import requests
|
|
|
|
|
from requests import get
|
|
|
|
|
import re
|
|
|
|
|
import json
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class __RPC:
|
|
|
|
|
def __init__(self, api_key, name):
|
|
|
|
|
self.api_key = api_key
|
|
|
|
|
self.api_info = None
|
|
|
|
|
self.name = name
|
|
|
|
|
self.errors = {
|
|
|
|
|
200: "Function successfully executed.",
|
|
|
|
|
400: "Invalid API location. Check the URL that you are using.",
|
|
|
|
|
403: "Invalid or missing API key. Check that your API key is present and matches your assigned key.",
|
|
|
|
|
405: "Invalid HTTP method. Check that the method (POST|GET) matches what the documentation indicates.",
|
|
|
|
|
412: "Request failed. Check the response body for a more detailed description.",
|
|
|
|
|
500: "Internal server error. Try again at a later time.",
|
|
|
|
|
503: "Rate limit hit. API requests are limited to an average of 2/s. Try your request again later."
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
def api_info_initial(self):
|
|
|
|
|
self.api_info = {"snapshot/create":"POST","snapshot/destroy":"POST","snapshot/list":"GET","server/list":"GET"}
|
|
|
|
|
|
|
|
|
|
def __getattr__(self, name):
|
|
|
|
|
return eval("__RPC")(self.api_key, self.name + "/" + name)
|
|
|
|
|
|
|
|
|
|
def __call__(self, **kwargs):
|
|
|
|
|
if not self.api_info:
|
|
|
|
|
self.api_info_initial()
|
|
|
|
|
if self.name not in self.api_info:
|
|
|
|
|
raise ValueError("The API is not exists.")
|
|
|
|
|
|
|
|
|
|
if self.api_info[self.name] == "GET":
|
|
|
|
|
res = requests.get("https://api.vultr.com/v1/" + self.name, headers={"API-Key": self.api_key},
|
|
|
|
|
params=kwargs)
|
|
|
|
|
elif self.api_info[self.name] == "POST":
|
|
|
|
|
res = requests.post("https://api.vultr.com/v1/" + self.name, headers={"API-Key": self.api_key}, data=kwargs)
|
|
|
|
|
|
|
|
|
|
if res.status_code == 200:
|
|
|
|
|
return res.status_code, res.text.strip()
|
|
|
|
|
elif res.status_code in self.errors.keys():
|
|
|
|
|
return res.status_code, self.errors.get(res.status_code)
|
|
|
|
|
else:
|
|
|
|
|
res.raise_for_status()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class Vultr:
|
|
|
|
|
def __init__(self, api_key):
|
|
|
|
|
self.api_key = api_key
|
|
|
|
|
|
|
|
|
|
def __getattr__(self, name):
|
|
|
|
|
return eval("__RPC")(self.api_key, name)
|
|
|
|
|
|
|
|
|
|
vultr = Vultr("API Key")
|
|
|
|
|
data = {'SUBID': '实例ID'}
|
|
|
|
|
status_code, resp = vultr.snapshot.create(**data)
|
|
|
|
|
requests.post("https://sc.ftqq.com/SCKEY.send",data ={"text":"快照已创建","desp": str(status_code)+resp})
|
|
|
|
|
# 删除旧快照
|
|
|
|
|
status_code, resp = vultr.snapshot.list() # /v1/snapshot/list
|
|
|
|
|
if status_code != 200:
|
|
|
|
|
print('获取快照列表失败' + str(status_code) + resp)
|
|
|
|
|
else:
|
|
|
|
|
print('成功获取到快照列表')
|
|
|
|
|
data_list = list(json.loads(resp).values())
|
|
|
|
|
data_list.sort(key=lambda x: x['date_created']) # 默认时间排序,由近到远
|
|
|
|
|
data_list_del = data_list[::-1][9:] # 取超过9个之后的快照
|
|
|
|
|
for data_del in data_list_del:
|
|
|
|
|
data = {'SNAPSHOTID': data_del.get('SNAPSHOTID')}
|
|
|
|
|
status_code, resp = vultr.snapshot.destroy(**data) # /v1/snapshot/destroy
|
|
|
|
|
if status_code != 200:
|
|
|
|
|
print('删除旧快照失败')
|
|
|
|
|
else:
|
|
|
|
|
print('成功删除一个旧快照')
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
把这个脚本放到Crontab里,每天执行一次就行了。
|
|
|
|
|
|
|
|
|
|
# 后记
|
|
|
|
|
相信服务器厂商是完全靠不住的事情,自己还得想办法做好备份。我甚至在想,阿三把Intel和微软都占领了,会不会有一个阿三也跑到Vultr里,然后对着我的硬盘大喊“把你变成咖喱”之类的23333。
|
|
|
|
|
现在不过是权宜之计,以后还是得想办法把整个论坛下载到本地,至少能搞个数据库的差异备份啥的也行啊……
|
|
|
|
|
|