Yes, Tencent Vote has embraced Tencent Cloud

Tencent Questionnaire has long supported Tencent's research questionnaires, but it is too heavyweight for real-time voting in WeChat groups. With WeChat mini programs just being released, we (CDC) built a mini program for the WeChat scenario: Tencent Vote.

After a few busy weeks, the first version finally came out. During operation we found a problem, one that also exists in Tencent Questionnaire:

Low load when idle, high concurrency at peak

Tencent Questionnaire supports research questionnaires for almost all of Tencent's businesses, such as games and music, as well as external partner companies such as Didi. Given the scale of these businesses, a questionnaire may be delivered to hundreds of thousands of users in an instant. There may not be many users most of the time, but whenever a large business runs a promotion, the system load becomes particularly high.

In the voting mini program, this problem is more serious:

1. We cannot control when individual users start a vote.

2. We underestimated how fast things spread in social scenarios. Many votes are started by an organization or a school that requires all employees or students to take part. In addition, users forward the vote to outside groups, or generate a QR code and share it to Moments to collect votes.

So we often see traffic soaring in our monitoring:

Traffic soars, with the peak close to 5 times the usual level

Faced with this situation, in addition to optimizing the application itself, we decided to migrate to Tencent Cloud. Against this background, the work of embracing Tencent Cloud began.

Tencent Cloud migration process: preparation

1. Preparation before moving to Tencent Cloud

Tencent Vote was initially developed on top of the Tencent Questionnaire interfaces. So the first step was to work out which components could be stripped away, to reduce the migration workload and improve later maintainability.

  1. Database: for business reasons Tencent Questionnaire uses MySQL, MongoDB and Redis, but that combination does not suit voting. MySQL alone is enough for storage and statistics, so MongoDB can be removed.
  2. Code repository: the two share one repository. Strictly speaking these are already two projects, so the repository needs to be split.
  3. Domain name: Tencent Vote and Tencent Questionnaire share wj.qq.com, and requests are forwarded to different services by path. If Tencent Vote moved to Tencent Cloud as is, that forwarding would have to go over the public network. To avoid this, we changed the domain name: we first switched domains within IDC so that Tencent Vote had its own domain, and after the migration we only needed to change the DNS resolution.
  4. Internal services: in Tencent's traditional architecture, many needs are met by calling internal network interfaces, for example the keyword service that detects whether user content contains sensitive words. From Tencent Cloud, these IDC services are accessed through a proxy.

That leaves the most troublesome part: splitting the database. Tencent Vote and Tencent Questionnaire data currently live in the same database. How do we split the data? We divided the process into four steps:

  1. Double write: while writing to the old database, also write to the new one. This ensures that the new database holds only Tencent Vote data. Because the extra write is asynchronous, user requests are essentially unaffected.
  2. Synchronization: copy the old data into the new database until the voting data in the two databases is exactly the same.
  3. Verification: verify that the data in both databases is exactly the same.
  4. Switch over: once the data is confirmed to be consistent, switch to the new database and stop the double write.

Now we have an independent database that is complete and free of redundant data, and the later migration only needs a database backup and restore.
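The four steps above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the team's actual code: it uses in-memory SQLite instead of MySQL, a hypothetical `votes` table, hypothetical function names, and it performs the second write synchronously where the article describes an asynchronous write.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical votes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE votes (id INTEGER PRIMARY KEY, poll TEXT, choice TEXT)"
    )
    return conn


def dual_write(old_db, new_db, vote_id, poll, choice):
    """Step 1: every new vote goes to the old database and to the new one."""
    row = (vote_id, poll, choice)
    old_db.execute("INSERT INTO votes VALUES (?, ?, ?)", row)
    # Assumption: done inline here for brevity; the article does this asynchronously.
    new_db.execute("INSERT OR REPLACE INTO votes VALUES (?, ?, ?)", row)


def backfill(old_db, new_db):
    """Step 2: copy rows that predate the double write into the new database."""
    rows = old_db.execute("SELECT id, poll, choice FROM votes").fetchall()
    # OR IGNORE keeps rows that the double write has already delivered.
    new_db.executemany("INSERT OR IGNORE INTO votes VALUES (?, ?, ?)", rows)


def in_sync(old_db, new_db):
    """Step 3: compare both databases row by row."""
    query = "SELECT id, poll, choice FROM votes ORDER BY id"
    return old_db.execute(query).fetchall() == new_db.execute(query).fetchall()
```

Step 4 then amounts to pointing the application at the new database and removing the double write, but only after `in_sync` has reported that the two sides match.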

Tencent Cloud migration process: migration

2. Migration

With the optimization done and the preparation complete, it was time to choose a migration plan. What are the options, with and without downtime? The following is our analysis:

1. Tencent Cloud accesses IDC's MySQL through a specially approved channel

- Pro: safer for the business

- Con: unsafe for the company

2. Tencent Cloud accesses IDC's MySQL over the public network

- Pro: convenient

- Con: unsafe

3. Double write from IDC to both the IDC and Tencent Cloud databases, migrate the data, and switch over once the data is consistent

- Pro: no extra policy needed, secure, no downtime

- Con: heavy workload

4. Migration with downtime: stop the service, back up and restore MySQL, then change the DNS resolution. It is simple enough that anyone could do it.

- Pro: safe and reliable, low time cost

- Con: the service has to be stopped

Taking security policy, time cost and other factors into account, we finally chose the migration with downtime. To make sure nothing went wrong, there was work to do both before and after the cutover.


With the plan chosen and the environment ready, the next step was repeated drills to find problems that might come up during the migration. We even recorded every Shell command we ran, and kept rehearsing until the recorded commands needed no further changes and could complete the migration when executed as written.


The preparation was done, the risks were analyzed, a rollback plan was in place, and we had drilled many times. It was time to go into battle; if I delayed any longer, my boss would come looking for me. We picked an early morning on a weekend and posted the downtime announcement. Then we followed the rehearsed steps one by one, and once testing passed, we switched the DNS.


Once the DNS is switched, traffic goes directly to the environment on Tencent Cloud. The next few hours were the most tense. Only after watching the traffic slowly climb on the monitors and the number of user-created votes rise did we feel relieved.

For migration across data centers, different projects have different solutions. Each case needs its own analysis: how to allocate the original IDC resources and the cloud resources sensibly may depend on the project's SLA, the size of the development team, security policy restrictions and other factors.

Tencent Cloud migration process: auto scaling tuning

3. Auto scaling

One of the pain points mentioned earlier is low load when idle and high concurrency at peak. Is there a solution? Yes: Tencent Cloud auto scaling. With auto scaling, machines can be added automatically during the day and destroyed at night. And whenever traffic suddenly increases, machines are also added automatically and put into the production environment.

Having introduced the basic concept of auto scaling, we also need to introduce two terms: stateless and service discovery.

Stateless

Stateless means that the service itself does not need to store data, whether short-lived sessions or long-lived user-uploaded attachments. The advantage is that instances can be copied and destroyed quickly without worrying about losing data, which is a basic requirement for auto scaling.

Because Tencent Questionnaire has file upload and similar features, making it stateless would be very difficult and would take a lot of time and effort. Tencent Vote happens not to use these features: there is no file upload, and sessions are not saved to the file system. It is stateless by nature and well suited to horizontal scaling. A new instance only needs to be added to the load balancer to be put into use.

Service discovery

One benefit of statelessness is that instances can be copied or destroyed quickly, so the service can scale horizontally in a short time. If this scaling still required manual work, it would be very inefficient. So the services must run as a cluster, and for that you need a service discovery tool.

Service discovery can tell you which instances of a service are currently available. For example:

Monitoring script asks: which servers are the Tencent Vote back ends?

Service discovery answers: [the addresses of the current back-end instances, each on port 8080]

After receiving the answer, the monitoring script can add these addresses to the Nginx or HAProxy load balancer.

With auto scaling, the Tencent Vote back-end servers change frequently. With the help of the service discovery software Consul, new machines are put into use automatically and destroyed machines are removed from Nginx automatically, so that no user requests are lost.
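As a rough illustration of the last step, the monitoring script could turn the list of back ends into an Nginx `upstream` block. The sketch below is only an assumption about how such a script might look: the function name, the upstream name `vote_backend`, and the example addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical, and how the list is obtained from Consul is left out.

```python
def render_upstream(name, backends):
    """Render an Nginx upstream block from (host, port) pairs.

    The pairs are assumed to come from the service discovery tool; sorting
    them keeps the output stable so a script can detect real changes.
    """
    lines = [f"upstream {name} {{"]
    for host, port in sorted(backends):
        lines.append(f"    server {host}:{port};")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    print(render_upstream("vote_backend", [("192.0.2.11", 8080), ("192.0.2.10", 8080)]))
```

A script like this would typically write the result to an included configuration file and reload Nginx only when the rendered text differs from the previous version.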

Summary (with the two points above in place, what remains is policy configuration)

Once the application is stateless and has a service discovery mechanism as described above, the next step is the Tencent Cloud auto scaling configuration:

  1. Build an image: this is the way Tencent Cloud recommends using auto scaling. Pre-install the required software into the image so that a machine started from it already has the runtime environment, and no time is spent initializing the machine.
  2. Set the launch configuration: this defines what kind of machine auto scaling starts, including how much memory, how many CPU cores, which image to boot from, which data center to start in (Tencent Cloud has several), and other options.
  3. Configure an alarm policy: define the state of the existing instances at which new instances are created. After observing for some time, we found that when CPU usage exceeds 70% and memory usage exceeds 60%, it is time to consider adding machines.

By combining Tencent Cloud auto scaling, Consul service discovery and various monitoring systems, we achieved the following: when the system is under high load, auto scaling starts a new machine, a monitoring script synchronizes the latest code and starts the appropriate services, and finally Consul puts the new machine into use.
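The alarm policy itself is configured in Tencent Cloud rather than written as code, but the decision it encodes can be sketched in a few lines. In this sketch the function name and the use of an average over recent samples are assumptions, and so is treating either threshold as sufficient on its own; only the 70% and 60% figures come from the text above.

```python
from statistics import mean


def should_scale_out(cpu_samples, mem_samples, cpu_limit=70.0, mem_limit=60.0):
    """Return True when recent usage suggests adding a machine.

    cpu_samples and mem_samples are recent utilization percentages.
    Assumption: breaching either limit is enough to trigger a scale-out.
    """
    return mean(cpu_samples) > cpu_limit or mean(mem_samples) > mem_limit


if __name__ == "__main__":
    # CPU averages above 70%, so this reports that a machine should be added.
    print(should_scale_out([80, 75, 78], [40, 42, 41]))
```

Averaging over several samples, rather than reacting to a single reading, is one simple way to avoid starting machines on a momentary spike.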

Auto scaling configuration

Add a machine when CPU utilization exceeds 70%.

Monitoring alarm

In the figure, the alarm is triggered at 12:00, the machine finishes starting at 12:02, and it is put into use at 12:04. Auto scaling is quite impressive.

Tencent Cloud migration process: monitoring

4. Monitoring

After all this work and all these changes, were users affected? Did performance change? Is it faster or slower? We cannot wait for users to tell us. Fortunately, we have monitoring.

With monitoring, we know the impact of every change, and we feel more confident when making operations changes or releasing a new version. Developers should also build the habit of checking the monitors after every change and every release.

For example, we once discovered an exception in a WeChat interface through performance monitoring. The curve showed that user votes had suddenly dropped by half, yet all system components were normal and there were no errors. After investigating, we found that an interface on the WeChat development platform had changed and a field had been removed. Without monitoring, we probably would not have noticed until the next day.

After all that talk, what exactly do we use? The following is our monitoring system:

  • ELK monitoring: monitors front-end reports, Nginx requests, PHP requests, votes created by users, votes cast by users, and so on. After dynamic scaling was introduced, it also monitors the status of each instance.
  • Tencent Cloud monitoring: monitors server CPU, memory, bandwidth and so on, so we do not have to maintain Nagios or similar software ourselves.
  • Tencent Cloud dial testing: regularly checks from multiple access points whether the HTTP interfaces are normal, and sends an SMS alarm when they are not, which is very convenient. It also avoids the embarrassing situation where the monitoring goes down together with the application and not even a notification can be sent.
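To make the idea of a dial test concrete, here is a minimal sketch of a single HTTP probe. It is not how the Tencent Cloud service works internally; the function name, the timeout, and the rule that any 2xx status counts as healthy are assumptions. The `opener` parameter exists only so the probe can be exercised without a network.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses raised by urlopen (HTTPError is a
        # URLError), DNS failures, refused connections and timeouts.
        return False
```

A real dial test runs probes like this on a schedule from several locations outside the application's own environment, which is what keeps the alarm working when the application itself is down.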

Monitoring from multiple angles

CPU usage spikes at peak times, but fortunately auto scaling is there. In addition, we are optimizing the voting data structure, which should reduce the CPU fluctuations.

Bandwidth rises along with the number of requests

Compare the effect before and after migration:

You can see that the slow requests (red) have decreased and the green area at the bottom has grown.

Server utilization: there is still room for optimization here, which we'll cover in the next article.

5. Summary and outlook

Tencent Vote is a small and beautiful application. As Zhang Xiaolong put it, it is something users use and then leave, and it has become a valuable tool for them.

Next, we also plan to combine the IDC resources of Tencent Questionnaire (https://wj.qq.com) with Tencent Cloud resources, using auto scaling to expand capacity dynamically, so that we improve operational capacity while reducing operating costs. We also plan to use Tencent Cloud services to lighten the operations burden on developers, so that they can focus more on business development and deliver more valuable innovation to users.

