
A wake-up call for engineers: you are not Google

via: 博客园     time: 2017/6/18 14:31:00     views: 1064

When looking for a solution to a problem, you must fully understand the problem itself rather than blindly worship the giants. Ozan Onay uses Amazon, LinkedIn and Google as examples to sound a warning for overeager engineers. The following is a translation; see the original: You Are Not Google.

Software engineers are always fascinated by ridiculous things. We like to think we are very rational, but when we face a technology choice we fall into a kind of frenzy: we bounce from Hacker News to one blog after another like moths, until, exhausted, we drift helplessly toward the brightest light and prostrate ourselves before it, having forgotten what we were looking for in the first place.

Truly rational people do not make decisions this way. But engineers always have, for example when deciding whether to use MapReduce.

Joe Hellerstein said in a video lecture for his university database course:

There are only about five companies in the world that need to run jobs this large. Everyone else is spending all that I/O on fault tolerance they do not need. In the 2000s, people caught Google mania: "We'll do everything the way Google does, because we also run the world's largest Internet data service."

Having more fault tolerance than you actually need may sound harmless, but it comes at a heavy price: not only do you do far more I/O, you may also trade a mature system (one with transactions, indexes and a query optimizer) for something threadbare. That is a serious step backwards. How many Hadoop users made this trade-off consciously? How many of them know whether it was a wise decision?

MapReduce is an easy target by now; even its blind worshipers have realized that something is wrong. But the same pattern is widespread: you adopt a big company's technology even though your situation differs from theirs, and you do so without careful thought, simply out of a habit of believing that imitating the giants will bring you the same wealth.

Yes, this is yet another "do not blindly worship" article. But this time I have included a useful checklist that may help you make better decisions.

Cool technology? UNPHAT

The next time you find yourself Googling some new technology to rebuild your software architecture around, I suggest you stop. Instead, apply the UNPHAT principle:

  1. Do not rush to find a solution before you thoroughly Understand your problem. Your goal should be to solve the problem in the problem domain, not in the solution domain.
  2. eNumerate several candidate solutions; do not fixate on your favorite.
  3. Choose a candidate and read the relevant Paper.
  4. Understand the Historical context of the candidate.
  5. Weigh Advantages against disadvantages, strengths against weaknesses.
  6. Think! Calmly consider whether the candidate suits your problem. What would have to be different to change your mind? For example, how much smaller would your data have to be before you dropped the idea of using Hadoop?

You are not Amazon

The UNPHAT principle is very straightforward. I recently talked with a company that intended to use Cassandra for a read-heavy workload over data that was loaded into the system nightly.

They had read the Dynamo paper and knew that Cassandra was the closest product to Dynamo. We know that these distributed databases prioritize write availability (Amazon does not want an "add to cart" operation to fail). To achieve this, they compromise on consistency and on almost every feature found in a traditional RDBMS. But this company had no need to prioritize write availability: it wrote only once a day, albeit in a fairly large batch.

They were considering Cassandra because their PostgreSQL queries took several minutes. They assumed it was a hardware problem, but after some investigation we found that the table held 50 million rows, each up to 80 bytes wide. Reading all of that data from an SSD should take about five seconds, which is not fast, but it is two orders of magnitude faster than the actual queries.
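The estimate above can be reproduced with a few lines of arithmetic. This is only a rough sketch: the row count and row width come from the text, while the SSD read throughput of 800 MB/s is an assumed figure chosen for illustration, not a measurement from the company's hardware.

```python
# Back-of-the-envelope scan time for the table described above.
ROWS = 50_000_000                      # from the text
BYTES_PER_ROW = 80                     # from the text (upper bound per row)
SSD_READ_BYTES_PER_SEC = 800_000_000   # ASSUMPTION: ~800 MB/s sequential read


def full_scan_seconds(rows: int, bytes_per_row: int, bytes_per_sec: float) -> float:
    """Time needed to stream the whole table once at the given throughput."""
    return rows * bytes_per_row / bytes_per_sec


table_bytes = ROWS * BYTES_PER_ROW
scan_s = full_scan_seconds(ROWS, BYTES_PER_ROW, SSD_READ_BYTES_PER_SEC)

print(f"table size: {table_bytes / 1e9:.1f} GB")
print(f"full scan: {scan_s:.1f} s")
# A query two orders of magnitude slower than a full scan runs for minutes.
print(f"100x slower: {scan_s * 100 / 60:.1f} min")
```

Under these assumptions the whole table is about 4 GB, so a query that runs for minutes is doing far more work than a single pass over the data would require, which points to tuning or data modeling rather than hardware.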

I really wanted to ask them more questions (understand the problem!), and I had started to weigh five strategies for when the problem grew more serious (enumerate multiple candidates!), but it was already clear that Cassandra was completely the wrong solution for them. All they needed was some patient tuning, perhaps re-modeling part of the data, and maybe (though probably not) another technology. What they certainly did not need was a high-write-availability key-value store: Cassandra's design was created to solve Amazon's shopping cart problem!

You are not LinkedIn

I was surprised to find that a small company founded by a student had put Kafka into their system, because as far as I knew they handled only a small number of transactions each day: at best, a few hundred. At that throughput, the records could be written by hand in a notebook.

Kafka was designed to handle LinkedIn's internal throughput, which is astronomical. Even a few years ago, the figure had reached roughly a trillion messages a day, with peaks of 10 million messages per second. Granted, Kafka can also be used for lower-throughput workloads, but 10 orders of magnitude lower?

Perhaps the engineers really did make an informed decision based on their expected needs and a good understanding of what Kafka is for. But my guess is that they could not resist the community's enthusiasm for Kafka and never really thought about whether Kafka fit them. That is a difference of 10 orders of magnitude!
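To see where a gap of about 10 orders of magnitude comes from, the two workloads can be compared directly. This is a rough sketch: it takes LinkedIn's volume as about one trillion messages per day and assumes, purely for illustration, that the small company handles about 100 transactions per day.

```python
import math

LINKEDIN_EVENTS_PER_DAY = 1e12   # ASSUMPTION: "a trillion a day" order of magnitude
STARTUP_EVENTS_PER_DAY = 100     # ASSUMPTION: illustrative figure for the small company


def orders_of_magnitude(big: float, small: float) -> float:
    """Base-10 logarithm of the ratio between two workloads."""
    return math.log10(big / small)


gap = orders_of_magnitude(LINKEDIN_EVENTS_PER_DAY, STARTUP_EVENTS_PER_DAY)
print(f"gap: {gap:.1f} orders of magnitude")
```

With a few hundred transactions per day instead of 100, the gap shrinks only slightly, to between 9 and 10 orders of magnitude.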

Once again, you are not Amazon

Better known than Amazon's distributed database is the architectural pattern that lets it scale: service-oriented architecture (SOA). In a 2006 interview, Werner Vogels noted that Amazon realized in 2001 that its front end was struggling to scale, and that a service-oriented architecture would help. The idea then spread from one engineer to the next, until startups with only a handful of engineers and almost no users began splitting their static-page apps into tiny services.

But by the time Amazon decided to move to SOA, it had 7,800 employees and 3 billion US dollars in sales.

Of course, that does not mean you must wait until you have 7,800 employees before moving to SOA... just think it through: will it really solve your problem? What is the root of your problem? Can you solve it some other way?

If you tell me that your 50-person company has to move to SOA, I cannot help wondering why so many larger companies still do fine with a modular monolithic application.

Even Google is not Google

Large-scale dataflow engines such as Hadoop and Spark can be a lot of fun, but in many cases a traditional DBMS suits the workload better, and sometimes the data is small enough to fit entirely in memory. Would you spend $10,000 on 1 TB of memory? Even with a billion users, that gives each user 1 KB of memory to work with.
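The per-user figure is simple division, sketched below. The 1 TB and one billion users are the numbers discussed above (using decimal units); the helper `fits_in_memory` and its example inputs are hypothetical, added only to show how you might run the same check against your own data.

```python
RAM_BYTES = 10**12   # 1 TB, decimal units
USERS = 10**9        # one billion users

bytes_per_user = RAM_BYTES // USERS
print(f"memory per user: {bytes_per_user} bytes")


def fits_in_memory(users: int, bytes_needed_per_user: int, ram_bytes: int) -> bool:
    """Hypothetical helper: does the whole working set fit in RAM?"""
    return users * bytes_needed_per_user <= ram_bytes


# Illustrative inputs only: 5 million users at 100 KB each is 500 GB.
print(fits_in_memory(5_000_000, 100_000, RAM_BYTES))
```

Whether 1 KB per user is enough depends entirely on the workload, which is why it is worth measuring your own data before reaching for a distributed engine.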

Perhaps your workload is too big for memory and you need to write data back to disk. So how many disks do you need? How much data do you actually have? Google built GFS and MapReduce to handle computation over the entire Web, such as rebuilding a search index of the entire Web.

Perhaps you have read the GFS and MapReduce papers and noticed that part of Google's problem was throughput rather than capacity: they distributed storage because streaming bytes off disk took too long. But what is the throughput of the devices you will be using in 2017? You do not need nearly as many of them as Google did, so could you simply buy better ones? What would it cost you to use SSDs?

Maybe you still want to be able to scale. But have you done the math? Will your data grow faster than SSD prices fall? How much would your business have to grow before your data no longer fit on a single machine? As of 2016, Stack Exchange served 200 million requests a day with only four SQL Server instances: one for Stack Overflow, one for everything else, and two replicas.
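To put the Stack Exchange number in perspective, it helps to convert requests per day into an average rate per second. This is a minimal sketch: it gives only the daily average, and real traffic will peak well above it.

```python
REQUESTS_PER_DAY = 200_000_000   # from the text
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400


def average_rps(requests_per_day: int) -> float:
    """Average requests per second over a full day (ignores peaks)."""
    return requests_per_day / SECONDS_PER_DAY


rps = average_rps(REQUESTS_PER_DAY)
print(f"average load: {rps:.0f} requests/second")
```

That works out to roughly 2,300 requests per second on average, served from a handful of database servers.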

Perhaps after applying the UNPHAT principle you will still decide to use Hadoop or Spark. That may even be the right decision; what matters is that you use the right tool for the job. Google understands this well: once they realized that MapReduce was no longer suitable for building the index, they stopped using it.

Understand your problem first

None of this is new, but perhaps UNPHAT is enough for you. If it is not, you can watch Rich Hickey's talk "Hammock Driven Development", read Polya's book "How to Solve It", or study Hamming's course "The Art of Doing Science and Engineering". I urge you to think more! Fully understand a problem before you try to solve it. I will leave you with a famous quote from Polya:

It is foolish to answer a question that you do not understand. It is sad to work for an end that you do not desire.
