Twitter's GitHub account shows that the company has open-sourced nearly 200 projects, covering distributed architecture, big data, asynchronous networking (client and server), the Web, and other tools. Twitter can fairly be said to be built on open source. Chris Aniszczyk, who heads the company's open source efforts, has said that without open-source software Twitter would not exist: every tweet a user sends or receives, on a mobile device or a PC, depends on open-source software.
Typeahead.js: jQuery plugin for automatic text completion
This jQuery plugin is a newer Twitter project that supports both remote and local datasets. A notable feature is that datasets can be saved in the browser's local storage, which effectively improves the user experience. It also offers many options for handling remote datasets, such as the request rate and the maximum number of concurrent requests.
Twemoji: Twitter's emoji
Twemoji is Twitter's complete open-source emoji set. Developers can download the full set from the GitHub repository and use the emoji in their own applications or web pages.
Hogan.js: JavaScript template engine
Hogan.js is a compiler for Mustache templates produced by the Twitter team. It does not depend on any other library or framework, parses templates efficiently, and is only about 2.5K in size. Use it as part of your asset packager to compile templates ahead of time, or include it in the browser to handle dynamic templates.
Effective Scala: Scala language guide
Scala is one of Twitter's main application programming languages. Much of its infrastructure is written in Scala, and several large libraries support that use. Scala is highly effective, but it is also a large language that should be applied with care. What are its pitfalls? Which features should be embraced, and which avoided? When should a "purely functional style" be used, and when not? At Twitter, Scala is used mainly to build high-volume services that form distributed systems.
Scala provides many tools for succinct expression. Less typing means less reading, less reading is usually faster reading, and so brevity can improve clarity. But brevity is a double-edged sword: it can also have the opposite effect and leave the reader without a correct understanding.
Finagle: RPC framework
Finagle is a fault-tolerant, protocol-agnostic RPC system for the JVM, developed at Twitter and built with sbt. It makes it easy to build robust clients and servers in Java, Scala, or any other JVM-based language. Finagle supports a wide range of request/response-oriented RPC protocols and many kinds of streaming protocols.
With Finagle you can quickly implement asynchronous RPC clients and servers. Its RPC model is flexible enough to cover many variants, including request-response, streaming, and pipelining (for example HTTP pipelining and Redis pipelining). It also handles stateful RPC easily, such as RPC services that require authentication.
FlockDB: distributed graph database
FlockDB stores a graph as a set of edges, each connecting two vertices that are identified by 64-bit integers. For a social graph, the vertex IDs are user IDs, but for an edge that marks a tweet as a "favorite", the destination vertex (destination id) is a tweet ID. Each edge also carries a 64-bit position that is used for sorting. (Twitter stamps "following" edges with a timestamp, so your follower list is sorted by time, latest first.)
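As a rough illustration of this edge model, the sketch below stores edges as (source, destination, position) records and lists a user's followers latest first. It is plain Python rather than FlockDB's API: the names `Edge` and `followers_latest_first` are hypothetical, and the position is assumed to be a timestamp, as in the "following" example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One graph edge; all three fields are meant to fit in 64 bits."""
    source_id: int       # for a "following" edge: the follower's user ID
    destination_id: int  # the followed user's ID (or a tweet ID for favorites)
    position: int        # sort key; assumed here to be a timestamp


def followers_latest_first(edges, user_id):
    """Return the IDs of users following `user_id`, newest edge first."""
    incoming = [e for e in edges if e.destination_id == user_id]
    incoming.sort(key=lambda e: e.position, reverse=True)
    return [e.source_id for e in incoming]


if __name__ == "__main__":
    edges = [Edge(1, 9, 100), Edge(2, 9, 300), Edge(3, 9, 200)]
    print(followers_latest_first(edges, 9))
```

A real graph store would index edges by vertex instead of scanning a list; the scan only keeps the sketch short.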
Snowflake: distributed auto-increment ID algorithm
While migrating its storage system from MySQL to Cassandra, Twitter found that Cassandra has no built-in mechanism for generating sequential IDs, so it developed its own service for generating globally unique IDs: Snowflake. Its advantages are high performance, low latency, independence from the application, and IDs ordered by time. Its disadvantage is that it has to be developed and deployed as a separate service.
A Snowflake ID is a 64-bit integer laid out as follows:
41 bits of timestamp (millisecond precision; 41 bits last about 69 years);
10 bits of machine identifier (10 bits allow up to 1024 deployed nodes);
12 bits of sequence number (12 bits let each node generate 4096 IDs per millisecond).
The most significant bit is the sign bit and is always 0.
This gives an efficient and simple GUID generation algorithm: a single int64_t field is enough, unlike mainstream 128-bit GUID algorithms. It does not guarantee strictly sequential IDs, but for specific services, such as generating GUIDs on a game server, it is very convenient. In a multi-threaded environment, keeping the sequence number in an atomic counter can also reduce the amount of code that has to run under a lock.
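The bit layout described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Twitter's implementation: the epoch constant is arbitrary, the names `compose_id`, `decompose_id`, and `IdGenerator` are hypothetical, and a plain lock is used instead of an atomic counter.

```python
import threading
import time

TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12

MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095

# Assumption: an arbitrary custom epoch in milliseconds, chosen for illustration.
EPOCH_MS = 1_262_304_000_000


def compose_id(timestamp_ms, machine_id, sequence):
    """Pack timestamp, machine ID, and sequence into one 63-bit value."""
    elapsed = timestamp_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= machine_id <= MAX_MACHINE_ID:
        raise ValueError("machine_id outside the 10-bit range")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence outside the 12-bit range")
    return ((elapsed << (MACHINE_BITS + SEQUENCE_BITS))
            | (machine_id << SEQUENCE_BITS)
            | sequence)


def decompose_id(value):
    """Split an ID back into (timestamp_ms, machine_id, sequence)."""
    sequence = value & MAX_SEQUENCE
    machine_id = (value >> SEQUENCE_BITS) & MAX_MACHINE_ID
    elapsed = value >> (MACHINE_BITS + SEQUENCE_BITS)
    return elapsed + EPOCH_MS, machine_id, sequence


class IdGenerator:
    """Generates increasing IDs for one machine; safe to share across threads."""

    def __init__(self, machine_id, clock_ms=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id outside the 10-bit range")
        self._machine_id = machine_id
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 values for this millisecond are used: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return compose_id(now, self._machine_id, self._sequence)
```

The 69-year figure follows from the field width: 2^41 milliseconds is roughly 69.7 years counted from whatever epoch is chosen.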
Diffy: automated testing tool
Diffy is an open-source automated testing tool that can automatically test services based on Apache Thrift or HTTP. With Diffy, only simple configuration is needed and you do not have to write test code.
Diffy works mainly by comparing the output of the stable version (and of a replica of it) with the output of the release candidate to check whether the candidate is correct. Diffy therefore assumes that the candidate should produce output "similar" to that of the stable version: even if the internal modules of the two versions differ, their final output for the same request should be "similar". The word is "similar" rather than "identical" because responses to the same request can contain noise that Diffy does not care about.
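One way to picture "similar rather than identical" is sketched below: send the same request to two instances of the stable version and to the candidate, treat any field that differs between the two stable instances as noise, and report only the remaining differences. This is a simplified sketch over flat dictionaries, not Diffy's code; the function names and the sample fields are hypothetical.

```python
_MISSING = object()  # marks a key that is absent from one of the responses


def diff_fields(left, right):
    """Return the set of top-level keys whose values differ between two responses."""
    keys = left.keys() | right.keys()
    return {k for k in keys if left.get(k, _MISSING) != right.get(k, _MISSING)}


def regressions(stable, stable_replica, candidate):
    """Fields where the candidate differs from stable, excluding known noise."""
    noise = diff_fields(stable, stable_replica)  # differences between two stable runs
    raw = diff_fields(stable, candidate)         # differences against the candidate
    return raw - noise


if __name__ == "__main__":
    stable = {"user": "alice", "count": 3, "served_at": 1000}
    replica = {"user": "alice", "count": 3, "served_at": 1001}
    candidate = {"user": "alice", "count": 4, "served_at": 1002}
    print(regressions(stable, replica, candidate))
```

In this example the timestamp-like field changes on every call, so it is classified as noise, while the changed count is reported.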
Scalding: Scala library
Scalding is a Scala library that simplifies the development of Hadoop MapReduce jobs. It is built on top of Cascading. Scalding is similar to Pig but offers tighter integration with Scala.
Hadoop is a distributed system for counting words.
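Word counting is the customary first MapReduce job. The sketch below shows the shape of that computation in plain Python, as a map phase that emits (word, 1) pairs and a reduce phase that sums them per word. It uses neither Scalding nor Hadoop, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(line):
    """Lower-case a line and split it into words."""
    return re.findall(r"[a-z0-9']+", line.lower())


def map_phase(lines):
    """Emit a (word, 1) pair for every word occurrence."""
    return ((word, 1) for line in lines for word in tokenize(line))


def reduce_phase(pairs):
    """Sum the emitted counts for each distinct word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def word_count(lines):
    return reduce_phase(map_phase(lines))


if __name__ == "__main__":
    print(word_count(["Hello world", "hello Hadoop"]))
```

On a cluster the pairs would be partitioned by word between the two phases; here everything runs in a single process.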
Gizzard: general-purpose data sharding middleware
Gizzard is a general-purpose data sharding middleware that Twitter released in April 2011 and that plays an important role in Twitter's architecture. Twitter also published Gizzard's complete source code. With Gizzard, startups and small companies can handle large amounts of data better and faster, and so meet customer needs with fewer resources.
Summingbird: stream processing framework
Summingbird is a streaming MapReduce framework: a large-scale data processing system that lets developers run the same code in a uniform way in batch mode (on Hadoop/MapReduce), in streaming mode (on Storm), or in a hybrid mode that combines the two. It is released under the Apache 2 license and is meant to solve practical problems that engineers ran into with existing approaches.
Algebird: abstract algebra for Scala
Algebird is an abstract algebra library for Scala. Its code is mainly used to build aggregation systems (via Scalding or Storm). Algebird is a companion to Summingbird, which uses some of its probabilistic algorithms, such as HyperLogLog, to speed up computation.
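The link between abstract algebra and aggregation can be illustrated with a monoid: an associative combine operation plus an identity value. Because the operation is associative, partial results computed on separate chunks can be merged later and give the same total. The sketch below is plain Python with hypothetical names (`Monoid`, `aggregate_in_chunks`), not Algebird's Scala API.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """An identity value and an associative way to combine two values."""
    zero: T
    plus: Callable[[T, T], T]

    def sum(self, values: Iterable[T]) -> T:
        return reduce(self.plus, values, self.zero)


int_add = Monoid(0, lambda a, b: a + b)
set_union = Monoid(frozenset(), lambda a, b: a | b)


def aggregate_in_chunks(monoid, values, chunk_size):
    """Aggregate each chunk separately, then combine the partial results."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    partials = [monoid.sum(chunk) for chunk in chunks]
    return monoid.sum(partials)


if __name__ == "__main__":
    values = [3, 1, 4, 1, 5, 9, 2, 6]
    print(aggregate_in_chunks(int_add, values, 3), int_add.sum(values))
```

The chunked and unchunked totals agree for any chunk size, which is the property an aggregation system relies on when it splits work across machines or across batches.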
Iago: website load testing tool
Iago is a load testing tool that replays recorded access data or synthetic traffic against a given site. It differs from other load generation tools in that it tries to hold the request rate constant. For example, if you want to send your service 100K requests per minute, Iago will try to keep up that rate throughout the test.
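Holding a rate constant means the send times come from a schedule fixed in advance, not from when earlier responses happen to arrive. The sketch below works through the 100K-per-minute arithmetic in Python. It assumes evenly spaced requests for simplicity, the function names are hypothetical, and it is not Iago's code.

```python
def send_offsets(requests_per_minute, duration_seconds):
    """Offsets in seconds from the test start at which requests are due."""
    interval = 60.0 / requests_per_minute
    total = int(requests_per_minute * duration_seconds / 60)
    return [i * interval for i in range(total)]


def delay_until_due(start, index, interval, now):
    """Seconds to wait before sending request `index`.

    The due time depends only on the schedule, so a slow response does not
    push later requests back; a sender that is behind waits zero seconds.
    """
    due = start + index * interval
    return max(0.0, due - now)


if __name__ == "__main__":
    offsets = send_offsets(100_000, 1)
    print(len(offsets), offsets[1])
```

At 100K requests per minute the spacing is 60 / 100000 seconds, that is 0.6 milliseconds between requests.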
Heron: real-time data analysis platform
On May 25, 2016, Twitter announced that it was open-sourcing Heron. The basic idea behind Heron is that a real-time streaming system performs systematic analysis on top of large-scale data. Beyond that, such a system needs the ability to process billions of events per minute, sub-second latency and predictable behavior, data accuracy in the face of failures, resilience to traffic peaks, ease of debugging, and simple deployment on shared infrastructure.
To meet these needs, Twitter considered several options: extending Storm, using another open-source system, or developing a new platform. Because several of the requirements called for changes to Storm's core architecture, extending it would have required a long development cycle. Other open-source stream processing frameworks did not fully meet Twitter's requirements for scale, throughput, and latency. Moreover, those systems are not compatible with the Storm API, so adopting a new API would have meant rewriting several topologies and modifying higher-level abstractions, leading to a long migration. Twitter therefore decided to build a new system that meets the requirements above and is compatible with the Storm API.
At Twitter, Heron serves as the main streaming system, running hundreds of development and production topologies. Because Heron uses resources efficiently, migrating all of Twitter's topologies to it cut overall hardware use by a factor of three, a significant improvement in the efficiency of Twitter's infrastructure.
DistributedLog: distributed log replication service
DistributedLog (DL) is a high-performance replicated log service. It provides durability, replication, and strong consistency, which are essential for building reliable distributed systems such as replicated state machines, general-purpose publish/subscribe systems, distributed databases, and distributed queues. DL maintains sequences of records in categories called Logs (also known as Log Streams). Processes that write records to a DL Log are called Writers, and processes that read and process records from a Log are called Readers.
Ambrose: visual monitoring system
Ambrose is an open-source system released by Twitter for visualizing and monitoring MapReduce jobs. It can monitor the MapReduce jobs on a Hadoop cluster (currently only those from Apache Pig). Broader support is planned.
SecureHeaders: Web security tool
SecureHeaders is Twitter's gift to Web developers: a Web security tool that automatically applies security-related header rules, including Content Security Policy (CSP) to prevent XSS, HSTS to guard against Firesheep-style attacks, and XFO (X-Frame-Options) to prevent clickjacking.
Activerecord-Reputation-System: reputation system for ActiveRecord
Activerecord-Reputation-System is aimed at Rails developers. It lets an application compute reputation values automatically from a network of evaluations, which helps developers learn more about their application and guides their next decisions. According to Twitter, developers can easily integrate the system into a Rails application, or keep it isolated from the main application for a cleaner design.
The reputation system is a network: evaluations trigger updates, and reputation values are then computed and propagated through the network. Reputations whose values are computed directly from evaluations are called primary reputations, and those computed indirectly, from other reputations, are called non-primary reputations.
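The difference between the two kinds of reputation can be shown with a small Python sketch. It assumes a question-and-answer style application in which votes on an item form its primary reputation and an owner's reputation is the sum over their items. The class and method names are hypothetical, this is not the gem's Ruby API, and values are recomputed on read instead of being propagated on each update.

```python
class ReputationNetwork:
    """Toy model: item votes (primary) roll up into owner karma (non-primary)."""

    def __init__(self):
        self._owner = {}        # item -> owner
        self._evaluations = {}  # item -> {evaluator: value}

    def add_item(self, item, owner):
        self._owner[item] = owner
        self._evaluations.setdefault(item, {})

    def evaluate(self, item, evaluator, value):
        """Record an evaluation; a repeat evaluation by the same user replaces the old one."""
        self._evaluations[item][evaluator] = value

    def primary(self, item):
        """Primary reputation: computed directly from evaluations."""
        return sum(self._evaluations[item].values())

    def non_primary(self, owner):
        """Non-primary reputation: computed from other reputations."""
        return sum(self.primary(item)
                   for item, item_owner in self._owner.items()
                   if item_owner == owner)


if __name__ == "__main__":
    net = ReputationNetwork()
    net.add_item("q1", "alice")
    net.evaluate("q1", "u1", 1)
    print(net.primary("q1"), net.non_primary("alice"))
```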
CocoaSPDY: SPDY framework
CocoaSPDY is a SPDY framework for OS X (Cocoa) and iOS (Cocoa Touch). Building on its earlier contribution to Netty, Twitter also updated its iOS application to use SPDY instead of plain HTTP. Twitter has observed latency reductions of up to 30%, and the improvement is greater "the worse the user's network conditions".
SPDY has further advantages: request multiplexing, that is, the ability to send requests continuously over a single TCP session and receive the responses out of order; server push of messages to the client; and compression of request and response headers.
TwUI: UI framework
TwUI is a hardware-accelerated UI framework for the Mac. It is modeled on UIKit but differs from it in several places.
Twemproxy: proxy server
Twemproxy is a fast, single-threaded proxy that supports the Memcached ASCII protocol and, more recently, the Redis protocol. It is written entirely in C and licensed under the Apache 2.0 License. Twemproxy's strength is that it can be configured either to eject a node when it fails and retry it after a while, or to stick to the specified key -> server map with ejection disabled. This means it can shard a Redis dataset when Redis is used as a data store (node ejection disabled), and provide simple high availability when Redis is used as a cache (node ejection enabled).
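The two modes can be contrasted with a small routing sketch in Python. It is a minimal illustration, not Twemproxy's implementation: the class and method names are hypothetical, plain modulo hashing stands in for whatever distribution a real deployment configures, and recovery is triggered by hand instead of by a retry timer.

```python
import zlib


class ShardRouter:
    """Maps keys to servers, optionally skipping servers marked as failed."""

    def __init__(self, servers, auto_eject=False):
        self._servers = list(servers)
        self._auto_eject = auto_eject
        self._failed = set()

    def mark_failed(self, server):
        self._failed.add(server)

    def mark_recovered(self, server):
        self._failed.discard(server)

    def server_for(self, key):
        if self._auto_eject:
            # Cache mode: route around failed servers.
            candidates = [s for s in self._servers if s not in self._failed]
        else:
            # Data-store mode: the key -> server map never changes.
            candidates = self._servers
        if not candidates:
            raise RuntimeError("no servers available")
        index = zlib.crc32(key.encode("utf-8")) % len(candidates)
        return candidates[index]


if __name__ == "__main__":
    router = ShardRouter(["redis-a", "redis-b", "redis-c"], auto_eject=True)
    print(router.server_for("user:42"))
```

With ejection disabled, a key always maps to the same server, so requests for it fail while that server is down but no data ends up on the wrong shard. With ejection enabled, the key moves to a surviving server, which is acceptable for a cache because a miss only costs a reload.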
Fatcache: cache service
Fatcache lets you run memcached on SSD, so you can use it as a cache for large datasets.
AnomalyDetection: R package for automatically detecting outliers in time series
AnomalyDetection is an R package. Twitter typically uses it during major news and sporting events to scan inbound traffic and find bot and zombie accounts that send unsolicited bulk (marketing) messages.
Figure: anomaly scan with AnomalyDetection
According to Twitter, AnomalyDetection complements BreakoutDetection, which Twitter open-sourced last October.
Detecting traffic anomalies is very challenging for Twitter, known as the "pulse of the planet", because when traffic is scanned over a long span (for example, a year), some anomalous activity tends to stay hidden. The causes of anomalous traffic also vary. Some are healthy, such as a surge caused by a major news event. Others are bad, such as a sudden drop in QPS (queries per second) at some point in time, which may indicate a hardware or data collection problem.
Figure: anomaly detection over long-period traffic