I'm in San Francisco attending QCon 2008 this week.
Dan Farino, Chief Systems Architect, MySpace.com
__The traditional distribution channels are being redefined...__
SMB Channels are out pacing large channels
Transacting within the community being redefined
Small producers can now compete with the large industries
- the boundaries fo the social sites aree not as clear sa the preceeding slides suggest
-- myspace and facebook transactd
-- ebay has a thriving community
-- digg and ing are about networking
__Social Architecture Challenges__
- worry not that no one knows of you - seek to be worth knowing -- confucious
Shared joy is a double joy - shared sorrow is half a sorrow - swedish proverb
connections between people are transitive and lack affinity
partitioning for scale is always problematic
Tools are sparse for large-scale windows server management
- Go Live
- Manage growth
MySpace plan as executed
- go live
- reboot servers - often
- "Shotgun debbugging"
-- a process of making relatively undirected changes to software in the hope that a bug will be perturbed out of existence.
-- need to resolve the problem now and collecting data for analysis would take too long
Windows 2000 Server
Operationally - where they were...
- batch files and robocopy for code deployment
- "psecec" for remote admin script execution
- Windows Performance Monitor for monitoring
No formal, automaoted QA process
- 4,500 web serves, windows 2003 IIS 6.0 ASP.NET
- 1,200 "cache" servers - 64-bit windows, 2003 (key value pair, distributed)
- 500+ database servers
- unit tests/automated testing
- don't fuzz the site nearly as thoroughly as the users do
- there are still problems that happen only in production
Ops Data Collection
Two types of systems:
- Static: collect, store, and alert based on pre-configured rules (e.g. Nagios)
- Dynamic: Write an ad-hoc script or application to collect data for an immediate or one-off need
(Very interesting: Windows Performance counter monitor display in their Ops Data Collection)
Cons of static system:
- relatively central configuration managed by a small number of administrators
- bad for one-off requests: change he config, apply, wait for data
- developer's questions usually go unanswered
Devlopers like to see their creations come to life
Cons of the dynamic system:
- it's not really a "system" at all - its an administrator running a script
- is a privileged operation: scripts are powerful and can be potentially make changes to the system
- even run as a limited userr, bad scripts can still DoS the system
- one-shot data collection is possible but learning about deltas takes a lot more code (and polling, yuck)
- different custom-data collection tools that request the same data point caused duplicated network traffic
*** They use Powershell a lot ***
Ideally, all operational data avaialble in the entire server farm should be queryable
New Operational Data-subscription platform for Ops Data Collection
- supports both "one-shot" and "persistent" modes
- can be used by non-privileged users
- a client makes __one__ TCP connection to a "Collector" server
-- can receive data related to thousands of servers via this one connection
-- like having all of the servers in a Jabber chat room and being able to talk to a selected subset of them at any time (over __one__ connection)
- Windows Performance Counters
- WMI Objects
-- event logs
-- hardware data
-- custom WMI objects published from out-of-process
- Log file contents
On Linux, plans are to hook into something like D-Bus...
All C#, asynch I/O - never blocks a thread
Uses MS "Concurrency and Coordination Runtime"
Agent runs on each host
Wire protocol is Google's Protocol Buffers
Clients and Agents can be easily writtten to the Agents wanted to see if C#+CCR could handle the load (yes it can)
Why develop something new?
- there doesn't seem to be anything out there right now that fits the need
- requirements include free and open source
To do it properly - you really need to be using 100% async I/O
Libraries that make this easy are relatively new
- GTask ==> Need to research this (for Linux asynch callback processing)
What does this enable?
- the individual interested in the data can gather it theirself.
- its almost like exploring a database with some ad-hoc SQL queries
- "I wonder"...questions are easily answered without a lot of work
- charting/alerting/data-archiving systems no longer concern themselves with the data-collection intricacies
- we can spend time writing the valuable code instead of rewriting the same plumbing every time
- abstract physical server-farm from teh user
- if you know machine names, great - but you can also say "all servers serving..."
- guaranteed to keep you up to date
- get your initial set of data and then just wait for the deltas
- pushes polling as close to the source as possible
- eliminates duplicate requests
- hundreds of clients can be monitoring the "% processor time"
- only collects data that someone is currently asking for
Is this a good way to do it?
- having too much data pushed at you is a bad thing
- being able to pull from a large selection of data points is a good thing
For developers, knowing they will have access to instrumentation data even in production encourages more detailed instrumentation.
Easy and fun API's
Using LINQ: Language Integrated Query
LINQ via C# and CLINQ ("Continuous LINQ") = instant monitoring app (in about 10 lines of code)
var counters = ...
MainWpdfWindows.MainGrid = counters;
// go grab a beer
Tail a file across thousands of servers
- with filtering expression being run on the remote machines
- at the same time as someone else is (with no duplicate lines being sent over the network)
- multicase only to the people that are subscribed (?)
Open Source it?
- may write a GTask / Erlang implementation
Digg: An Infrastructure in Transition
Joe Stump, Lead Architect, Digg
hundres of servers
"Web 2.0 sucks (for scaling)" - Joe Stump
- severe hair loss
- isn't something you do by yourself
- who cares?
Not necessarily concerned with how quick - but can they store everything they need - and return it in a reasonable amount of time.
Clusters of databases are designated as WRITE - others are READ
- Lucene (for search)
- Recommendation Engine (graph database)
- MogileFS - distributed web-dav store
- Database servers clusters - which serve different parts of the site (LULX, AFK, ZOMG, WTF, ROFL)
A normal relationship database doesn't work well for certains types of views into your data
MySQL has problems under high write-load
- XMPP? - stateless - can't go back in time
- Conveyor (allows rewind of data?)
- elastic horizontal partitions
- heterogenous partition types
- IDs live in multiple places
- partitioned result sets
- Memached + BDB
-- supports replication, keyscans
- 28,000+ writes per second
- Persistent key/value storage
- Works with Memcached clients
- used in areas where de-normalization would require more writes than MySQL can handle
- Digg Images
-- 15,000 - 17,000 submissions per day
-- crawl for images, video, embeds, source, and other meta data
-- ran in parallel via Gearman <== Need to research this...
- Green Badges
-- 230,000 diggs per day
-- most active Diggers are also most followed
-- 3,000 writes per second
-- ran in background via Gearman
-- Eventually consistent