Disclaimer: the information in this post is the author’s personal opinion and is not the opinion or policy of his employer.
It was spring 2010 when we decided that even though Softlayer‘s server provisioning system is really great and it takes only a few hours to get a new server when we need it, it is still too long sometimes. We wanted to be able to scale up when needed and do it faster. It was especially critical because we were working hard on bringing up Facebook integration to our site and that project could have dramatically changed our application servers cloud capacity requirements.
What buzzword comes to your mind when we talk about scaling up really fast, sometimes within minutes, not hours or days? Exactly – cloud computing! So, after some initial testing and playing around with Softlayer’s (really young back then) cloud solution called CloudLayer and talking to our account manager we’ve decided to switch our application from a bunch of huge and at the time pretty expensive 24-core monster servers to a cluster of 8-core cloud instances. To give you some perspective: we had ~250 cores at the start of the project and at the end of 2010 we’d have more then 100 instances – we weren’t a small client with a few instances).
For those who are not familiar with Softlayer cloud: they sell you “dedicated” cores and memory, which is supposed to give you an awesome performance characteristics comparing to shared clouds like EC2.
Long story short, after a month of work on the project we had our application running on the cloud and were able to scale it up and down pretty fast if needed. And since the cloud was based on faster cpu and faster memory machines, we even saw improved performance of single-threaded requests processing (avg. response time dropped by ~30% as far as I remember). We were one happy operations team…
Read the rest of this entry »
Scribd is a top 100 site on the web and one of the largest sites built using Ruby on Rails. As one of the first rails sites to reach scale, we’ve built a lot of infrastructure and solved a lot of challenges to get Scribd to where it is today. We actively try to push the envelope and have contributed substantial work back to the open source community.
Scribd has an agile, startup culture and an unusually close working relationship between engineering and ops. You’ll regularly find cross-over work at Scribd, with ops people writing application-layer code and engineers figuring out operations-level problems. We think we’re able to make that work because of the uniquely talented people we have on the team.
To allow us to keep scaling, we’re now looking to add a strong, experienced operations guru to the team. As a member of Scribd operations, you’ll have tremendous ownership and responsibility for one of the web’s most popular applications. Because Scribd is a startup, you will wear many hats and have broader responsibility than you would at a larger company.
If you read this blog, you should already have a good sense of the kind of work you’ll be doing on this position.
The Ideal Profile
You are an experienced operations professional and have run ops at at least one large-scale website. You have comprehensive knowledge of a broad variety of system tools, from MySQL and Nginx to Squid and Memcached. You should also have strong software development skills and be well-versed in major programming languages. You should be strongly motivated, a creative solution finder, and ready to jump into the thorniest technical problems whenever necessary.
- Develop and maintain all aspects of Scribd’s operations infrastructure, including system monitoring, backups, server configuration, databases, and caching systems
- Collaborate with engineering to create next generation infrastructure to support changing requirements
- Predict scaling problems before they occur and work with engineering to prevent them
- Write and debug application level ruby code
- Participate in an on-call rotation
- Quickly diagnose server problems and employ preventive measures to maintain high availability servers
- Bachelors degree in CS or equivalent experience
- 3-5 years of professional experience in site operations
- Strong software engineering skills, including knowledge of major programming languages
- Strong database skills, preferably with MySQL, and overall linux knowledge
- Experience with most of the following technologies: MySQL, Nginx, Ruby, Memcached, Squid, git, Solr, HBase, Postfix
- Proven ability to quickly learn and implement unfamiliar technologies
- Strong desire to work hard at a rapidly growing company
Location: You are preferably located near San Francisco, CA. Relocation assistance is designed on a per-case basis. In short, we’ll be creative to get you here.
Contact: Please send your email cover letter and resume with the subject “Your name – Senior Site Operations Engineer – via Kovyrin.net” to firstname.lastname@example.org or contact me directly using any of my contacts. All communication and correspondence is held in the strictest confidence to ensure that you can connect and learn more without exposure.