Category: Blog
Thinking of the person who pressed Go on today’s CrowdStrike release
20 Jul 2024

Today’s tweet about the CrowdStrike incident, which seemingly brought the modern IT world to a standstill, reminded me of the darkest day of my professional life: the day I accidentally knocked out internet access in a city of over 200,000 people.


It was my second year of university and I worked for the largest local ISP in my home city as a junior system administrator. We had a large wireless network (~100 km in diameter) covering our whole city and many surrounding rural areas. This network was used by all major commercial banks and many large enterprises in the area (bank branches, large factories, radio stations, etc.).

To cover such a large area (in Ukraine in the early 2000s), about 50% of which was rural villages and towns, we basically had to build a huge Wi-Fi network with a very powerful antenna in the center. Many smaller regional points of presence connected to it using directional Wi-Fi antennas and then distributed the traffic locally. The core router connected to the central antenna was located on the top floor of the highest building in the area, about 20 minutes away from our office.

One day I was working on some monitoring scripts for the central router (which was basically a custom-built FreeBSD server). I’d run those scripts on a local test stand I had on my desk, make some changes, run them again, etc. We did not have VMs back then, so experimental work would happen on real hardware that was a clone of a production box. In the middle of my local debugging, I received a monitoring alert from production saying that our core router had some (non-critical) issues. Since I was on call that day, I decided to take a look. After fixing the issue on the router, I went back to my debugging and successfully finished the job after about an hour.

And that’s where things went wrong… When I wanted to shut down my local machine, I switched to a terminal that was connected to the box, typed “poweroff”, pressed Enter… and only then realized that I had done it on the wrong server! 🤦🏻‍♂️ I had left that second terminal window open ever since the monitoring alert an hour earlier, and now I had ended up shutting down the core router for our whole city-wide network!

What’s cool is that there was no blame in the aftermath of the incident. The team understood the mistake and focused on fixing the problem. We ended up having to drive to the central station and manually power the router back on. We did not have any remote power management set up for that server, and IPMI was not an option for us back then. Dark times indeed! 😉

As a result of that mistake, our whole city’s banking infrastructure and a bunch of other important services were down for ~30 minutes. Following the incident, we made a number of improvements to our infrastructure and our processes (I don’t remember the details now), making the system a lot more resilient to similar errors.

Looking back now, huge kudos to my bosses for not firing me back then! This incident profoundly influenced my career in many ways:

First, the thrill of managing such vast infrastructure made me want to stay in technical operations rather than shifting to pure software development, a path many of my peers chose at the time. Then, having experienced such a massive error firsthand, I’ve always done my absolute best to safeguard my systems against failures, optimizing for quick recovery and being paranoid about backups and redundancy. Finally, it was a pivotal moment in my understanding of the value of a blameless incident process, long before the emergence of the modern blameless DevOps and SRE cultures. That management lesson has deeply informed my approach to leadership and system design ever since.


Join Me at Swiftype!
18 Sep 2013

As you may have heard, I joined Swiftype last January. Swiftype is an early-stage startup focused on changing local site search for the better. The past 8 months have been a blast: we have done a lot of interesting things to make our infrastructure more stable and performant, immensely increased visibility into our performance metrics, and developed a strong foundation for the future growth of the company. Now we are looking to expand our team with great developers and technical operations people to push our infrastructure and the product even further.

Since joining Swiftype, I have mainly focused on improving the infrastructure through better automation and monitoring, and have also worked on our backend code. Now I am looking for a few good operations engineers to join my team to work on a few key projects like building a new multi-datacenter infrastructure, creating a new data store for our document data, improving the high availability of our core services and much more.

To help us improve our infrastructure, we are looking both for senior operations engineers and for more junior techops people whom we could help grow and develop within the company. Both positions can be remote, or we can assist you with relocation to San Francisco if you want to work in our office.

If you are interested, you can take a look at an old but still pretty relevant post I wrote many years ago on what I believe an ops candidate should know. And, of course, if you have any questions regarding these positions at Swiftype, please email me at [email protected] or use any other means of contacting me, and I will try to get back to you as soon as possible. If you know someone who may be a great fit for these positions, please let them know!


New Chapter: Swiftype
31 Jan 2013

So, after a few weeks of looking for a new job, I’m really excited to start my journey at a young but very ambitious startup called Swiftype, which is focused on developing technology for private site search that could be used on everything from small blogs to large product sites. The company is growing really fast. I’m going to lead all the work on infrastructure and build the ops team, and I hope to get a chance to do some coding along the way.

Stay tuned – I really hope to finally get a chance to do more blogging this year. 🙂


Looking for a New Gig
14 Jan 2013

As of today I’m no longer working for LivingSocial and I’m looking for the next thing to work on. Since my family is in Toronto and I have an apartment (mortgage) here, I’m not looking to relocate and am currently looking for something remote (I have 7+ years of remote work experience) or something local in Toronto.

For more information on my background, please check my GitHub profile, my LinkedIn profile or the resume section on this blog. If you need to contact me, feel free to use any of the channels listed on the contacts page.

Update: After a few initial interviews, I’d like to update this post with a bit more detail on what I’m looking for in the new position.

First of all, I’m really not sure I want to be yet another ops engineer working on “everything ops” in my next company. If I were to join a company as a regular ops engineer, I’d prefer it to be a clearly defined role with a clear focus on some set of challenging problems. I’m honestly tired of setting up cacti/nagios/chef at this point and would like the job to be a little bit more challenging.

Even more, though, I’m interested in being able to make strategic technical decisions for an operations team and apply my experience and knowledge to solving challenging tasks with a dedicated team of ops engineers. This could be anything from a tech ops team lead role (in a medium or large company) to a director of technical operations (in a small-to-medium-sized startup).

Update: Ok, I’ve found a new job – I work for Swiftype now!


Cool Web Designer is Looking for Work
18 Jul 2010

My wife, a good web designer with 6 years of experience in web design, HTML and CSS, is looking for a job. Here is some information about her:

We’re physically located in Toronto, Canada, but she also has a lot of experience working remotely. So, if you need a web designer or a junior web designer, feel free to contact Tanya.