2011-12-13 19:21:15 Monitoring your Vitals During the Critical Holiday Retail Season

by Robert Treat

As with Brick & Mortar stores, the Holiday season is a critical time for many E-Commerce sites. Like their off-line brethren, these sites see large increases in both traffic and revenue, sometimes substantially so. Of course, these changes in user behavior don't just affect E-Commerce sites. Consider a social-networking site like Foursquare: a person who normally checks into 3 or 4 places a week might double that during the Holiday season as they visit more stores and end up eating out more often while rushing between them. On an individual basis it doesn't sound that significant, but if a large percentage of your user base doubles their traffic, you had better hope you have planned accordingly.

On the technical side, many sites will actually change their regular development process in order to handle these changes in user behavior. Starting early in November, many sites will stop rolling out new features and halt large projects that might be disruptive to the site or the underlying infrastructure. As focus shifts away from features, it most often turns back towards infrastructure and optimization. Adding new monitoring, from improved logging to new metrics and graphs, becomes critical as you seek a comprehensive view of your site's operations, so that you can better understand the changes in traffic that are happening and, hopefully, be proactive about solving problems before they turn into outages.

Profiling and optimization work also receives more attention during this time. Studies continue to show that faster page loads and better website responsiveness correlate with increased revenue, and these areas can typically be improved without changing how the site behaves. Bugfixes are also a popular target during these times, as corner cases are more likely to show up as traffic increases, especially if you tend to see new users as well as an increase in use by existing users.

This brings us to a good question: just what are you monitoring? For most shops there tend to be standard graphs that get generated for things like disk space or memory usage. These things are good to have, but they only scratch the surface. Your operations staff probably knows all kinds of metrics about the systems they need to monitor, but how about your application developers? They should know the code that runs your site inside and out, so challenge them to find key metrics in your application stack that are important for their work. Maybe that's messages delivered to a queuing system, or the time it takes to process the shipping costs module, or the responsiveness of a 3rd party API like Facebook or Twitter. But don't stop there; everyone in your company should be asking themselves, "What analytics could I use to make better informed decisions?" For example, do you know if your increased traffic is due to new users or existing users? If you are monitoring new user sign ups, this will start to give you some insight. If you are doing E-Commerce, you should also be tracking revenue related numbers. Those types of monitors are more business focused, but they are critical to everyone at your company. So much so that at Etsy, a top 100 website commonly known as "the world's handmade marketplace", they project these types of metrics right out in public.
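As a rough illustration of the kind of application-level metric a developer might add, here is a minimal Python sketch that times a block of code and stores the duration under a metric name. Everything in it is an assumption made for the example: the in-memory `timings` store, the `timed` helper, the `calculate_shipping` stand-in, and the metric name are hypothetical, and a real system would ship these numbers to whatever monitoring tool you use.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory store: metric name -> list of durations in seconds.
timings = defaultdict(list)


@contextmanager
def timed(metric_name):
    """Record how long the wrapped block takes under metric_name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the wrapped code raises.
        timings[metric_name].append(time.perf_counter() - start)


def calculate_shipping(order_total):
    """Stand-in for a real shipping-cost module (made-up pricing rule)."""
    return 5.0 if order_total < 50 else 0.0


def summarize(metric_name):
    """Return (sample count, mean duration in seconds) for one metric."""
    samples = timings[metric_name]
    return len(samples), sum(samples) / len(samples)


# Wrap the call you care about; the metric name is illustrative only.
with timed("checkout.shipping_cost_seconds"):
    shipping = calculate_shipping(42.0)

count, mean_seconds = summarize("checkout.shipping_cost_seconds")
```

The same wrapper could go around a call to a 3rd party API or a queue publish, which gives developers one consistent way to expose the numbers that matter to their own work.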

Ideally, once you have this type of information being logged, you can collect it for analytical reports and historical trending via graphs. You want to be able to take the data you are collecting and correlate between metrics. Say that, given a 10% increase in new users in the past week, we've seen a 15% spike in web server traffic. If we project those numbers out, can we make it through Black Friday? Cyber Monday? Will we make it all the way to New Year's, or do we need to start provisioning new machines *NOW*? Or what happens if our business model changes, and we are required to live through a "Black Friday" event every day? Those are the kinds of challenges that social shopping site Gilt faces, with its daily turnover of inventory. It's worth saying that you won't need all of this information in real time, but ideally you'll be able to get a mix of real-time data, near-time data (5-minute aggregates are common), and daily analytical reports. Additionally, you should talk with your operations staff about which of these metrics are mission critical enough to alert on, to make sure you have the operational and organizational focus that is appropriate.
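To make the "project those numbers out" step concrete, here is a small Python sketch. It assumes traffic keeps compounding at the same weekly rate, which is a simplification, and the load and capacity figures (400 and 1,000 requests per second) are invented for the example; only the 15% weekly growth comes from the scenario above.

```python
def weeks_until_capacity(current_load, capacity, weekly_growth):
    """Return how many whole weeks pass before load exceeds capacity,
    assuming load compounds by weekly_growth (0.15 means 15%) each week.

    Returns 0 if load is already over capacity, and None if load is not
    growing (or is zero), since it would then never exceed capacity.
    """
    if current_load > capacity:
        return 0
    if weekly_growth <= 0 or current_load <= 0:
        return None
    weeks = 0
    load = current_load
    while load <= capacity:
        load *= 1 + weekly_growth
        weeks += 1
    return weeks


# Hypothetical figures: 400 req/s at peak today, 1,000 req/s of capacity.
runway = weeks_until_capacity(current_load=400, capacity=1000, weekly_growth=0.15)
print(runway)  # 7
```

With these made-up numbers, a site measuring in early November would run out of headroom around the end of December, which is exactly the kind of answer that tells you whether to start provisioning now.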

While nothing beats preparation, even the best laid plans need good feedback loops to be successful. Measuring, collecting, analyzing, and acting upon data as it comes into your organization is critical in today's online environments. You may not be able to predict the future, but having solid monitoring systems in place will help you to recognize problems before they become critical, and help give you a "snowball's chance" during the holiday season.

2011-09-23 19:10:38 One Dashboard to Rule Them All

by Charlie Fiskeaux II

[Image: four icons representing a dashboard]

Ever dream of having a systems monitoring dashboard that was actually useful? One where you could move things around, resize them, and even choose what information you wanted to display? Large enterprise software packages may have decent dashboards, but what if you’re not a large enterprise or you don’t want to pay an arm and a leg for bloatware? Perhaps you have a good dashboard that came with a specific server or piece of hardware, but it’s narrowly-focused and inflexible. You’ve probably thought about (or even tried) creating your own dashboard, but it’s a significant undertaking that’s not for the faint-of-heart. What’s the solution? Should we just learn to live with sub-optimal monitoring tools?

Here at Circonus, we decided that this was one problem we could eliminate. Since we’ve built a SaaS offering that’s flexible enough to handle multiple different data sources, why shouldn’t we build a dashboard that’s flexible enough to display them? So we created a configurable dashboard that lets you monitor your data however you want. Do you want to show graphs side-by-side but at different sizes? Done. Want an up-to-date list of alerts beside those graphs? Easy. How about some real-time metric charts that automatically refresh? No problem. Our new configurable dashboards allow you to add all these items and more. Let’s dig in and see how these new dashboards work.

Dashboard Basics

Start by going to the standard “Dashboard” and clicking the new “My Dashboards” tab. These dashboards are truly yours; any dashboards you create are only visible to you (by default) and are segregated by account. If you want to share a custom dashboard with everyone else on an account, check that dashboard’s “share” checkbox in your list of custom dashboards.

After you have created a custom dashboard, you may set it to be your default dashboard by using the radio buttons down the left side of your custom dashboards list. If you do this, you will be greeted with your selected dashboard when you log in to Circonus. By selecting the “Standard Circonus Dashboard” as your default dashboard, you will revert to being greeted with the old dashboard you’re already used to seeing.

[Image: part of the interface for creating a new dashboard layout]

To create a new custom dashboard, click the “+” tab and choose a layout. At first you will see only a couple of predefined layouts available, but after you create a dashboard, its layout will then be available to choose when creating other new dashboards.

Now a note about working with these dashboards: every action auto-saves so you never have to worry about losing changes you’ve made. However, if you haven’t given your dashboard a title, the dashboard isn’t permanently saved yet. If you forget to title your dashboard and go off to do other things, don’t worry, the dashboard you created is saved in your browser’s memory. All you have to do is visit the “My Dashboards” page and your dashboard will be listed there. With two clicks you can give your dashboard a title and save it permanently. (Please note our minimum browser requirements—Firefox 4+ or Chrome—which are especially applicable for these new custom dashboards, since we’re using some features which are not available in older browsers.)

So let's create a dashboard. Choose a layout, click “Create Dashboard,” and you will be taken to the new dashboard with the “Add A Widget” panel extended. To begin, let’s check out the title area. Notice that when you hover over the title, a dropdown menu appears. This lists your other dashboards on the current account (as well as dashboards shared by other account members) and is useful for quickly switching between dashboards.

[Image: the dashboard interface showing the dashboard controls icons]

To the right of the title are some icons. The first icon opens the grid options dialog, which lets you change the dimensions of the dashboard grid, hide the grid (it’s still active and usable, though), enable or disable text scaling, and choose whether or not to auto-hide the title bar in fullscreen mode. The second icon toggles fullscreen mode on and off. Once you enter fullscreen mode a third icon will appear, and this icon toggles the “Black Dash” theme (this theme is only available in fullscreen mode). The current states of both fullscreen mode and the “Black Dash” theme are saved with your dashboard.

One other note about the dashboard interface: if you leave a dashboard sitting for more than ten or fifteen seconds and notice that parts of the interface disappear (along with the mouse cursor), don’t worry…it’s just gone to sleep! A move of the mouse will make everything visible again. (If there are any widget settings panels open, though, the sleep timer will not activate.)

Widgets

Now for the meat of it all: widgets. We currently have ten widgets which can be added to the dashboard grid to show various types of data, and we’ll be adding more widget types and contents in the future. Following is a quick rundown of the currently available widgets:

Graph
Graph widgets let you add existing graphs to your dashboard. You may choose any graph from the “My Graphs” section under your current account. Graph widgets are refreshed every few minutes to ensure they’re always up-to-date.
Beacon Map
Map widgets let you add existing Beacon maps to your dashboard. You may choose any map query from the “Beacons” page (under the “Checks” section of your current account). Map widgets are updated in real-time.
Beacon Table
Table widgets let you add existing Beacon tables to your dashboard. You may choose any table query from the “Beacons” page (under the “Checks” section of your current account). Table widgets are updated in real-time.
Chart
Chart widgets let you select multiple metrics to monitor and compare in a bar or pie chart. Chart widgets are updated in real-time.
Gauge
Gauge widgets let you monitor the current state of a single numeric metric in a graphical manner, displaying the most recent value on a bar gauge (dial gauges are coming soon). Gauge widgets are updated in real-time.
Status
Status widgets let you monitor the current state of one or more metrics, displaying the most recent value with custom formatting. This is most useful for text metrics, but it may be used for numeric metrics as well. Status widgets are updated in real-time.
HTML
HTML widgets let you embed arbitrary HTML content on your dashboard. They can be used for just about anything, from displaying a logo or graphic to using an iframe to embed more in-depth content. Everything is permissible except JavaScript. HTML widgets are refreshed every few minutes to ensure they’re always up-to-date.
List
List widgets let you add lists of graphs and worksheets to your dashboard, ordered by their last modified date. You may specify how many items to list and (optionally) a search string to limit the list. List widgets are refreshed every few minutes to ensure they’re always up-to-date.
Alerts
Alerts widgets let you monitor your checks by showing the most recent alerts on your current account. You may filter the alerts by their age (how long ago they occurred), by particular search terms, by severity levels, or other status criteria. Alerts widgets are refreshed every few minutes to ensure they’re always up-to-date.
Admin
Admin widgets let you monitor selected administrative information, including the status of all Circonus agents on your current account. Admin widgets are refreshed every few minutes to ensure they’re always up-to-date.

[Image: icons representing some of the current widget types]

To add widgets to the dashboard grid, there are two methods: you may use the “drag-and-drop” method (dragging from the “Add a Widget” panel), or you may first click the target grid cell and then select the widget you want to place there. (Note: in fullscreen mode only the latter method is available.) After a widget has been added, some types of widgets will automatically activate with default settings, but most will be inactive. If the widget is inactive, click it to open the settings panel and get started. Once the widget is activated, the settings panel is available by clicking the settings icon in the upper right corner of the widget. In the lower right corner of the widget is the resize handle, so you can resize the widget as frequently as you want. And let’s not forget being able to rearrange the widgets—every widget has a transparent “title bar” at its top which you can use to drag it around. I won’t get into the details of settings for every type of widget, because they should be self-explanatory (and that would make this one super-long blog post). But suffice it to say, there are plenty of options for everyone.

We've been working hard to create a configurable dashboard that will be as flexible as Circonus itself is, and we believe we’ve hit pretty close to the mark. Here’s a sample dashboard showing the power of these new dashboards:

[Image: dashboard grid with several rectangular graph, chart, alerts and status widgets arranged in a grid]

2011-03-22 19:00:36 Lost In Translation

by Theo Schlossnagle

For more than ten years, OmniTI has been making large-scale critical Internet infrastructure work. It is, obviously, not black magic or voodoo. Perhaps not so obviously, it is not technical competence that leads to success here. I like to think our team has technical competence in spades, as we have an impeccable track record, with authored books and a laundry list of speaking engagements to justify it. However, technical competence alone would fall short of the mark, far short.

Without exception, it is expected that proper monitoring and trending are as much a part of the process as setting up networking, backups, and more recently, change management. And yet, when you ask someone to explain why monitoring and trending are vital, you'd be lucky to get a response other than "to be sure things are working". Something here is lost in translation.

Disconnected Viewpoints

Every business owner knows that watching the books is part of the job. You need to know P&L, you need to understand the outputs and costs of your various business units and you track efficiencies everywhere. All of these metrics play a part in both strategic and tactical decisions made every day. Each business unit reports these things and while in good organizations each manager knows what is important to each other manager, something is still lost in translation. Far too often, managers don't understand that what they produce, what they consume and how they work changes the game for other business units. While the word is overused and abused, every business is an ecosystem. It is obvious that a new marketing campaign will increase resource utilization on the sales teams. It should be obvious that a new marketing campaign will increase resource utilization on IT infrastructure as well.

Every systems administrator knows (or should know) that monitoring your architecture is fundamental. On the other hand, very few can explain in any detail why this is so important. "Because you lose money when systems are offline", they'll quote disparagingly. Ask how much and you might catch them at a loss. From my own experience in operations, as well as countless conversations with customers and vendors, very few individuals recognize the relationship between IT and Business. Systems people know that they have to keep systems and services running to support their business, but rarely do they understand that relationship completely.

Owners that foster a transparent and cohesive organization around key performance indicators in every business unit (even those that are cost centers) will change their organizations in two critically useful ways:

  • Efficiencies between business units. With increased transparency, staff in all positions will see the effects of their actions across the business as a whole. This produces an atmosphere of self-reinforcing efficiency.
  • Accountability to the overall business. The hokey old question: "Is what you're doing good for the company?" changes form. With increased cohesiveness, the answer to that question is a more obvious outcome to every action and no one can call it hokey, because it is always answered without being asked.

A Call To Arms

Technology is no longer underneath the products you sell and the process in which you deliver them. It is, for at least the immediate future, intertwined. Creativity on the technology side doesn't only deliver cost savings; it creates new audiences and increases interaction with your customers. You have to do more than embrace technology; you need to leverage it and let new opportunities catapult your business forward.

As intertwined as technology is, we can no longer afford to have its operational details hidden away in the bowels of the "tech ops" or "web ops" group. We need visibility and we need cohesion. Infrastructure/application engineering and other business units are now, more than ever before, on the same team marching towards success. Communication and accountability are critical to success.

Here is where I leave you and hope that you will think about the metrics you monitor in a different light. They represent something more. They are there to make the business run, increase shareholder value, and make your customers happier and more prosperous.

2010-06-29 03:41:34 Monitoring for Agile Operations

by Jason Dixon

One of the big announcements for us at Velocity 2010 last week was the formal release of our Developer site and Management API. Built as a RESTful service, the Circonus API is designed to let users programmatically adjust monitors and alerts as their architecture evolves. Currently it supports all basic functionality for managing Checks, Metrics, Contacts and Contact Groups, Rules and Metric Dependencies. Support for managing Graphs and Worksheets will be released in a future version.

But publishing a Web Services API is only the first part of the puzzle. You really have to cultivate the community using it, by demonstrating just how easy and powerful it really is. We're planning to publish tons of useful examples here and over at the Developer site in the days and weeks to come. You might even see examples in the form of Chef recipes or Puppet modules.

Coincidentally, the guys over at Opscode have been doing their part to help out too. Adam Jacob, the CTO of Opscode and creator of Chef, took it upon himself to extend our API and make it even easier for Ruby and Rails users. Check out his ruby-circonus project over at GitHub.

Needless to say, the disciplines of Agile Operations and Infrastructure as Code rely on the sort of programmatic elasticity that our new API makes possible. Deploying systems and services is just one small part of the solution; it's vital to track the performance of your IT systems and be able to correlate their effects on your Business systems. Automating your monitoring system to evolve in step with your architecture is a great way to avoid the human factor which will inevitably result in missing monitors and alerts.
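To give a feel for what "evolving in step with your architecture" can look like, here is a minimal Python sketch of the reconciliation step: compare the hosts that should be monitored with the hosts that currently are, and work out what to add and what to retire. It deliberately makes no calls to the Circonus API. The function name, host names, and data shapes are all hypothetical, and the actual create and remove operations would be whatever your monitoring API provides.

```python
def plan_check_changes(desired_hosts, monitored_hosts):
    """Compare the hosts that should be monitored with the hosts that
    currently are, and return (to_create, to_remove) as sorted lists."""
    desired = set(desired_hosts)
    monitored = set(monitored_hosts)
    to_create = sorted(desired - monitored)  # deployed but not yet monitored
    to_remove = sorted(monitored - desired)  # monitored but no longer deployed
    return to_create, to_remove


# Hypothetical inventory, e.g. exported by a configuration management tool.
inventory = ["web1", "web2", "web3", "db1"]
# Hypothetical list of hosts that already have checks.
existing_checks = ["web1", "web2", "old-cache1"]

to_create, to_remove = plan_check_changes(inventory, existing_checks)
print(to_create)  # ['db1', 'web3']
print(to_remove)  # ['old-cache1']
```

Running something like this on every deploy is one way to keep a newly provisioned machine from going into production without monitors, which is the human-factor gap described above.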

2010-05-10 14:37:26 Your Visitors Don't Matter

by Jason Dixon

Consider me old-fashioned, but I remember a time when an alert notification meant something. Drives failed, servers ran short on memory, or a cage monkey pulled the wrong cable at 3 A.M. Regardless of the circumstance, it demanded attention. Those were the days.

Today, operations is all about doing more with less. No more dedicated hardware or late-night maintenance windows. Everything is virtual, cloud-based, or filling up squares in the grid. Automation reigns supreme, limitless scalability at our disposal. Abstraction at its finest.

But woe unto you, the flapping anomaly.

That visitor who tried to load your website was turned away, timed out and left to wither. Poor Jane wanted to view your site. She needed to view your site. She'd already submitted her order, only to be ignored. Forgotten. Disconnected with nary a trace to route nor a cookie to favor.

Jane was a victim of a numbers game. Someone, somewhere, decided that some problems don't matter. Which ones? Who cares? They don't matter. And because she happened to visit when this problem reared its head, you ignored her request. Who would ever make such a silly presumption that one failure is less important than another? What criteria are used to determine the worthiness of this alert or that one? Pure random circumstance, it would appear.

Many "uptime" services and monitoring suites promote the concept of selective or flapping failures. Vendors sell these features as a convenience, ostensibly as a sleep aid. The administrator's snooze-bar. I can't think of any other reason that ignoring a faulty condition would be considered a good thing. Perhaps they reason that only the check is affected. If it responds after the third attempt, it was probably ok for visitors all along. Right?
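To put numbers on it, here is a small Python sketch of the kind of rule in question: only alert after a check has failed a set number of times in a row. This is a made-up simulation, not any vendor's actual implementation; the function name, the threshold of three, and the sample results are assumptions for illustration.

```python
def alerts_with_retry_rule(results, failures_required=3):
    """Given check results in order (True = OK, False = failed), return
    (alerts_fired, failures_suppressed) under a rule that only alerts
    once failures_required consecutive failures have been seen."""
    alerts = 0
    suppressed = 0
    streak = 0
    for ok in results:
        if ok:
            # The streak ended below the threshold, so nobody was told.
            if 0 < streak < failures_required:
                suppressed += streak
            streak = 0
        else:
            streak += 1
            if streak == failures_required:
                alerts += 1
    # Account for a short streak still in progress at the end.
    if 0 < streak < failures_required:
        suppressed += streak
    return alerts, suppressed


# Hypothetical run of eleven checks containing six failures.
results = [True, False, True, True, False, False, True, False, False, False, True]
print(alerts_with_retry_rule(results))  # (1, 3)
```

In this invented sample, six checks failed, one alert fired, and three failures were never reported to anyone. Each of those could have been a Jane.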

It's disappointing how many vendors embrace this broken methodology. It probably seemed innocent at a glance. But the damage has been done; recklessness has taken root. We've been conditioned to accept these transient malfunctions as mere operational speed bumps. Rather than address the problem, we nudge the threshold a tad higher. Throw additional nodes into the cluster. Increase capacity, while decreasing exposure.

But there is a more responsible alternative. Whatever happened to purposeful, iterative corrections and Root Cause Analysis? Notifications may be annoying at times, but they serve a crucial function in a healthy production architecture. Ignored alerts lead to stagnant bugs, lost traffic and missed opportunities. Stop treating your visitors like they don't matter. There's no such thing as a flapping customer.

2010-03-06 22:48:37 Introducing Circonus

by Jason Dixon

Great ideas always begin with a catalyst. They can ignite in a flash of brilliance, or grow slowly like an ember hidden in the ashes of failure. Inspiration comes from different places, and is only ever cultivated into success with the right combination of talent, timing and fortitude.

And sometimes it just happens because you get fed up with inferior products.

The beginnings of Circonus land somewhere in-between. Circonus was created by the engineers at OmniTI, where we've been dealing with the pains of performance monitoring and trending in highly scalable environments for years. We've tried various combinations of Open Source and COTS software packages, all of which left us with a sour taste and wanting more.

Over the last couple of years, our team of highly skilled engineers, led by OmniTI's own Theo Schlossnagle, has been crafting and refining a truly convergent monitoring platform. Circonus started off as the Reconnoiter project, attempting to address the disconnect between existing monitoring and trending solutions.

Circonus is currently in a closed beta, receiving valuable feedback from customers and partners. We expect to launch publicly in April 2010. In the meantime, we'll use this blog as an outlet to discuss the upcoming release and divulge all the cool stuff in the pipeline. I hope you visit here often to find out what we're working on.

Jason Dixon
Product Manager
Circonus