
2012-02-21 21:52:32 Graph Annotations and Events

by Charlie Fiskeaux II

This feature has been a long time in coming: the ability to annotate your graphs! With the new annotations timeline sitting over the graph, not only can you create custom events to mark points in time, but you can also view alerts and see how they fit (or don't fit) your metric data.

Annotations Timeline

[Screenshot: part of the new annotations interface]

First, let's go to a graph and take a look at the annotations timeline to see how it works. When you view a graph, you will immediately see the new Annotation controls to the left of the date tools, and the timeline itself will render between the date tools and the graph. The timeline is collapsed by default and initially shows only alerts from metrics on the current graph, so you may have an empty timeline at first. If you take a look at the controls, however, you will see three items: the Annotation menu, the show/hide toggle button, and the expand/collapse toggle button. The show/hide button does just what it says: it shows or hides the timeline. The expand/collapse button toggles between the space-saving collapsed timeline view and the more informative expanded timeline view.

If you open the Annotation menu, you will see a list of all the items you can possibly show in your timeline (or hide from it). Any selections you make here (as well as your show/hide and expand/collapse state changes) will be saved as site-wide user preferences in your current browser. All the items are separated into three groups:

Event Categories
This is a list of all the Event categories under the current account (these are seen and managed in the Events section of the site…we'll get to that new section in a minute). If you have uncategorized events (due to deleting a category that was still in use), they will appear grouped under the "--" pseudo-category label.
Alerts
By default, the only alerts that will be shown will be alerts of all severity (sev) levels triggered by metrics on the current graph. If you wish, you may also show all alerts, and both categories of alerts may be filtered by sev levels. To do so, click one of the alert labels to expand a sev filter row with more checkboxes.
Text Metrics
This third group is not shown by default, but is represented by the checkbox at the bottom labeled "Include text metrics." If you check this box, the page will refresh, and any text metrics on the current graph will then be rendered as a part of the timeline (and will be excluded from the graph plot and legend).

Once you have some annotations rendering on the timeline, take a look at the timeline itself. Hovering over a point will show a detail tooltip with the annotation title, date, and description, and hovering over either a point or a line segment will highlight the corresponding date range on the graph itself.

Now for the question on everyone's minds: "Can I create events here, or do I have to go to the Events section to do that?" The answer is yes, you can create events straight from the view graph page! To do so, simply use your right mouse button to drag-select a time range on the graph itself. A dialog will then pop up for you to input your info and create the event.

Events Section

Now let's head over to the Events section where you can manage your events and event categories. Simply click on the new Events tab (below the Graphs tab) and you're there! To create an event, click the standard "+" tab at the upper left of the page. This will give you the New Event dialog. Most of the dialog inputs are pretty straightforward, with the exception of the category dropdown. This is a new hybrid "editable" dropdown input. You may select any of its existing options, or you can add new ones. To add a new option, simply select the last option (it's labeled "+ ADD Category"). Your cursor will immediately be placed in a standard text input where you can enter your new category. When you're finished, hit enter to create the new option and have it selected as your category of choice.

[Screenshot: the category select dropdown input in the New Event dialog]

After you have created your event, you may need to edit it later. To edit any of its details, simply click on the pertinent detail of the event (when changing the event category, you will see it also has the new hybrid "editable" dropdown input which works exactly like the one in the New Event dialog).

In addition to start and end points (which may be the same date if you don't want more than a single point), you may also add midpoints to your event. Click the Show details button for an event (the arrow button at the right end of an event row), and you will see the Midpoints list taking up the right half of the event details panel. Simply click the Add Midpoint button to get the New Midpoint dialog where you enter a title, description and choose a date for your point.

The one last element of the Events section that's good to know about is the Categories menu at the upper right of the page. This allows you to delete categories as well as filter the Events list to only show a single category of events at a time. To do this, just click the name of a category in the Categories menu.

2011-03-22 19:00:36 Lost In Translation

by Theo Schlossnagle

For more than ten years, OmniTI has been making large-scale critical Internet infrastructure work. It is, obviously, not black magic or voodoo. Perhaps not so obviously, it is not technical competence that leads to success here. I like to think our team has technical competence in spades, as we have an impeccable track record, authored books and a laundry list of speaking engagements to justify it. However, technical competence alone would fall short of the mark, far short.

Without exception, it is expected that proper monitoring and trending are as much a part of the process as setting up networking, backups, and more recently, change management. And yet, when you ask someone to explain why monitoring and trending are vital, you'd be lucky to get a response other than "to be sure things are working". Something here is lost in translation.

Disconnected Viewpoints

Every business owner knows that watching the books is part of the job. You need to know P&L, you need to understand the outputs and costs of your various business units and you track efficiencies everywhere. All of these metrics play a part in both strategic and tactical decisions made every day. Each business unit reports these things and while in good organizations each manager knows what is important to each other manager, something is still lost in translation. Far too often, managers don't understand that what they produce, what they consume and how they work changes the game for other business units. While the word is overused and abused, every business is an ecosystem. It is obvious that a new marketing campaign will increase resource utilization on the sales teams. It should be obvious that a new marketing campaign will increase resource utilization on IT infrastructure as well.

Every systems administrator knows (or should know) that monitoring your architecture is fundamental. On the other hand, very few can explain in any detail why this is so important. "Because you lose money when systems are offline", they'll quote disparagingly. Ask how much and you might catch them at a loss. From my own experience in operations, as well as countless conversations with customers and vendors, very few individuals recognize the relationship between IT and Business. Systems people know that they have to keep systems and services running to support their business, but rarely do they understand that relationship completely.

Owners that foster a transparent and cohesive organization around key performance indicators in every business unit (even those that are cost centers) will change their organizations in two critically useful ways:

  • Efficiencies between business units. With increased transparency, staff in all positions will see the effects of their actions across the business as a whole. This produces an atmosphere of self-reinforcing efficiency.
  • Accountability to the overall business. The hokey old question: "Is what you're doing good for the company?" changes form. With increased cohesiveness, the answer to that question is a more obvious outcome to every action and no one can call it hokey, because it is always answered without being asked.

A Call To Arms

Technology is no longer underneath the products you sell and the process by which you deliver them. It is, for at least the immediate future, intertwined. Creativity on the technology side doesn't only deliver cost savings; it creates new audiences and increases interaction with your customers. You have to do more than embrace technology; you need to leverage it and let new opportunities catapult your business forward.

As intertwined as technology is, we can no longer afford to have its operational details hidden away in the bowels of the "tech ops" or "web ops" group. We need visibility and we need cohesion. Infrastructure/application engineering and other business units are now, more than ever before, on the same team marching towards success. Communication and accountability are critical to success.

Here is where I leave you and hope that you will think about the metrics you monitor in a different light. They represent something more. They are there to make the business run, increase shareholder value, make your customers happier and more prosperous.

2010-10-25 22:50:48 Visualizing Regressions

by Jason Dixon

We've heard a lot of talk about Continuous Deployment strategies over the last 12-18 months. Timothy Fitz was one of the earliest proponents, publishing stories of their success over at IMVU last year. One of the greatest benefits to continually pushing your changes to production is that it takes less time and effort to find bugs when something goes wrong, since you have fewer commits in-between to navigate. But even with this style of release management, it helps to know which versions of code are running live on your components at any point. What happens when your newest code is enough to alter the normal behavior of the system, but not so drastic as to trigger an alert?

One of the nicer trending features in Circonus (or its open-source relative, Reconnoiter) is the ability to correlate unrelated datasets. I can take any collection of metrics on my account and group them together on a single graph. But what if you could view isolated events on the same graph, as an orthogonal data point? Check out these two graphs displaying some recent activity on one of our fault detection systems. The vertical lines represent the point at which a text metric's value changed. Circonus renders them this way so you can easily recognize that specific moment in time.

In the first graph I'm hovering over a dip in performance caused by the most recent release to that component (svn r6230). In the second graph we're running a fix (svn r6232) for the regression introduced in the previous commit. Could I have done the same level of correlation manually? Of course, but it's nice to be able to zoom out and study the long-term effects of our release strategy on our overall stability. This is an enormously helpful tool for performing Root Cause Analysis on our live systems, especially if you perform releases many times in a week (like we do). If you're one of many using automation and Configuration Management suites like Puppet, Chef and the Marionette Collective, no doubt you'll find it even more useful.
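To make the idea of "the point at which a text metric's value changed" concrete, here is a minimal Python sketch of how such change points could be found in a series of samples. It is only an illustration, not how Circonus implements it: the function name `change_points` and the sample timestamps are made up for this example, and only the revision numbers echo the ones mentioned above.

```python
from typing import List, Tuple


def change_points(samples: List[Tuple[int, str]]) -> List[Tuple[int, str, str]]:
    """Return (timestamp, old_value, new_value) for every change in a text metric.

    `samples` is a time-ordered list of (unix_timestamp, value) pairs.
    The first sample has nothing to compare against, so it never counts as a change.
    """
    changes = []
    previous = None
    for timestamp, value in samples:
        if previous is not None and value != previous:
            changes.append((timestamp, previous, value))
        previous = value
    return changes


# Hypothetical samples of a deployed-revision text metric (timestamps are invented).
samples = [
    (1288040000, "6229"),
    (1288040300, "6229"),
    (1288040600, "6230"),
    (1288040900, "6230"),
    (1288041200, "6232"),
]

for ts, old, new in change_points(samples):
    print(f"{ts}: r{old} -> r{new}")
```

Each tuple returned corresponds to one vertical line on a graph like the ones described: a moment where the reported revision differs from the previous sample.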

If you'd like to start trending your own text metrics, check out the Resmon DTD. Circonus can pull in your custom metrics in this format. Although the version numbers I mentioned earlier look like integers (well, they are integers), I can explicitly cast them as a string metric using the Resmon DTD. Here is what that might look like:

<ResmonResults> 
  <ResmonResult module="Site::CircProd" service="vers"> 
    <last_runtime_seconds>0.000274</last_runtime_seconds> 
    <last_update>1288044642</last_update> 
    <metric name="ernie" type="s">6297</metric> 
  </ResmonResult> 
</ResmonResults> 
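If you would rather generate that document than write it by hand, something like the following Python sketch would do. It uses only the standard library, and the element names, attributes and the "s" type are copied from the sample above. The helper name `resmon_document` and its parameters are assumptions made for this example, and how you expose the result to Circonus is left out.

```python
import time
import xml.etree.ElementTree as ET


def resmon_document(module, service, metrics, runtime_seconds=0.0, now=None):
    """Build a Resmon-style XML string with every metric cast as a string.

    `metrics` maps metric names to values. Values are written with type="s",
    mirroring the sample above, so integer-looking versions stay text metrics.
    """
    root = ET.Element("ResmonResults")
    result = ET.SubElement(root, "ResmonResult", {"module": module, "service": service})
    ET.SubElement(result, "last_runtime_seconds").text = f"{runtime_seconds:.6f}"
    timestamp = int(time.time() if now is None else now)
    ET.SubElement(result, "last_update").text = str(timestamp)
    for name, value in metrics.items():
        metric = ET.SubElement(result, "metric", {"name": name, "type": "s"})
        metric.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Reproduce the values from the sample document above.
xml_text = resmon_document(
    "Site::CircProd",
    "vers",
    {"ernie": 6297},
    runtime_seconds=0.000274,
    now=1288044642,
)
print(xml_text)
```

The output is the same structure as the hand-written sample, on a single line rather than indented.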

As you might imagine, you can get pretty creative with the sort of data you can pull into Circonus. In our next post I plan to look at how you can combine WebHook Notifications (that Brian announced last week) with these text metrics to start trending your alert history. Stay tuned!

2010-03-06 22:48:37 Introducing Circonus

by Jason Dixon

Great ideas always begin with a catalyst. They can ignite in a flash of brilliance, or grow slowly like an ember hidden in the ashes of failure. Inspiration comes from different places, and is only ever cultivated into success with the right combination of talent, timing and fortitude.

And sometimes it just happens because you get fed up with inferior products.

The beginnings of Circonus land somewhere in between. Created by the engineers at OmniTI, we've been dealing with the pains of performance monitoring and trending in highly scalable environments for years. We've tried various combinations of Open Source and COTS software packages, all of which left us with a sour taste and wanting more.

Over the last couple of years, our team of highly skilled engineers, led by OmniTI's own Theo Schlossnagle, has been crafting and refining a truly convergent monitoring platform. Circonus started off as the Reconnoiter project, attempting to address the disconnect between existing monitoring and trending solutions.

Circonus is currently in a closed beta, receiving valuable feedback from customers and partners. We expect to launch publicly in April 2010. In the meantime, we'll use this blog as an outlet to discuss the upcoming release and divulge all the cool stuff in the pipeline. I hope you visit here often to find out what we're working on.

Jason Dixon
Product Manager
Circonus