October 14
The One Thing Every SharePoint Administrator Needs

Maintaining an enterprise SharePoint environment can be a daunting task. Web servers, index servers, SQL databases, Office Web Apps servers, Workflow Manager servers, not to mention all the service applications and third-party components, spread across development, staging and production farms - it can quickly get out of control. Monitoring the physical infrastructure (CPU, memory, disk, bandwidth, and various performance counters) is a good first step in establishing some sort of order, but it’s just not enough. You need to know how the application itself is behaving, not just the underlying components.

You need more data.

But not just any kind of data. You need the right data, at the right time, to maintain peak performance. Fortunately, SharePoint includes a robust logging architecture that captures detailed information for a wide range of application events. This data can help administrators identify potential problems and trace the source of errors. But there’s a catch - the SharePoint logs contain thousands upon thousands of entries, most of which are low-value system process status messages, spread across multi-megabyte text files on each server in the farm. A tremendous amount of valuable data is buried in those files - if only there were an easy way to get to it and, more importantly, make sense of it all.

There are numerous tools available in the marketplace that expose SharePoint log data in an easy-to-consume format. Microsoft even provides a free utility for tracking, filtering and searching log events. If more data is better, and there are plenty of options for gathering such data, then why is it still so difficult to keep SharePoint running effectively?

Because you actually don’t need more data. You need better intelligence.

While log viewer tools may expose massive streams of application event data, they don’t help you analyze, categorize, or prioritize all that information. To really find out how SharePoint is functioning, you need to know the volume of critical events being generated, which types of events occur most often, what your critical event count looks like hour by hour and which specific application components are responsible for each critical event. You need dashboards that provide health scoring based on critical event flow, charts and graphs to visualize aggregated data, drill-downs to analyze specific metrics, and a variety of reports to perform root-cause analysis.
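As a rough illustration of that kind of aggregation (critical event volume, most frequent event types, hour-by-hour counts and responsible components), here is a minimal Python sketch over events that have already been parsed out of the logs. `LogEvent`, `CRITICAL_LEVELS` and `summarize_critical` are hypothetical names, and treating the ULS levels "Critical" and "Unexpected" as "critical" is an assumption you would tune for your own farm.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, NamedTuple


class LogEvent(NamedTuple):
    """Simplified, hypothetical event record; real ULS entries carry more fields."""
    timestamp: datetime
    server: str
    category: str   # component that raised the event, e.g. "Web Parts"
    event_id: str   # identifier for the event type
    level: str      # trace level, e.g. "Critical", "Unexpected", "High", "Verbose"


# Assumption: which levels count as "critical" is a local policy decision.
CRITICAL_LEVELS = frozenset({"Critical", "Unexpected"})


def summarize_critical(events: Iterable[LogEvent], top_n: int = 5) -> dict:
    """Aggregate critical events by volume, hour, event type and component."""
    critical = [e for e in events if e.level in CRITICAL_LEVELS]
    # Truncate each timestamp to the top of its hour to build hourly buckets.
    per_hour = Counter(
        e.timestamp.replace(minute=0, second=0, microsecond=0) for e in critical
    )
    return {
        "volume": len(critical),
        "per_hour": dict(sorted(per_hour.items())),
        "top_event_types": Counter(e.event_id for e in critical).most_common(top_n),
        "by_component": Counter(e.category for e in critical).most_common(),
    }


if __name__ == "__main__":
    # Hypothetical sample events from three servers.
    sample = [
        LogEvent(datetime(2013, 10, 14, 9, 5), "WFE1", "Web Parts", "89a1", "Unexpected"),
        LogEvent(datetime(2013, 10, 14, 9, 40), "WFE2", "Web Parts", "89a1", "Unexpected"),
        LogEvent(datetime(2013, 10, 14, 10, 2), "APP1", "Search", "g2ds", "Critical"),
        LogEvent(datetime(2013, 10, 14, 10, 3), "APP1", "Timer", "nv3s", "Verbose"),
    ]
    for key, value in summarize_critical(sample).items():
        print(key, value)
```

The low-value "Verbose" entry is dropped before counting, so every figure in the summary reflects only the events that actually matter for health scoring.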

In addition, you need access to this kind of intelligence at any given time - often weeks or months after a service interruption has occurred. Many organizations have policies in place that charge individual departments for the amount of disk space being consumed, making it very expensive to keep more than a few days' or weeks' worth of log files on the SharePoint servers. Even the out-of-the-box health analytics databases automatically purge log data every 30 days. This is an acceptable compromise for real-time troubleshooting, but it makes root-cause analysis of critical service interruptions extremely difficult. Even worse, rolling log deletions destroy all historical data, so there is no way to establish a baseline or to determine the scope, frequency and impact of a specific event or set of events.

Offloading critical event data to an external repository doesn’t just save money - it also makes your job a whole lot easier. Data that has been aggregated, filtered, and indexed is much easier to search, analyze and report against. If you’ve ever tried finding a specific correlation ID within individual logs across a multi-server farm with only a vague idea of when it might have happened, then you know how difficult and time-consuming such a task can be. Having access to all of that data in one place, with a rich set of search and reporting capabilities in a modern, responsive user interface, can save countless hours of frustration and lost productivity.
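A minimal sketch of such a repository follows, using Python's built-in `sqlite3` module purely as a stand-in for whatever central store you actually choose. The table layout, the `KEEP_LEVELS` filter and the function names (`open_repository`, `ingest`, `find_correlation`) are hypothetical. The idea it illustrates is that once filtered events from every server land in one indexed table, tracking a correlation ID across the farm becomes a single query rather than a search through individual files, and the history is kept for as long as you choose rather than for a fixed purge window.

```python
import sqlite3
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Assumption: only higher-severity levels are worth keeping long term.
KEEP_LEVELS = frozenset({"Critical", "Unexpected", "High"})

# (timestamp, category, level, message, correlation id)
Row = Tuple[datetime, str, str, str, Optional[str]]


def open_repository(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a single events table shared by all servers."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               timestamp   TEXT NOT NULL,  -- ISO 8601 text sorts chronologically
               server      TEXT NOT NULL,
               category    TEXT NOT NULL,
               level       TEXT NOT NULL,
               message     TEXT NOT NULL,
               correlation TEXT
           )"""
    )
    # Index the columns that are searched most often.
    conn.execute("CREATE INDEX IF NOT EXISTS ix_events_corr ON events(correlation)")
    conn.execute("CREATE INDEX IF NOT EXISTS ix_events_time ON events(timestamp)")
    return conn


def ingest(conn: sqlite3.Connection, server: str, rows: Iterable[Row]) -> int:
    """Filter one server's parsed log rows and store the survivors; returns rows kept."""
    kept = [
        (ts.isoformat(), server, category, level, message, corr)
        for ts, category, level, message, corr in rows
        if level in KEEP_LEVELS
    ]
    with conn:  # commits the transaction on success
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", kept)
    return len(kept)


def find_correlation(conn: sqlite3.Connection, correlation_id: str) -> List[tuple]:
    """Every stored event for one correlation ID, across all servers, in time order."""
    cur = conn.execute(
        "SELECT timestamp, server, category, level, message FROM events "
        "WHERE correlation = ? ORDER BY timestamp",
        (correlation_id,),
    )
    return cur.fetchall()
```

Usage, with a hypothetical correlation ID: call `ingest(conn, "WFE1", rows)` for each server's parsed log, then `find_correlation(conn, "5c1d6a9c-...")` returns the matching events from every server in chronological order.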

If you are relying upon traditional log viewer tools to maintain and troubleshoot your SharePoint environment then it’s time to reevaluate your strategy. Operational intelligence tools like SmartTrack put critical information regarding the overall health and operational stability of any SharePoint farm right at your fingertips - at any time and on any device. Don’t waste any more time and money doing things manually or buying tools that give you even more data to sift through. Put a solution in place that makes sense of the data you already have and puts you back in control. 

 
Operational Analytics for SharePoint
  
January 03
The Critical Information Your System Monitoring Tools Are Missing

Effective IT operations management depends on a robust set of tools which provide insight into the status, behavior and health of critical systems. The information generated by such tools enables support personnel to make decisions and take action in order to ensure optimal service delivery. Meaningful, contextual, and actionable intelligence provided in real time can help prevent costly outages, ensure SLA compliance and increase customer satisfaction. In essence, an IT organization is only as effective as the data it receives from its tools.


Enterprise applications generate a large volume of raw operational data in the form of event logs. This information is typically stored in delimited text files or temporary databases and retrieved only in the event of a service interruption, when additional information is needed to perform root cause analysis. Most systems provide various levels of event logging, from informational messages to critical failure alerts, which taken together provide a holistic picture of what is happening within an application at any given point in time. Events are the "voice" of an application, the method by which it communicates how it is performing, what problems may be occurring and which functional areas may need attention.


As valuable as event information can be, it is often difficult to derive actionable intelligence from it. Applications such as Microsoft SharePoint, which require multiple servers in a shared farm environment, can log thousands of informational, warning and critical events every minute, scattered across multiple servers in numerous geographic locations. These events provide a clear picture of what is actually happening within the application, but they must first be gathered, correlated, filtered and analyzed before the information they contain can be used in any meaningful way. Unlike system counters, which provide simple numerical output that can easily be charted against established thresholds, event data comprises multiple data types, including verbose text, timestamps and categorical metadata. Without a tool designed specifically to extract, analyze and interpret data from operational events, all the intelligence that can be gained from them remains unavailable to the service personnel and decision makers who need it the most.
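To make the contrast with numeric counters concrete, here is a minimal Python sketch that turns one raw log line into a typed record. It assumes the tab-delimited column layout of SharePoint 2010/2013 ULS log files (Timestamp, Process, TID, Area, Category, EventID, Level, Message, Correlation), a US-style `MM/DD/YYYY` timestamp with a fractional-seconds part, and a trailing `*` on the timestamp marking a continued message; check all three against your own logs. `UlsEntry`, `parse_uls_line`, `is_low_value` and `LOW_VALUE_LEVELS` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumption: verbose trace levels are treated as low-value noise.
LOW_VALUE_LEVELS = frozenset({"Verbose", "VerboseEx"})


@dataclass(frozen=True)
class UlsEntry:
    timestamp: datetime          # temporal data
    process: str
    area: str                    # categorical metadata
    category: str                # categorical metadata
    event_id: str                # categorical metadata
    level: str                   # categorical, but ordered by severity
    message: str                 # verbose free text
    correlation: Optional[str]   # ties related entries together across servers
    continued: bool              # timestamp ended in '*' (assumed continuation marker)


def parse_uls_line(line: str) -> Optional[UlsEntry]:
    """Parse one tab-delimited line; return None for header rows or malformed lines."""
    fields = [f.strip() for f in line.rstrip("\r\n").split("\t")]
    if len(fields) < 8 or fields[0] == "Timestamp":
        return None
    raw_ts = fields[0]
    continued = raw_ts.endswith("*")
    try:
        # %f accepts 1-6 fractional digits, e.g. ".45" -> 450000 microseconds.
        ts = datetime.strptime(raw_ts.rstrip("*").strip(), "%m/%d/%Y %H:%M:%S.%f")
    except ValueError:
        return None
    correlation = fields[8] if len(fields) > 8 and fields[8] else None
    # fields[2] (thread ID) is not kept in this sketch.
    return UlsEntry(ts, fields[1], fields[3], fields[4], fields[5],
                    fields[6], fields[7], correlation, continued)


def is_low_value(entry: UlsEntry) -> bool:
    """True for entries that a first-pass filter would discard."""
    return entry.level in LOW_VALUE_LEVELS
```

Once lines are typed this way, timestamps can be bucketed, categories counted and free-text messages searched, which is the groundwork every later step of analysis depends on.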


Consider the following scenario: an organization running Microsoft SharePoint begins receiving reports from users that page response times have become abnormally long. The problem is intermittent at first but quickly escalates until the entire user base is impacted. Within a few hours, the system becomes unusable and critical business processes go offline. System performance counters are within normal operating parameters: CPU utilization, memory consumption, query execution time, cache hit ratio, web server throughput, and so on. No error messages are being exposed to the user or captured in the operating system event logs. After eliminating all external factors, such as network congestion, hardware failure, and the like, support escalates the issue to the application administrators, who begin troubleshooting. They gather the log files from each server and begin sifting through tens of thousands of event messages. With so much data to analyze and no specific error message or identifier to guide them, the search takes hours - meanwhile work cannot be performed, money is lost, SLA parameters are exceeded and stress levels rapidly increase. After much investigation, administrators finally discover a pattern of warning messages in the application logs from an improperly configured page component. Removing the component from the master template resolves the issue and service is restored.


In this all-too-common situation, the information required to solve the problem was in the application log files all along, but it was never captured, escalated or acted upon until after an outage had occurred. By relying upon traditional server monitoring tools, operations personnel were missing the key data they needed in order to respond to a critical situation; in fact, had they been alerted when the event pattern first became abnormal, they could potentially have prevented the outage altogether, saving the company a great deal of money and themselves a significant amount of time. They simply didn't have the right tools to give them the right information at the right time.


This problem can be solved through the implementation of an Operational Analytics (OA) solution. OA tools are designed to gather, process, correlate, filter and evaluate application event data, isolating valuable chunks of information from streams of informational messages and assessing functional health in real time. By inspecting vast amounts of event data, OA software can identify behavioral patterns and detect deviations from the baseline, alerting support personnel to potential issues and enabling them to take proactive measures to ensure system continuity. These application profiles can be leveraged to establish health metrics, visualize trends, create actionable dashboards, simplify troubleshooting, generate detailed reports and perform various types of historical analysis.
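The baseline idea can be illustrated with a deliberately simple statistical rule: compare the latest per-interval event count with the mean and standard deviation of a trailing window, and flag anything more than `k` standard deviations above it. This is a generic z-score style threshold chosen for illustration, not the algorithm of any particular OA product; `is_abnormal`, `first_alert`, the window size and `k` are all assumptions. Applied to something like the warning pattern in the scenario above, it raises an alert as soon as the count breaks away from its recent history, well before the counts reach outage levels.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def is_abnormal(history: Sequence[int], current: int,
                k: float = 3.0, min_spread: float = 1.0) -> bool:
    """True if `current` exceeds the baseline mean by more than k standard deviations."""
    if len(history) < 2:
        return False  # too little history to establish a baseline
    # Floor the spread so a perfectly flat history doesn't flag every small change.
    spread = max(stdev(history), min_spread)
    return current > mean(history) + k * spread


def first_alert(counts: Sequence[int], window: int = 8, k: float = 3.0) -> Optional[int]:
    """Index of the first interval whose count deviates from its trailing-window baseline."""
    for i in range(window, len(counts)):
        if is_abnormal(counts[i - window:i], counts[i], k):
            return i
    return None


if __name__ == "__main__":
    # Hypothetical hourly warning counts for one page component: steady, then climbing.
    hourly_warnings = [4, 6, 5, 7, 5, 6, 4, 5, 6, 9, 20, 45]
    print(first_alert(hourly_warnings))  # 9
```

A single global threshold of, say, 30 warnings per hour would have stayed silent until the final interval; the trailing baseline flags the jump from roughly 5 to 9 two hours earlier.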


An effective OA solution provides the following key benefits:

  • Efficiency - By aggregating event data from multiple servers into a single repository, filtering out low-value events and identifying patterns in event instances, the system can accelerate problem resolution by providing support personnel with information on current operating trends, instant access to specific event details and real-time alerts when high-value events occur.

  • Accuracy - Each server has an event profile unique to its role in the farm (web server, application server, database); likewise, each farm has a unique aggregate event profile that may differ from other farms in the environment (development, staging, production). These profiles are analyzed to determine a baseline of "normal" behavior and capture any functional deviation, resulting in a weighted score that is a more accurate reflection of application health than standardized thresholds.

  • Visibility - Event data is more comprehensive than SNMP-based alerts and performance counters. An easily accessible repository of core application event messages provides a wealth of information that can be used to perform root cause analysis, identify emerging trends, create historical comparisons, and support critical upgrade, enhancement or resource allocation decisions.
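The role-specific weighted score described under Accuracy can be sketched as follows. The severity weights and the scoring formula are hypothetical choices made for illustration (`SEVERITY_WEIGHTS`, `weighted_load` and `health_score` are not taken from any product); what the sketch shows is that identical event counts can produce different health scores once each server is measured against its own baseline rather than a single standardized threshold.

```python
from typing import Mapping

# Hypothetical severity weights; real values would be tuned per environment.
SEVERITY_WEIGHTS: Mapping[str, float] = {
    "Critical": 10.0,
    "Unexpected": 5.0,
    "High": 2.0,
    "Medium": 0.5,
}


def weighted_load(level_counts: Mapping[str, int]) -> float:
    """Sum of event counts multiplied by their severity weight (unknown levels weigh 0)."""
    return sum(SEVERITY_WEIGHTS.get(level, 0.0) * n for level, n in level_counts.items())


def health_score(current: Mapping[str, int], baseline: Mapping[str, int]) -> float:
    """0-100 score: 100 at or below the server's own baseline, shrinking as load exceeds it."""
    base = max(weighted_load(baseline), 1.0)  # guard against an empty baseline
    ratio = weighted_load(current) / base
    return 100.0 if ratio <= 1.0 else round(100.0 / ratio, 1)


if __name__ == "__main__":
    # Hypothetical per-interval baselines learned separately for each server role.
    baselines = {
        "web server": {"High": 10, "Medium": 40},
        "application server": {"Critical": 1, "High": 30},
    }
    observed = {"Critical": 2, "High": 25, "Medium": 40}
    for role, baseline in baselines.items():
        print(role, health_score(observed, baseline))  # 44.4, then 77.8
```

The same observed counts score 44.4 on the web server, whose normal profile is quiet, but 77.8 on the application server, which routinely logs more high-severity events.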

 


OA tools are a key element in a comprehensive event management strategy, but they are only one part of the overall picture. System monitoring tools are also essential, and when properly implemented the two work in concert - system monitoring focuses on "when" and "what", whereas operational analytics strives to answer "why", "where", and "how". Both provide necessary, but altogether different, information; implementing either one in isolation results in an incomplete event management solution and a limited view of overall system health. Organizations that learn how to leverage both toolsets will realize the benefits of increased system stability, streamlined service operations and greater operational efficiency.

  
  
Copyright © 2014. All Rights Reserved.

BinaryWave Inc. | 611 S. Main St. | Suite 400 | Grapevine, TX 76051 | (888) 387-1197


October 17
BinaryWave Sponsoring SharePoint Saturday UK 2011

BinaryWave is sponsoring the 2011 SharePoint Saturday UK event, to be held in Nottingham, England on November 12th. For more information on the event, refer to the following announcement:

http://www.binarywave.com/events/Pages/SharePoint-Saturday-UK-2011.aspx

BinaryWave Founder and President Eric Shupps, recently named a SharePoint Server MVP for the fifth year in a row, will also present a developer-oriented session entitled "Customizing the SharePoint Packaging and Deployment Process in Visual Studio 2010". If you plan on attending, be sure to register early, as space is limited.

 

 





This site is brought to you by BinaryWave in cooperation with Eric Shupps (@eshupps), The SharePoint Cowboy. We hope you enjoy the SharePoint-related content on topics such as performance, monitoring, administration, operations, support, business intelligence and more for SharePoint 2010, SharePoint 2013 and Office 365. We also hope you will visit our product pages to learn more about SmartTrack, Operational Analytics for SharePoint, SharePoint monitoring, and SharePoint administration, while also discovering great offers from our partners. Please visit the blog of Eric Shupps for more information on application development, the SharePoint community, SharePoint performance, and general technology topics. Eric Shupps is the founder and President of BinaryWave, a leading provider of operational support solutions for SharePoint. He has worked with SharePoint Products and Technologies since 2001 as a consultant, administrator, architect, developer and trainer. He is an advisory committee member of the Dallas/Ft. Worth SharePoint Community group and a participating member of user groups throughout the United Kingdom. He has authored numerous articles on SharePoint, speaks at user group meetings and conferences around the world, and publishes a popular SharePoint blog at http://www.binarywave.com/blogs/eshupps.