AppNeta | The Path

Archive for the ‘Fix it Now’ Category

As a pre-sales engineer, I see a lot of interesting network performance management scenarios while working with prospective customers on product trials.  I’ve seen everything from a managed switch with a rogue 10-meg port to a problematic WiFi access point located in the basement of a hospital!

On a recent trial, I was working with a network engineer at a video conferencing services provider.  Contrary to what I expected, they weren’t trying to solve one of their customers’ problems; they were concerned about their own internal Unified Communications platform.  There were three core offices on the east coast and a remote office in the UK.  Once I heard we were dealing with UC over the WAN between remote offices, I thought, “Jackpot! This is PathView Cloud’s forte.”  This was going to be like a Shaquille O’Neal dunk at the Garden.  However, in the words of Lee Corso, “Not so fast, my friend.”

Once we had the PathView Cloud microAppliances deployed in the four offices, we configured the network paths in a full mesh.  The spider web was starting to come together very nicely.  But as I looked at the PathView dashboard, I started to see some violations, represented by red bubbles on the interface.

Looking more closely at the results of the hop-by-hop path analysis, it was evident that there were duplex conflicts everywhere.  Every phone, video camera, and even PC in this office was showing a duplex conflict.  “Great!” I thought.  We found the problem.  Fix the duplex settings.  Case closed, let’s go home.

If it were only that easy…

Once we changed the settings, PathView Cloud continued to detect errors.  Some were cable errors, some were rate-limiting errors.  I started to scratch my head – what could this possibly be?  I ran the results by a couple of other engineers on my team.  Adam Edwards, Director of Systems Engineers, thought the switch itself might be bad.

After I shared the findings with the customer, he was a bit hesitant because his SNMP polling device was saying the switch was running as it was supposed to.  To humour me, he swapped out the switch.  As soon as he did, performance was drastically better.  We still have not been able to definitively determine what caused the issue, though we are pretty sure it’s something tied to the settings that controlled the RTP/RTCP streams.
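For what it’s worth, the kind of check his poller was most likely doing is a periodic read of interface error counters over SNMP.  Here’s a rough sketch of that sort of poll using the pysnmp library (the switch address, community string, and interface index are made up); clean counters like these can easily coexist with a path that misbehaves under real traffic, which is exactly what bit us here.

# Rough sketch of an SNMP error-counter poll using pysnmp (pip install pysnmp).
# The switch address, community string, and interface index are hypothetical.
from pysnmp.hlapi import (
    getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity,
)

SWITCH = "192.0.2.10"   # placeholder switch address
COMMUNITY = "public"    # placeholder read-only community
IF_INDEX = 1            # placeholder interface index

error_indication, error_status, _, var_binds = next(
    getCmd(
        SnmpEngine(),
        CommunityData(COMMUNITY, mpModel=1),            # SNMP v2c
        UdpTransportTarget((SWITCH, 161)),
        ContextData(),
        ObjectType(ObjectIdentity("IF-MIB", "ifInErrors", IF_INDEX)),
        ObjectType(ObjectIdentity("IF-MIB", "ifOutErrors", IF_INDEX)),
    )
)

if error_indication or error_status:
    print("SNMP poll failed:", error_indication or error_status.prettyPrint())
else:
    # Counters that read zero (or barely move) do not rule out a switch
    # that only misbehaves under real traffic.
    for var_bind in var_binds:
        print(var_bind[0].prettyPrint(), "=", var_bind[1].prettyPrint())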

During the analysis phase, it felt like a twelve-round bout with Mike Tyson.  We found duplex conflicts and rate limiting, and eventually uncovered that the whole switch was bummed out.  All of this came right through the PathView Cloud interface within minutes of deploying the microAppliances.  PathView Cloud’s performance was like David Ortiz batting in the bottom of the ninth: there was only ever going to be one winner.

My field engineering team works with customers and partners in many regions and time zones, and we often host working sessions from remote locations.  This provides great opportunities to employ our own solutions and “eat our own dog food” to monitor and troubleshoot common services we use between sites.

I hosted a virtual technology workshop a few days ago from our Portsmouth office.  We were working with a partner to propose a hosted VoIP PBX service management offering.  I started the meeting using Citrix Online’s GoToMeeting and dialed in using my Vonage business line.

Early in the discussion I was interrupted by a brief click followed by…dead air.  Vonage has been very reliable from many locations, so I assumed this was a fluke and quickly rejoined the call.  A couple of minutes passed, and my call dropped again.  At this point I was having flashbacks to my days as a subscriber on AT&T’s wireless network last year and the many call failures I experienced.

I rejoined and offered my apologies, this time using my Sprint-powered Evo. 

Some of the team members were new to the project, so I took this opportunity to demonstrate PathView Cloud’s network performance monitoring and troubleshooting capabilities, using our own experience as an example.

A PathView microAppliance in my office was monitoring my Internet connection all the way to a companion device located in a San Francisco area hosting facility.  This shows the WAN performance from the remote office over the Comcast cable network and out to ‘the cloud’ using a variety of protocols, including ICMP, UDP, and TCP.
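If you want a feel for what a probe looks like at its very simplest, here’s a tiny illustration that measures latency with a plain TCP connect.  The host and port are placeholders, and this is nowhere near what the microAppliance actually does; a real path probe also measures loss, jitter, and capacity across several protocols.

# Tiny latency probe: time a TCP connect to a target.  Host and port are
# placeholders; this only illustrates the simplest possible measurement.
import socket
import time

def tcp_connect_rtt_ms(host: str, port: int = 443) -> float:
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=2.0):
        pass
    return (time.monotonic() - start) * 1000.0

print(f"TCP connect RTT to example.com: {tcp_connect_rtt_ms('example.com'):.1f} ms")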

As you can see from the performance charts, data and voice loss were substantial, peaking at 20% several times throughout the day.  Mean Opinion Score (MOS) suffered as well.  You can see a number of red diamond-shaped event markers indicating performance violations for the path.  
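PathView computes MOS for you, but if you’re curious how loss and delay translate into a score, here’s a simplified E-model (ITU-T G.107) style calculation using commonly published coefficients for a G.711 call.  The numbers are purely illustrative and this is not AppNeta’s implementation.

# Simplified E-model (ITU-T G.107) sketch: estimate MOS from one-way delay
# and packet loss, using commonly published coefficients for a G.711 call.
def estimate_mos(one_way_delay_ms: float, loss_pct: float) -> float:
    d = one_way_delay_ms
    # Delay impairment: small penalty everywhere, steeper past ~177 ms.
    i_delay = 0.024 * d + (0.11 * (d - 177.3) if d > 177.3 else 0.0)
    # Loss impairment for G.711 (Ie = 0, Bpl ~ 10 with packet loss concealment).
    i_loss = (95.0 - 0.0) * loss_pct / (loss_pct + 10.0)
    r = 93.2 - i_delay - i_loss
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)

print(round(estimate_mos(40, 0.0), 2))   # clean 40 ms path: roughly 4.4
print(round(estimate_mos(40, 20.0), 2))  # 20% loss: MOS collapses toward 1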

   

Looking at the same path in a dual-ended view, we see loss in both directions, but peaking on the return leg – San Francisco to Portsmouth.  You can also see a lot of churn in the route taken by the UDP packets used by PathView; check out the yellow diamonds indicating each route change.
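PathView’s dual-ended sequencing is its own animal, but the general idea of separating upstream loss from downstream loss can be sketched with a sequence-numbered UDP echo, where the responder reports which probes it actually saw.  The responder side, address, port, and report format below are all hypothetical; this is only a conceptual illustration.

# Conceptual sketch only: separate forward loss from return loss with a
# sequence-numbered UDP echo.  The responder (not shown) would echo each
# probe and, on request, report which sequence numbers it received.
# Address, port, and the report protocol are all hypothetical.
import json
import socket
import time

TARGET = ("203.0.113.50", 5005)   # placeholder responder
PROBES = 100

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(0.25)

echoed = set()
for seq in range(PROBES):
    sock.sendto(json.dumps({"seq": seq}).encode(), TARGET)
    try:
        data, _ = sock.recvfrom(2048)
        echoed.add(json.loads(data)["seq"])
    except socket.timeout:
        pass                       # probe or its echo was lost
    time.sleep(0.02)

# Ask the responder which probes it actually saw, then split the loss.
sock.sendto(json.dumps({"report": True}).encode(), TARGET)
try:
    data, _ = sock.recvfrom(65535)
    seen = set(json.loads(data)["seen"])
    forward_loss = 100.0 * (PROBES - len(seen)) / PROBES      # never arrived
    return_loss = 100.0 * len(seen - echoed) / PROBES         # echo lost coming back
    print(f"forward loss {forward_loss:.0f}%, return loss {return_loss:.0f}%")
except socket.timeout:
    print("report lost; cannot split directions this round")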

 

It’s no wonder I was having dropped VoIP calls!  The diagnostic showed a clear issue in the Comcast network, starting with the first hop near my office.  Packet loss in the range of 4-12% was observed when the diagnostic fired.  You can also see where the ISP is re-tagging the QoS values we had defined for the path in the IP headers.  This isn’t unheard of, since QoS is rarely honored over broadband networks and the best-effort public Internet.
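The QoS values in question live in the DSCP bits of the IP header.  If you ever want to check whether your own markings survive the trip, you can send test packets marked with DSCP EF from one end and capture at the other; here’s just the sending side, with a placeholder address.

# Mark outbound test packets with DSCP EF (46), the usual marking for voice.
# If a capture at the far end shows a different DSCP, something along the
# path is re-marking the traffic.  Address and port are placeholders.
import socket

DSCP_EF = 46
TOS = DSCP_EF << 2   # DSCP occupies the upper six bits of the TOS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS)
sock.sendto(b"qos-probe", ("203.0.113.50", 5006))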

When troubleshooting an issue with VoIP over the WAN, you don’t necessarily have to own the hosted PBX to gain meaningful insight into the performance between your handset and the service.  Oftentimes the problem is with the WAN connection, and you can easily use PathView Cloud to monitor the performance from your LAN out to a hosted microAppliance.  If you’re an existing PathView Cloud customer or partner, check out our support site for details on targeting one of our hosted microAppliances for this purpose.

With a solid example of WAN performance affecting hosted VoIP quality as context, we went on to complete a productive workshop that day.

Our own dog food: delivered easily via the cloud, and it never tested so good!

-Adam

Part of the support role is to reach out to new customers and provide some general product training – ‘onboarding,’ as we like to call it.  It’s not every day that we uncover network issues while providing this training with a client’s setup, but it happens often enough not to faze me.

The other week I was working with a new client whose network was connected to the internet via a T1.  We were casually adding both dual-ended and single-ended WAN paths to demonstrate the differences when we saw data loss!  Yes, we stumbled across a rogue network issue the client had long suspected, but could never quite prove.  What a perfect demo!  We had a look at the dual-ended path and could clearly see loss on the upstream.  Of course, I was getting asked where this loss was being introduced.  Yep, you guessed it – D.I.A.G.N.O.S.T.I.C.! 
One look at the single-ended path diagnostic told me the gateway was introducing the loss.  We actually had to end the call at this point because the client had another meeting but I took a cheeky look later to see if the issue had been resolved.  Indeed it had; I could clearly see a blip where the microAppliance sequencer lost connectivity while network changes were being made, and zero data loss after connectivity was restored.  PathView – 1,  Data Loss – 0! Zing!

On September 29th, we experienced a rash of poor call quality here at the Apparent Networks Boston headquarters.  Our sales team was unable to communicate and was quickly losing productivity.  With the VoIP server hosted at our remote office in Vancouver, there could have been any number of reasons why this was occurring.  However, PathView Cloud was able to provide our engineers with the full story before the sales team could even react.

PathView Cloud told us that our recent firmware upgrade had wiped the QoS settings we had deployed for our VoIP system, allowing other network traffic to take precedence over our telephony packets in the VPN tunnel connecting our two offices and resulting in poor call quality and dropped calls. Click here to take a live look at our business production servers!

Name and Job Title: Jack Turek, IT Network Manager

Company: Fletcher-Thompson, Inc.
Architecture and Engineering design firm
http://www.fletcherthompson.com
Network Problem: 
As the network manager for a fast-paced architecture and engineering firm, Jack is managing a system with five locations stretching along the east coast from Florida to Connecticut. Unfortunately, Jack didn’t have the capability to continuously monitor or truly understand the performance at each location and the traffic between them. As a result, he couldn’t respond to complaints from his end users because he couldn’t see where the problems were occurring.

How to fix it?
The PathView systems engineer worked with Jack to configure PathView to monitor two of the locations and set Service Quality Definitions to trigger Deep Path Analysis whenever any packet loss or dip in expected bandwidth was seen.  PathView was able to identify some congestion problems at one of the sites in particular. PathView’s troubleshooting and monitoring solutions provided insight into the location of the problem and its exact cause: a backup program was eating up the limited bandwidth and affecting users, leading to complaints and problems across the network.
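Service Quality Definitions are configured in the PathView interface rather than in code, but the underlying idea is just a threshold check on each monitoring sample that kicks off a deeper diagnostic when it trips.  Here’s a rough sketch of that logic with made-up thresholds and a placeholder diagnostic hook.

# Rough sketch of the threshold-trigger idea behind a Service Quality
# Definition.  Thresholds and the run_deep_path_analysis() hook are made up.
LOSS_THRESHOLD_PCT = 1.0    # any sustained packet loss is suspect
MIN_EXPECTED_MBPS = 1.5     # hypothetical floor for a site's expected bandwidth

def run_deep_path_analysis(path_name: str, reason: str) -> None:
    # Placeholder for whatever deeper, hop-by-hop diagnostic gets launched.
    print(f"[{path_name}] triggering deep path analysis: {reason}")

def check_sample(path_name: str, loss_pct: float, available_mbps: float) -> None:
    if loss_pct > LOSS_THRESHOLD_PCT:
        run_deep_path_analysis(path_name, f"packet loss {loss_pct:.1f}%")
    elif available_mbps < MIN_EXPECTED_MBPS:
        run_deep_path_analysis(path_name, f"bandwidth dipped to {available_mbps:.2f} Mbps")

check_sample("HQ-to-Florida", loss_pct=0.0, available_mbps=2.0)      # healthy, no action
check_sample("HQ-to-Florida", loss_pct=3.2, available_mbps=2.0)      # loss fires the diagnostic
check_sample("HQ-to-Connecticut", loss_pct=0.0, available_mbps=0.6)  # bandwidth dip fires it too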

——————————————————————————————————–
Are you having a network problem? Need some help fixing it quickly? Sign up for the PathView Fix it Now program to get a free trial download and some one-on-one assistance.

