Dell M8024-k 10G Switch, QLogic QME8242-k 10GbE, vSphere 5, and a Network Performance Issue

Hi,

We have an issue involving NPAR, VMware, and the QME8242-k. The problems are listed below; we are waiting for an update from Dell, and when we get it I will update this article.

  • We are using QME8242-k NICs connected to a PCM8024-k switch.
  • VMs on the same host can communicate with each other.
  • After you migrate a VM, it cannot communicate with VMs on other hosts.
  • During an FTP or other data transfer, a VM with an E1000 adapter loses its network card after a few seconds or minutes; the same issue does not occur with VMXNET3.
  • When you copy files between two nodes via FTP or CIFS, the transfer rate is extremely low (a few KB/s).


Posted on 05/04/2012, in Blade, Uncategorized, VMware, vSphere5. 15 Comments.

  1. What firmware are you using?

    Also, what driver are you using?

    • It is 4.07.83.

      Update: it looks like it is not a firmware problem after all. Dell got back to us; on Monday we will test and report back to them. If you have the same problem, you can use this too!

      The solution is to enable the rx_mac_check parameter in VMware. This is done by performing the following instructions:

      1) At the ESX console, enter the following command to make the option persistent:
      esxcfg-module -s "rx_mac_check=1" qlcnic
      2) A reboot of the ESX server is required for the setting to take effect; once set, it is retained across subsequent reboots.

      The rx_mac_check module parameter makes the qlcnic driver check the source MAC address of received packets and detect when a MAC address that was previously assigned to a local vNIC has moved and become an external address. In that case, the MAC address is deleted, which ensures correct operation. Checking the source MAC address of every received packet adds processing overhead and a performance penalty.
      Performance testing has revealed the following:
      – For bulk data receive tests (netperf receive test), there is no difference in throughput with or without rx_mac_check, but CPU utilization is about 8-10% higher when rx_mac_check is enabled.
      – For multiple-session latency tests (16 sessions of the netperf TCP_RR test), the number of transactions/sec with rx_mac_check enabled is about 8-10% lower than without it.
      Overall, rx_mac_check adds about 8-10% path length to the receive code path. In ESXi 5.0, the hypervisor enables hardware LRO for certain NetQueues (for instance, when the guest VM runs Linux); in those cases the impact is lower.
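      The steps above can be sketched as a short console session. This is a sketch only: `esxcfg-module -g` is used here just to read back the stored option string so you can confirm it was saved before rebooting.

      ```shell
      # Make the qlcnic option persistent (note: plain ASCII quotes,
      # not the curly quotes that blog software often substitutes)
      esxcfg-module -s "rx_mac_check=1" qlcnic

      # Read back the stored options to confirm the setting was saved
      esxcfg-module -g qlcnic

      # A reboot is required before the driver picks up the new option
      reboot
      ```

      Be careful to retype the quotes if you copy the command from a web page; curly quotes will make the option string fail to parse.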

  2. This is not working; I sent feedback to Dell again and am waiting for an answer…

  3. We experienced the same two issues. The rx_mac_check setting fixes the vMotion loss-of-communication issue; the performance issue was resolved when we updated the qlcnic driver from 5.0.736 to 5.0.741.
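    Since several fixes in this thread depend on the exact driver and firmware versions, it may help to check what a host is actually running before updating. The following is a sketch; the vmnic number is an assumption for your environment:

    ```shell
    # Show the driver name, driver version, and adapter firmware
    # for a given uplink (replace vmnic2 with your 10G uplink)
    ethtool -i vmnic2

    # List the installed qlcnic driver package and its version
    esxcli software vib list | grep -i qlcnic
    ```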

  4. Hi!
    I can't speak English well, so I'm using Google Translate.

    We hit the same two issues.
    The rx_mac_check setting fixes the vMotion loss-of-communication issue.
    But fixing the performance issue took:
    1. driver 5.0.736+
    2. esxcfg-module -s "rx_mac_check=1 lro_enable=0" qlcnic
    The performance issue was resolved once we set lro_enable=0.

    Q&A)
    The same migration (XenMotion) communication loss on Citrix XenServer
    was not resolved.

    -----or-----
    VMware driver 5.0.736+ module parameters

    This section describes the driver module parameters:

    * lro_enable : Enable/disable HW LRO. Default:1, Enable=1, Disable=0
    * multi_ctx : Enable/disable Rx NetQueues support. Default:1, Enable=1, Disable=0
    * user_rx_queues : Number of RX queues excluding the default Rx queue, per PCI function. Default:1 to 7, Min:0, Max:7
    * enable_tso : Enable/disable TSO support. Default:1, Enable=1, Disable=0
    * hw_vlan : Enable/disable HW VLAN support. Default:1, Enable=1, Disable=0
    * auto_fw_reset : Enable/disable auto firmware recovery support. Default:1, Enable=1, Disable=0
    * tx_desc : Transmit descriptors in host (should be a power of 2). Default:1024, Min:64, Max:2048
    * rdesc_1g : Receive descriptors for 1G (should be a power of 2). Default:512, Min:64, Max:4096
    * rdesc_10g : Receive descriptors for 10G (should be a power of 2). Default:1024, Min:64, Max:8192
    * jumbo_desc_1g : Jumbo receive descriptors for 1G (should be a power of 2). Default:64, Min:32, Max:512
    * jumbo_desc_10g : Jumbo receive descriptors for 10G (should be a power of 2). Default:128, Min:32, Max:1024
    * md_capture_mask : Capture mask for collection of firmware dump. Default:0x1F, Min:0x3, Max:0xFF (valid values: 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F, 0xFF)
    * md_enable : Enable/disable firmware minidump support. Default:1, Enable=1, Disable=0
    * rx_mac_check : Enable/disable check of MAC address / MAC learning in the Rx path. Default:0, Enable=1, Disable=0

    Ex)
    esxcfg-module -s "rx_mac_check=1 lro_enable=0 multi_ctx=0 hw_vlan=0" qlcnic
    -----or-----
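    The combined workaround described above can be applied as one persistent option string. This is a sketch; note that `esxcfg-module -s` replaces any previously stored options for the module, so list every option you want in a single string:

    ```shell
    # Disable hardware LRO and enable the Rx MAC check in one
    # persistent option string (replaces any options set earlier)
    esxcfg-module -s "rx_mac_check=1 lro_enable=0" qlcnic

    # Confirm the stored option string, then reboot for it to take effect
    esxcfg-module -g qlcnic
    reboot
    ```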

  5. One more issue:
    Firmware (boot code) v4.09.46 (1.09.22) resolved an EMC Storage latency issue.
    Link:
    http://driverdownloads.qlogic.com/QLogicDriverDownloads_UI/ResourceByOS.aspx?productid=1165&oemid=65&oemcatid=58604

    Issue resolved.
    Good luck!
    (@^__^@)

  6. So is there a conclusive resolution to this? We have observed similar FTP transfer slowness and failures, and VMs failing to communicate with each other across ESX hosts. Virtually the same setup: Dell M620 blades running ESXi 4.1 with QMD8252K adapters utilising NPAR.

    One comment says you fixed it, the next says it's not fixed, and then someone else posts something different…?

    Cheers,
    Dave

    • Hello Dave,

      I will check on it next week and post an update on this site (and maybe email you), because it's time to put my 10G links to use and I'm already late doing so.

      See you next week
      Vahric

  7. Any progress? So far we have observed the following issues:

    1. Intra-VLAN communication between VMs on different hosts fails intermittently, with no real consistency to the problem.

    2. We also use a managed file transfer application which fails to complete transfers, or runs very slowly, when running via its proxy. Our web admin has reported slight slowness on one website, and we have observed some slowness on Citrix (NetScalers).

    We will probably try the settings you detailed above, but first we want to see if we can reproduce the problems on ESX hosts in another cluster before logging this with Dell/VMware. We've checked, and we have the latest firmware and drivers for the QLogic cards and the latest firmware on the M8024-k switches (which are stacked).

    Cheers,
    Dave

  8. BTW, in driver 4.0.739 HW LRO actually defaults to disabled.

  9. We are running ESXi 5.0u1 on Dell R620 servers with two QME8242 adapters. The 4.0.739 driver plus esxcfg-module -s "rx_mac_check=1" qlcnic addressed the performance and connectivity issues. However, we continued to experience intermittent network links going down, sometimes taking FCoE down with them and sometimes leaving FCoE functioning. The only way to reset the links was to reboot the host; the physical network ports still appeared active to the Cisco Nexus switches.

    The issue was resolved after working with Dell support. Support was concerned the adapters were running too hot, so they had us change the iDRAC setting Thermal > Fan Speed Offset from "Low Fan Speed Offset" to "High Fan Speed Offset".

    http://www.dell.com/downloads/global/products/pedge/advanced_thermal_control_whitepaper.pdf

  10. Today we ran some more tests, and yes, disabling LRO really does work. I'm updating Dell now and may contact VMware later, because features like LRO are supposed to offload work; having to disable them does not help the virtual world.

  11. Firmware "01.10.47, A00" solves the problem.
    I also installed the latest qlcnic drivers from the VMware site, but I don't think they made a difference.
    We also tested with LRO disabled and LRO enabled; on the new firmware everything works well either way.
    Hope this helps everyone.
    Thanks Gökhan for your help!
