Ever wondered how many “raw” devices you can create on a Linux server?

I have to admit this isn’t a question that crops up all that often, but given what I have been through over recent months, I thought I’d share it with the world.

Let’s start from the beginning…

What are raw devices?

Raw devices are used when there is a need to present underlying storage devices as character devices, which will then typically be consumed by an RDBMS.
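For the uninitiated, the binding itself is done with the raw command on RHEL/CentOS. A minimal sketch, run as root, where /dev/sdb is purely an illustrative block device:

```shell
raw /dev/raw/raw1 /dev/sdb   # bind block device /dev/sdb to character device /dev/raw/raw1
raw -qa                      # query all current raw device bindings
```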

Some examples of database technologies where “raws” might be used would be;

  • SAP IQ
  • SAP Replication Server
  • MySQL

The list is not limited to the above, but they are the ones I know about.

Why use raw devices?

Well, there is a belief (and probably some, maybe lots of, evidence) that file systems introduce extra overhead when processing I/O.  Though I would question whether, in this day and age of enterprise SSD and NVMe based storage solutions, it is anywhere near as relevant as it used to be.

So the idea is simple.  Present a raw character device (or several of them) to the database technology being used and allow it to handle the file-system-type tasks itself, thereby removing the (perceived?) overhead of a file system.

For me, I can only think this would be really relevant in particularly high-performance, low-latency environments, and even then I would be keen to see some baseline figures to confirm whether the extra admin overhead raw devices bring is worth it.

Hang on! You enticed me in with how many raw devices can you create?  So tell me!!

OK, the greatest number of raw devices you can create on a Red Hat Enterprise Linux/CentOS 7 system is… 8192.

That’s 0 all the way up to 8191.

Would I really ever need that many raw devices?

Well, maybe.  I doubt you would ever really create 8192 raw devices, as this would require you to provision effectively the same number of storage LUNs.

So how did I stumble across this (potentially unimportant) fact?

Well, whilst working on the requirements from my local DBA team,  I was also attempting to introduce a level of standardisation, so it was easy to see what the raw devices were used for.

Raw devices are numbered, and it doesn’t look like you can use alphabetical characters in the name.  So a numbering standard had to be created.  For example;

raw devices numbered 1100-1150 might be the raws used to store the actual data, 1300-1320 might contain the logs and 1500-1510 might be for temp tables.
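As a sketch of how such a numbering standard can be fed into udev, the rule lines can be generated rather than typed by hand.  Everything here is illustrative: the multipath device names (mpatha and friends) and the DM_NAME match are assumptions about your storage layout, and the output would go into something like /etc/udev/rules.d/60-raw.rules after review:

```shell
# Generate rule lines binding hypothetical multipath devices to the
# "data" raw range (1100 upwards). Device names are illustrative only.
RAWNUM=1100
RULES=""
for DEV in mpatha mpathb mpathc; do
    LINE=$(printf 'ACTION=="add", ENV{DM_NAME}=="%s", RUN+="/usr/bin/raw /dev/raw/raw%d %%N"' "$DEV" "$RAWNUM")
    RULES="$RULES$LINE
"
    RAWNUM=$((RAWNUM + 1))
done
printf '%s' "$RULES"   # review before writing to the rules file
```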

So if you also include a bit of future proofing, and have a large enough number of storage LUNs provisioned for use directly by the RDBMS, then you could quickly find yourself constrained if you don’t plan ahead.

Anyway, the above was found out because I started to get strange problems when trying to create the udev rules which would create the raw devices, so I had a fun hour of trying to work out the magic number.

Which, for raw devices (if not much else), is 8191 – the maximum raw device number you can use.

iSCSI and Jumbo Frames

I’ve recently been working on a project to deploy a couple of Pure Storage FlashArray //M10‘s, and rather than using Fibre Channel we opted for 10Gb Ethernet (admittedly for reasons of cost), using iSCSI as the transport mechanism.

Whenever you read up on iSCSI (and NFS for that matter) there inevitably ends up being a discussion around the MTU size.  My thinking here is that if your network has sufficient bandwidth to handle jumbo frames and large MTU sizes, then it should be done.

Now I’m not going to ramble on about enabling Jumbo Frames exactly, but I am going to focus on the MTU size.

What is MTU?

MTU stands for Maximum Transmission Unit.  It defines the maximum size of a network frame that you can send in a single data transmission across the network.  The default MTU size is 1500.  Whether that be Red Hat Enterprise Linux, Fedora, Slackware, Ubuntu, Microsoft Windows (pick a version), Cisco IOS or Juniper’s JunOS, it has in my experience always been 1500 (though that’s not to say that some specialist providers may not change this default value for black-box solutions).

So what is a Jumbo Frame?

The internet is pretty much unified on the idea that any packet or frame above the 1500 byte default can be considered a jumbo frame.  Typically you would want to enable this for specific needs such as NFS and iSCSI, where the bandwidth is at least 1Gbps, or better still 10Gbps.

MTU sizing

A lot of what I had read in the early days about this topic suggests that you should set the MTU to 9000 bytes, so what should you be mindful of when doing so?

Well, let’s take an example: you have a requirement to enable jumbo frames and you have set an MTU size of 9000 across your entire environment;

  • virtual machine interfaces
  • physical network interfaces
  • fabric interconnects
  • and core switches

So you enable an MTU of 9000 everywhere, and you then test your shiny new jumbo frame enabled network by way of a large ping;


$ ping -s 9000 -M do <destination>


> ping -l 9000 -f -t <destination>

Both of the above perform the same job.  They will attempt to send an ICMP ping;

  • To our chosen destination –
  • With a packet size of 9000 bytes (option -l 9000 on Windows or -s 9000 on Linux); remember the default MTU is 1500, so this is definitely a jumbo packet
  • Where the request is not fragmented, thus ensuring that a packet of such a size can actually reach the intended destination without being reduced

The key to the above examples is the “-f” (Windows) and “-M do” (Linux) options.  These enforce the requirement that the packet be sent from your server/workstation to its intended destination without the size of the packet being messed with, aka fragmented (as that would negate the whole point of using jumbo frames).

If you do not receive a normal ping response back reporting the jumbo size you sent, then something is not configured correctly.

The error might look like the following;

ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500

The above error is highlighting the fact that we are attempting to send a packet which is bigger than the local NIC is configured to handle.  It is telling us the MTU is set at 1500 bytes.  In this instance we would need to reconfigure our network card to handle the jumbo sized packets.
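A minimal sketch of that reconfiguration on a RHEL-style system, assuming a hypothetical interface eth0 (run as root):

```shell
# Runtime change - takes effect immediately but is lost on reboot:
ip link set dev eth0 mtu 9000

# Persistent change - add MTU=9000 to the interface's ifcfg file:
echo 'MTU=9000' >> /etc/sysconfig/network-scripts/ifcfg-eth0
```

You can verify the change with `ip link show eth0`, which reports the interface’s current MTU.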

Now let’s take a look at what happens with the ICMP ping request and its size.  As a test I have pinged the localhost interface on my machine and I get the following;

[toby@testbox ~]$ ping -s 9000 -M do localhost
PING localhost(localhost (::1)) 9000 data bytes
9008 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.142 ms
9008 bytes from localhost (::1): icmp_seq=2 ttl=64 time=0.148 ms
9008 bytes from localhost (::1): icmp_seq=3 ttl=64 time=0.145 ms
--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2085ms
rtt min/avg/max/mdev = 0.142/0.145/0.148/0.002 ms

Firstly, notice the size of each request.  The initial request may have been 9000 bytes, however that doesn’t take into account the header that needs to be added to the packet so that it can be correctly sent over your network or the Internet.  Secondly, notice that the packet was received without any fragmentation (note I used the “-M do” option to ensure fragmentation couldn’t take place).  In this instance the loopback interface is configured with a massive MTU of 65536 bytes and so it all worked swimmingly.

Note that the final packet size is actually 9008 bytes: the packet grew by 8 bytes due to the addition of the ICMP header mentioned above.

My example above stated that the MTU had been set to 9000 on ALL devices.  In that case the packets will never get to their intended destination without being fragmented, as 9008 bytes is bigger than 9000 bytes (stating the obvious, I know).
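A handy way to pick a ping size that will pass unfragmented is to subtract the headers from the path MTU.  The arithmetic below assumes plain IPv4 with no IP options (note the loopback test above actually went over IPv6, where the IP header is 40 bytes):

```shell
MTU=9000        # end-to-end IP MTU
IP_HDR=20       # IPv4 header with no options
ICMP_HDR=8      # ICMP echo header
PAYLOAD=$((MTU - IP_HDR - ICMP_HDR))
echo "$PAYLOAD"   # 8972
```

So on an end-to-end MTU-9000 path, ping -M do -s 8972 should get through, while -s 8973 should fail with a “Message too long” error.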

The resolution

The intermediary devices (routers, bridges, switches and firewalls) will need an MTU sized sufficiently above 9000 to accept the desired packet size.  A standard Ethernet frame (according to Cisco) requires an additional 18 bytes on top of the 9000-byte payload, and it would be wise to specify a bit higher still.  An MTU size of 9216 bytes would therefore be better, as it allows enough headroom for everything to pass through nicely.
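The same back-of-envelope arithmetic shows why 9216 leaves comfortable headroom; the 18-byte figure is Cisco’s, and an optional 802.1Q VLAN tag adds four more:

```shell
MTU=9000          # IP payload carried in each Ethernet frame
ETH_OVERHEAD=18   # 14-byte Ethernet header + 4-byte FCS
VLAN_TAG=4        # 802.1Q tag, if VLANs are in use
echo $((MTU + ETH_OVERHEAD + VLAN_TAG))   # 9022 - comfortably under 9216
```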

Focusing on the available options in a Windows world

And here is the real reason for this post.  Microsoft, in all their wisdom, provide you with a drop-down box to select a predefined MTU size for your NICs.  With Windows 2012 R2 (possibly slightly earlier versions too), the nearest size you can set via the network card configuration GUI is 9014.  This could result in packets being fragmented, or in the case of iSCSI potentially very poor performance: an MTU of 9014 isn’t going to work if the rest of the network or the destination device is set at 9000.

The lesson here is to make sure that both source and destination machines have an MTU of equal size, and that anything in between can support an MTU size higher than 9000.  And given that Microsoft have hardcoded the GUI with a specific set of options, you will probably want to configure your environment to handle this slightly higher size.

Note:  1Gbps Ethernet typically only supports a maximum MTU size of 9000, so although jumbo frames can be enabled you would need to reduce the MTU size slightly on the source and destination servers, with everything in between set at 9000.

Featured image credit: TaylorHerring.  As bike frames go, the Penny Farthing could well be considered to have a jumbo frame.

Enabling Broadcom FCoE

Following on from my previous post on the topic of the BL460c Gen8 and the missing HBA, I thought I’d provide a quick script to automate the required installation of RPMs and associated config changes, so here you go;


echo "Installing pre-reqs for FCoE on Broadcom NICs…"
yum install -y lldpad lldpad-libs fcoe-utils

echo "Configuring FCoE…"
cd /etc/fcoe

# Interfaces using the FCoE personality - adjust to suit your hardware
DEVICES="eth2 eth3"

for i in $DEVICES; do
    cp cfg-ethx cfg-$i
    sed -i 's/DCB_REQUIRED="yes"/DCB_REQUIRED="no"/g' cfg-$i
done

echo "Starting lldpad service…"
/sbin/chkconfig lldpad on
/sbin/service lldpad start

for i in $DEVICES; do
    lldptool set-lldp -i $i adminStatus=disabled
done

# Check that one adminStatus entry now exists per interface
DEVCOUNT=$(echo $DEVICES | wc -w)
ADMINSETCOUNT=$(grep -c adminStatus /var/lib/lldpad/lldpad.conf)

if [ "$ADMINSETCOUNT" -eq "$DEVCOUNT" ]; then
    echo "adminStatus appears to have been set correctly"
else
    echo "adminStatus has not been set correctly, you should investigate this further!"
    echo "Useful URL – http://h20564.www2.hp.com/hpsc/doc/public/display?docId=mmr_kc-0115755-3"
fi

echo "Restarting services after changes made"
/sbin/service lldpad restart
/sbin/service fcoe restart

echo "Setting FCoE service to start on boot…"
/sbin/chkconfig fcoe on

echo "Enabling Ethernet interfaces to start on boot…"
for i in $DEVICES; do
    sed -i 's/ONBOOT=no/ONBOOT=yes/g' /etc/sysconfig/network-scripts/ifcfg-$i
done


The above does make the assumption that you are using eth2 and eth3 for FCoE.

HP BL460c Gen8 and the missing HBA

Let me begin my tale of woe by setting the scene appropriately;

  • 4x BL460c Gen8
    • 2x Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz (10 cores)
    • 128GB RAM
    • HP FlexFabric 10Gb 2-Port 534FLB Adapter
    • iLO (Integrated Lights Out)
  • 4x c7000 enclosure (some platinum, some not)
    • OA (Onboard Administrator) firmware version: 4.30
    • VC (Virtual Connect) firmware version: 4.30
      • A server profile has been created for each server (with the Ethernet connectivity and FCoE)
      • The server profile has also been assigned to the required bay (just putting this in so there is no doubt: it is set up as it should be)
  • RHEL (Red Hat Enterprise Linux) 6.5 and 6.6 were tested
  • HP SPP (Service Pack for ProLiant) 2014.09.0(B) installed across the board, and the RHEL 6.6 Supplemental for the SPP applied where required
    UPDATE – HP have released SPP 2015.04.0.

The tale of woe!

I had a new requirement to set-up 4 new servers which would run RHEL 6.6 and would very much rely upon SAN based storage.

Installing RHEL was the easy part; getting the server to see it had access to FCoE (Fibre Channel over Ethernet) was more tricky, but only because there is a marked difference between the implementation of FCoE when using a Broadcom CNA (Converged Network Adapter) vs Emulex or QLogic.

Things I checked

  • lspci – didn’t show anything Fibre related.
  • dmesg – equally didn’t show anything HBA or SAN related.
  • Looking in the /sys/class directory structure showed no signs of the fc_host I am accustomed to.
  • A quick reboot and a trawl through the BIOS (in HP speak, the RBSU) also showed no signs of anything SAN related.
  • Googled the issue to death.
  • Confirmed that the CNA definitely was SAN compatible (you never know).
  • Logged onto the SAN switches to confirm there was connectivity between the Virtual Connect modules and the switches (there was, but it showed that no blades had attempted to log into the fabric).
  • In the end I logged a ticket with HP.

The solution

So… After much head banging, and three hours describing the issue to HP first line technical bods, plus submitting (what felt like in triplicate) numerous logs from OA, VC and the server, I was finally pointed to the following URLs;

Now the above guides are all well and good assuming you haven’t followed the advice from HP tech support to upgrade to the latest firmware/drivers.  If you have done exactly as they asked (not that you usually have much choice in the matter), then the following guide proved more useful;

One word of warning.  In the first article (Broadcom-Based CNAs…) above, when you get to step 6, it states you should run the following command;

[bash]# lldptool set-lldp-i ethY adminStatus=disabled[/bash]

There is a typo and the line should be entered at the command line (as root) as follows;

[bash]# lldptool set-lldp -i ethY adminStatus=disabled[/bash]

With the above you will need to replace ethY with the actual interface name.

Also:  it’s not made crystal clear, but you will need to make sure that you update the ifcfg-ethY files for the NICs that are using the FCoE personality so that they are brought online at boot.

Going forward

For me, I have created a script to automate all of this, so should we purchase servers with this card installed again, the set-up remains consistent and documented.  That said, it is still very much a manual step to enable the FCoE personality on the card.