Why is MPLS Linux slower at forwarding packets than Linux's IPv4 stack?

There are some misconceptions out there regarding the speed of MPLS vs IPv4
packet processing.

Back in the mid 90's the state-of-the-art in edge and core routing technology
was processor based packet forwarding.  At the same time the requirements
for per packet forwarding decisions were getting more complicated.  Edge
and core routers were being asked to consider source and
destination addresses, incoming and outgoing interfaces, as well as TCP/UDP
port numbers.  This forced router vendors to switch to some sort of "flow" or
"hash" based look up to determine the forwarding treatment (next hop and/or
queuing).  As any CS major knows, both flow and hash based look up schemes can
suffer from a high number of "key collisions" when 1000s of packet flows per
second are being considered.  This in essence changes the look up depth from
32 bits to something greater than 32 bits, depending on the technique and the
number of "key collisions".  So per packet decision making was becoming
a bottleneck in the core of the network.  Along came various "IP Switching"
techniques and "tag switching" all of which contributed to MPLS.  One of the
benefits of MPLS at that time was that the complex decision making for
forwarding treatment was done once, up front, at "LSP setup time" and per
packet processing would be a consistent 20 bit look up.  If the
state-of-the-art in packet forwarding had stood still, then MPLS would have
been the savior
of core routing, but in the time it took for MPLS to become a standard the
world of packet forwarding was revolutionized by ASICs and FPGAs.  These
hardware based packet look up engines could do the complex look ups required
by core and edge routers faster than the pipes could transport the packets.
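
To make the "key collision" point concrete, here is a rough sketch of a flow
look up next to a label look up.  Everything in it is an assumption made up
for illustration: the FlowTable class, the 64 bucket size, the next hop names
and the flat array used for the ILM are not taken from any real router or
from MPLS Linux.

```python
# Illustration only: FlowTable, its bucket count and the flat ILM array are
# assumptions made up for this sketch, not code from any router or MPLS Linux.
from collections import namedtuple

Flow = namedtuple("Flow", "src dst sport dport proto")


class FlowTable:
    """Chained hash table keyed on the full 5-tuple."""

    def __init__(self, buckets):
        self.buckets = [[] for _ in range(buckets)]

    def _index(self, flow):
        return hash(flow) % len(self.buckets)

    def insert(self, flow, treatment):
        self.buckets[self._index(flow)].append((flow, treatment))

    def lookup(self, flow):
        """Return (treatment, number of key comparisons needed)."""
        compares = 0
        for key, treatment in self.buckets[self._index(flow)]:
            compares += 1
            if key == flow:
                return treatment, compares
        return None, compares


# 5000 distinct flows squeezed into 64 buckets: collisions are unavoidable.
table = FlowTable(buckets=64)
flows = [Flow("10.0.0.1", "10.1.0.1", 1024 + i, 80, 6) for i in range(5000)]
for i, flow in enumerate(flows):
    table.insert(flow, "nexthop-%d" % (i % 4))

worst_case = max(table.lookup(flow)[1] for flow in flows)

# MPLS: the classification was done at LSP setup time, so the per packet
# work is one index into a table addressed by the 20 bit label.
ilm = [None] * (1 << 20)
for label in range(16, 16 + 5000):
    ilm[label] = "nexthop-%d" % (label % 4)

if __name__ == "__main__":
    print("worst case flow look up: %d key comparisons" % worst_case)
    print("label 16 ->", ilm[16])
```

With 5000 flows in 64 buckets at least one bucket must hold 79 or more
entries, so the worst case flow look up needs that many key comparisons,
while the label look up is always a single index.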

So when people said "MPLS should be faster than IPv4 at packet processing"
they were not referring to standard destination based IPv4 forwarding; they
were talking about complex forwarding decision making.  Theoretically standard
IPv4 destination only processing has a worst case of 32 bits of look up and
MPLS has a constant 20 bits of look up, not enough of a difference to show
up in throughput tests.  So if your comparison of MPLS Linux forwarding
versus Linux IPv4 forwarding is only based on IPv4 destination look ups, you
should not expect to see a performance benefit (in fact MPLS Linux forces
all ILM keys into a 32 bit number, so it too is doing a 32 bit look up :-).
Combine that with the fact that MPLS Linux has not undergone any sort of
optimization and has an enormous amount of debug/tracing code, while the
Linux IPv4 stack has undergone years of optimization by some of the
brightest minds in the world, and I'm surprised that MPLS Linux has
performed as well as it has in the test results I've seen.
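
The "32 bits worst case versus 20 bits constant" comparison can be sketched
as below.  This is a toy, not how either stack actually does it: BitTrie and
label_of are hypothetical names invented here, and the one-bit-per-level trie
is only the simplest way to show a longest prefix match.  The one thing taken
from the standard is the shim header layout (20 bit label, 3 bit EXP, 1 bit
bottom of stack, 8 bit TTL).

```python
# Illustration only: BitTrie and label_of are hypothetical names for this
# sketch; neither the Linux IPv4 stack nor MPLS Linux is implemented this way.
import ipaddress


class BitTrie:
    """One-bit-at-a-time trie for IPv4 longest prefix match."""

    def __init__(self):
        self.root = {}  # keys 0 and 1 are children, "hop" is a next hop

    def add(self, prefix, hop):
        net = ipaddress.ip_network(prefix)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node["hop"] = hop

    def lookup(self, address):
        """Return (next hop, number of trie levels walked)."""
        addr = int(ipaddress.ip_address(address))
        node = self.root
        best = node.get("hop")
        levels = 0
        for i in range(32):
            bit = (addr >> (31 - i)) & 1
            if bit not in node:
                break
            node = node[bit]
            levels += 1
            best = node.get("hop", best)  # remember the longest match so far
        return best, levels


def label_of(shim):
    """Extract the 20 bit label from a 32 bit MPLS shim header word."""
    return (shim >> 12) & 0xFFFFF


trie = BitTrie()
trie.add("10.0.0.0/8", "A")
trie.add("10.1.2.3/32", "B")

# Shim word carrying label 100, bottom of stack set, TTL 64.
shim = (100 << 12) | (1 << 8) | 64

if __name__ == "__main__":
    print(trie.lookup("10.1.2.3"))    # host route: walks all 32 levels
    print(trie.lookup("10.200.0.1"))  # falls back to the /8 after 8 levels
    print(label_of(shim))
```

The host route costs 32 steps and the label costs one mask and shift, but
either way it is a small, bounded amount of work per packet, which is why
the difference does not show up in throughput tests.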