[[Category:Research]] {{DISPLAYTITLE:nopl}}
During research for my [[AMD Geode]] projects, I found an amazing saga based around a CPU instruction. Nobody else has written this up from what I can see, so here's my take.


== Background ==
In 1998 Christian Ludloff documented in his [https://web.archive.org/web/19981205142152/http://sandpile.org:80/80x86/opcodes2.shtml updated map of 2 byte x86 opcodes] that the 0F 18 through 0F 1F range of opcodes were hinting NOPs, the first being the 0F 18 opcode, which maps to the PREFETCHh instructions. I believe this information was documented first in the [https://www.cs.cmu.edu/afs/cs/academic/class/15213-s01/docs/intel-opt.pdf Intel Architecture Optimization Reference Manual].
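
To make the opcode range concrete, here is a minimal Python sketch that labels a two-byte opcode using only the 0F 18 through 0F 1F range quoted above. The function and constant names are my own, and it ignores prefixes and everything after the second opcode byte, so treat it as an illustration rather than a decoder.

```python
# Sketch only: names are hypothetical, the range is the one quoted above.
HINT_NOP_FIRST = 0x18  # 0F 18, which maps to PREFETCHh
HINT_NOP_LAST = 0x1F   # 0F 1F, the last opcode in the hinting NOP range


def classify_two_byte_opcode(code: bytes) -> str:
    """Label the opcode at the start of `code` if it is a 0F xx opcode."""
    if len(code) < 2 or code[0] != 0x0F:
        return "not a two-byte opcode"
    second = code[1]
    if second == HINT_NOP_FIRST:
        return "PREFETCHh"
    if HINT_NOP_FIRST < second <= HINT_NOP_LAST:
        return "hinting NOP"
    return "other"
```

For example, `classify_two_byte_opcode(bytes([0x0F, 0x1F]))` lands in the hinting NOP range, while a plain one-byte `90` NOP is not a two-byte opcode at all.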


Later in 2003 Christian Ludloff clarified in an email thread [https://web.archive.org/web/20041106070621/http://www.sandpile.org/post/msgs/20004129.htm Undocumented opcodes (HINT_NOP)] that these hinting NOPs were declared by Intel in their 1995 patent [https://patents.google.com/patent/US5701442A/en US5701442]. The idea behind this patent from my reading is that you can encode a program written in another ISA as a series of opcodes that are run as NOPs on older machines and the new ISA on a newer machine.


I'm not sure why, but third party x86 CPUs aside from AMD didn't implement these NOPs. Perhaps Intel kept this patent close to their heart? Or maybe it's just not worth spending silicon and research on NOPs that nobody used?


== Linux fallout ==
In 2006 [https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=1596541188b1a4080ab7bce6578c09626193dfd0 PATCH: Add "nop memory" for i386/x86-64] was committed to the GNU Assembler. It added support for the 'nopl' and 'nopw' assembly instructions that map to multi-byte NOP code.


In 2007 [http://lkml.iu.edu/hypermail/linux/kernel/0709.2/2726.html x86: multi-byte single instruction NOPs] was committed to Linux. This added a set of 'P6 NOPs' built from the multi-byte NOP opcodes and used them on i686 or newer x86 CPUs. Which type of NOPs to use was decided at runtime, so running an i686 kernel on an i586 machine would not cause any issues. Strangely, on 64-bit systems the NOPs were only used if your CPU vendor was Intel.
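
The runtime choice can be sketched roughly like this in Python. The byte sequences are the multi-byte NOP encodings listed in Intel's instruction reference for the NOP instruction. The names `P6_NOPS` and `nop_padding`, and the plain `90` fallback, are my own simplifications and are not how the kernel's tables are actually laid out.

```python
# Sketch only: the table follows Intel's recommended multi-byte NOP forms.
# The fallback of repeated 0x90 bytes is an assumption made for brevity.
P6_NOPS = {
    1: bytes([0x90]),
    2: bytes([0x66, 0x90]),
    3: bytes([0x0F, 0x1F, 0x00]),
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),
    8: bytes([0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
}


def nop_padding(length: int, cpu_has_nopl: bool) -> bytes:
    """Return `length` bytes of padding that execute as NOPs."""
    if length < 0:
        raise ValueError("length must not be negative")
    if not cpu_has_nopl:
        # Safe everywhere: one single-byte NOP per byte of padding.
        return b"\x90" * length
    out = bytearray()
    longest = max(P6_NOPS)
    while length > 0:
        chunk = min(length, longest)  # use the longest NOP that still fits
        out += P6_NOPS[chunk]
        length -= chunk
    return bytes(out)
```

The point of the long forms is that five bytes of padding become one instruction instead of five. That is also why a CPU that doesn't know 0F 1F falls over on the first padded gap it executes.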


== glibc fallout ==
In 2010 [https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=01f1f5ee Pass -mtune=i686 to assembler when compiling for i686] was committed to glibc. This told GNU Assembler to optimize for i686 CPUs (Pentium Pro), and as I mentioned in the previous section, this used multi-byte NOPs.  


A month later the Arch Linux bug [https://bugs.archlinux.org/task/19733 Update to glibc 2.12-2 on VIA C3 Nehemia makes system unusable] and Fedora bug [https://bugzilla.redhat.com/show_bug.cgi?id=579838 glibc not compatible with AMD Geode LX] were reported. glibc being a core component of most GNU systems meant updating completely crashed people's machines. Oops.
 
Unlike the Linux and GNU Assembler discussions, the Arch Linux and Fedora discussions were from the perspective of people building and packaging software. Finding out what was broken was a little tricky.
 
* Was it GNU Assembler for adding nopls to code?
* Was it glibc for tuning for i686 CPUs?
* Was it the Linux distros for running i686 binaries on non-i686 CPUs?
 
Things were a little tricky for Fedora here as they explicitly supported the AMD Geode LX800 as it was used in millions of laptops for the One Laptop per Child project. While the LX800 isn't i686, it ran i686 binaries fine. They would have to support not just i686 but i586 too for their entire distribution just to support this laptop.
 
Around this time the GNU Assembler committed [https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=2210942396dab942a86cb6777c705554b84ebb0e Don't generate multi-byte NOPs for i686.] This patch restricted generating multi-byte NOPs to Intel and AMD CPUs. Strangely enough the i586 AMD K6-2 CPU was marked as supporting multi-byte NOPs, which was fixed in the 2013 commit [https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=d56da83e58816c45a4bc70503776a6e62a66bf89 Remove CpuNop from CPU_K6_2_FLAGS].
 
After a few months of discussion and without a new GNU Assembler release, Arch and Fedora decided to just revert glibc's change. This at least fixed things and made i686 builds of their distributions run on CPUs they supported.
 
== Kernel emulation ==
In 2010 a kernel developer proposed the patch [https://lists.archive.carbon60.com/linux/kernel/1268554 AMD Geode NOPL emulation for kernel 2.6.36-rc2]. This patch would trap the unknown instruction exception that non-i686 CPUs generate on a multi-byte NOP, emulate the instruction, then return to the program. This was a bit controversial.
 
Arguments for the patch:
 
* Distributions aren't going to care long term
* Proprietary software isn't easily fixable
* A similar patch was used to emulate CMOV instructions on i586 CPUs
 
Arguments against the patch:
 
* NOP isn't supposed to spend thousands of CPU cycles jumping to the kernel and back
* With the GNU Assembler fix distributions can avoid adding multi-byte NOPs
* The CMOV emulation patch wasn't accepted into Linux
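
To show what "emulating" a NOP involves, here is a rough Python sketch of the one thing such a trap handler has to work out: how many bytes to skip. It assumes 32-bit addressing, accepts only the documented 0F 1F /0 form with an optional 66 prefix, and uses a function name I made up. It is not taken from the proposed patch.

```python
# Sketch only: a length decoder for the documented NOPL/NOPW form (0F 1F /0).
# Assumes 32-bit addressing; `nopl_length` is a hypothetical name.
def nopl_length(code: bytes) -> int:
    """Return the length of a multi-byte NOP at the start of `code`, or 0."""
    i = 1 if code[:1] == b"\x66" else 0       # optional operand-size prefix
    if code[i:i + 2] != b"\x0f\x1f" or len(code) < i + 3:
        return 0
    modrm = code[i + 2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg != 0:                               # only the documented /0 form
        return 0
    length = i + 3                             # prefix + opcode + ModRM
    if mod == 3:                               # register operand, nothing more
        return length
    if rm == 4:                                # a SIB byte follows ModRM
        if len(code) < length + 1:
            return 0
        sib = code[length]
        length += 1
        if mod == 0 and (sib & 7) == 5:        # SIB with no base: disp32
            length += 4
    elif mod == 0 and rm == 5:                 # bare disp32
        length += 4
    if mod == 1:
        length += 1                            # disp8
    elif mod == 2:
        length += 4                            # disp32
    return length if length <= len(code) else 0
```

A handler built on this would add the returned length to the saved instruction pointer and resume the program. The decoding is cheap, which is why the "thousands of CPU cycles" objection is about the trap itself.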
 
A bit later someone started the mailing list thread [http://lkml.iu.edu/hypermail/linux/kernel/1009.0/02825.html Promoting Crusoe and Geode Processors to i686 Status] which took a look at the overall situation for those two CPUs. It argued that both CPUs supported the full i686 instruction set and that NOPL was not standard i686. As far as I can tell not much was done in response to this.
 
In 2021 the patch [https://lkml.org/lkml/2021/6/26/132 x86: add NOPL and CMOV emulation] was proposed to the kernel again. As most 32-bit x86 distributions compiled for the i686 architecture this would let i586 or better CPUs run modern day 32-bit Linux distributions. This is especially useful for CPUs still manufactured and used today like Vortex86 CPUs. As it turns out, old machines don't just disappear. They just run out of date software.
 
Unfortunately a few days later [https://lkml.org/lkml/2021/6/29/687 the author followed up with some bad news]. The Pentium Pro introduced conditional floating point operations and when used on systems that don't support them they silently fail instead of throwing an unknown instruction exception. This makes it effectively impossible to fully emulate the i686 instructions on i586 systems.
 
== LLVM fallout ==
In 2010 [https://github.com/llvm/llvm-project/commit/c26ddccf3818ddcebc84e98b9310a2aa76692572 r96988] was committed to LLVM. It made the compiler unconditionally output multi-byte NOPs for 32-bit and 64-bit x86 code. This happened regardless of whether the target architecture supported them, so output could break on systems that weren't even supposed to support multi-byte NOPs, like i586 or i386.
 
In 2011 someone reported [https://lists.freebsd.org/pipermail/freebsd-current/2011-October/028588.html 9.0 RC1/Clang / illegal instruction (Signal 4) in gengtype while building cc_tools on i586.] to the FreeBSD mailing lists and in 2012 [https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=168253 clang crashes on Geode] was reported to the FreeBSD bug tracker.
 
After the first bug report, [https://bugs.llvm.org/show_bug.cgi?id=11212 X86AsmBackend::WriteNopData uses long nops unconditionally] was filed upstream to LLVM.
 
Later in 2012 [https://github.com/llvm/llvm-project/commit/5dd4ccb4020173a569bc54ba559232b5be2cef01 LLVM r164132] was committed, adding a 'geode' CPU target to LLVM that didn't use multi-byte NOPs. This meant building for i686 without using multi-byte NOPs required building for Geode CPUs. Not very useful for generic i686 releases or for i586 and older machines that weren't supposed to support multi-byte NOPs.
 
In 2014 [https://github.com/llvm/llvm-project/commit/1b8bfdaae3264efdba964321956965a6ab47540a LLVM r195679] was committed to flat out avoid using multi-byte NOPs on i686, i586 and specific non-Intel and non-AMD CPU models that didn't support multi-byte NOPs.
 
== Emulators ==
It's not just hardware that implements the i686 instruction set, software emulators can too. So which emulators support multi-byte NOPs?
 
In 2006 [https://sourceforge.net/p/bochs/code/7216 Bochs r7216] was committed, adding support for the multi-byte NOP opcode as long as Bochs was compiled to emulate an i686 or newer. Later in 2007 [https://sourceforge.net/p/bochs/code/7973 Bochs r7973] was committed, marking 0F 19 through 0F 1E as multi-byte NOPs based on AMD documentation. They didn't link to the documentation but it makes sense to me.
 
In 2006 [https://git.qemu.org/?p=qemu.git;a=commitdiff;h=e17a36ce41bc76abeceb QEMU r2145] was committed and made all hinting NOPs execute as multi-byte NOPs. This made it into QEMU 0.9.0, which makes the Debian bug report of QEMU 0.9.1 crashing due to NOPs surprising. Furthermore these NOPs are available on every emulated x86 CPU, 32-bit or 64-bit, regardless of whether it should have them or not.
 
In 2007 [https://github.com/mirror/vbox/commit/cb39b37cad08c79c5096fcd5dd69ad6997ee418b VirtualBox r2422] imported QEMU's i386 interpreter and gained multi-byte NOP support.
 
In 2020 [https://github.com/sarah-walker-pcem/pcem/commit/b973755ca376dbb47c3a8c85a53f4058f0ccc54d Add hintable NOPs for Pentium Pro and II.] was committed to PCem.
 
In 2022 [https://github.com/joncampbell123/dosbox-x/pull/3390 src/cpu: Implement hinting NOPs] was merged to DOSBox-X, the only DOSBox variant that supports Pentium Pro and newer CPUs.
 
== Intel CET ==
In 2016 Intel announced [https://web.archive.org/web/20160614162220/http://blogs.intel.com/evangelists/2016/06/09/intel-release-new-technology-specifications-protect-rop-attacks/ Control-flow Enforcement Technology] and released the [https://web.archive.org/web/20170320213641/https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf Intel CET specification]. These CPU extensions run not just in 64-bit mode but also in 32-bit mode. While management of the shadow stack uses new instructions, the ENDBRANCH instruction, intended to be compiled into user space code, re-uses the hinting NOP 0F 1E.
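
As an illustration, the two ENDBRANCH encodings are F3 0F 1E FB (ENDBR32) and F3 0F 1E FA (ENDBR64), so a crude way to see whether a blob of code contains them is to search for those bytes. The Python sketch below does exactly that. The function name is mine, and because it is a plain byte search and not a disassembler, it can report matches that are really data or the tail of another instruction.

```python
# Sketch only: naive byte search for the ENDBR32/ENDBR64 encodings.
ENDBR32 = bytes([0xF3, 0x0F, 0x1E, 0xFB])
ENDBR64 = bytes([0xF3, 0x0F, 0x1E, 0xFA])


def find_endbr(blob: bytes) -> list:
    """Return sorted (offset, name) pairs for each ENDBR pattern in `blob`."""
    hits = []
    for name, pattern in (("endbr32", ENDBR32), ("endbr64", ENDBR64)):
        start = blob.find(pattern)
        while start != -1:
            hits.append((start, name))
            start = blob.find(pattern, start + 1)
    return sorted(hits)
```

Running something like this over the start of each function in an i586 build is a quick sanity check for whether CET instructions slipped in.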
 
Unlike the multi-byte NOP there's no indication in the specifications that these instructions are limited to Pentium Pro or newer CPUs.
 
In 2017 [https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=603555e563725616246912711419637add54c961 Add support for Intel CET instructions] was committed to the GNU Assembler.
 
Later in 2017 [https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=2a25448c490b16eea276521d818640bcaca75e35 Update x86 backend to enable Intel CET.] was committed to GCC.
 
Even later in 2017 [https://github.com/llvm/llvm-project/commit/fec21ec0c6257eb24290c483b03b4fd9e6a9d0d1 LLVM r318995] added support for CET. As far as I can tell this doesn't limit the use of these CET instructions.
 
In 2021 [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98667 gcc generates endbr32 invalid opcode on -march=i486] was reported to GCC. The next day [https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=77d372abec0fbf2cfe922e3140ee3410248f979e x86: Error on -fcf-protection with incompatible target] was committed to GCC. This patch limits CET to architectures with CMOV. That's a safe bet, but it seems like it would still break on the Geode LX800 and other i686-compatibles that lack multi-byte NOPs.
 
In 2022 [https://github.com/rust-lang/rust/issues/93059 i586-unknown-linux-gnu target generates binaries containing Intel CET opcodes which are illegal on i586 processors] was reported to the Rust bug tracker. A day or so later Gentoo committed [https://github.com/gentoo/gentoo/commit/bff66eedb4ae530ef21187d617daeba5472320a1 dev-lang/rust: pass -fcf-protection=none on i586] despite Rust not being available on i586 yet. It's unclear how much things will break if someone gets an actual i686 build of Rust going.
 
Rust uses LLVM so this might indicate that LLVM doesn't check if an architecture supports CET before adding its instructions.
 
As of early 2022 Intel CET support is not in the kernel yet.
 
== Conclusions ==
I have a few takeaways from this slow motion train wreck:
 
* Intel's documentation only applies to Intel CPUs
* Developers don't really question retroactive additions to instruction sets
* To some i686 is the Pentium Pro
* To others i686 is a baseline for various 32-bit x86 processors
 
Something else to just tack on here is that I spent a non-trivial amount of time trying to dig up old copies of Intel web pages and documentation. By the way Intel: When you make a new revision of a document you don't have to destroy the old ones.