In this video, we take a look at a bug that was introduced by a Microsoft engineer into the Linux kernel that had to be patched at the eleventh hour before Linux 6.13 was released. Did they come to the rescue like John McClane and get it fixed in time? … yea, they did!
Chapters:
00:00 Intro
00:19 The intended functionality
00:54 What went wrong
01:33 alright now Microsoft
02:04 the plan
02:27 There’s something else though
02:48 What the Ack?
03:16 Why Open Source development fascinates me
04:26 Linux 6.13 on TWIL
Michael:
The next version of the Linux kernel is supposed to be coming this Sunday, January 19th. Well, a bug was found just a couple days ago that could cause some issues with that plan. It turns out that an engineer at Microsoft added some code to the kernel that is making a bit of a mess.
Hi, I’m Michael and I make videos about Open Source software and Linux so subscribe for more.
A couple of months ago, the merge window for Linux 6.13 saw a really interesting and very useful improvement to the kernel modules contributed by a Microsoft engineer. Not only was this interesting, some people said that this would be the biggest highlight for kernel modules in this release.
The code was for using large read-only execute pages or ROX pages for allocations and caching. This would reduce the instruction TLB pressure and has the potential of improving performance. Unfortunately there were some issues caused by this change and well, it had to be patched.
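To get a feel for why larger pages can ease instruction TLB pressure, here is a rough back-of-the-envelope sketch. It is not taken from the patch: the 4 KiB and 2 MiB page sizes are the common x86_64 sizes and are assumed here, and the 16 MiB of module code is a made-up figure purely for illustration.

```python
# Assumed page sizes: 4 KiB base pages and 2 MiB large pages (common on x86_64).
SMALL_PAGE = 4 * 1024
LARGE_PAGE = 2 * 1024 * 1024


def pages_needed(code_bytes: int, page_size: int) -> int:
    """Return how many pages of page_size are needed to map code_bytes."""
    return -(-code_bytes // page_size)  # ceiling division


# Hypothetical amount of executable module code, for illustration only.
module_code = 16 * 1024 * 1024

small = pages_needed(module_code, SMALL_PAGE)
large = pages_needed(module_code, LARGE_PAGE)
print(f"4 KiB pages: {small}, 2 MiB pages: {large}")
```

Each mapped page can take up a TLB entry when the CPU runs code from it, so covering the same code with far fewer pages leaves more of the TLB free for everything else.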
Peter Zijlstra of Intel submitted a patch saying, and I quote, “The whole module writable address nonsense made a giant mess of alternative.c. Not to mention it still contains bugs, notably some of the CFI variants crash and burn.”
For those unfamiliar, CFI stands for Control Flow Integrity and CFI is an anti-malware tech aimed at preventing attackers from redirecting the control flow of a program. So obviously having issues with this anti-malware stuff is not great. Now this bug doesn’t affect all systems but it does cause issues on some CFI enabled setups including reports of machines powered by Intel Alder Lake failing to resume from hibernation.
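As a purely conceptual illustration of the idea behind CFI, here is a toy sketch. It is not how the kernel or any compiler implements it (real CFI is enforced by the compiler and the hardware), and every name in it, such as `ALLOWED_TARGETS` and `checked_indirect_call`, is hypothetical.

```python
class ControlFlowViolation(Exception):
    """Raised when an indirect call points somewhere it should not."""


def greet() -> str:
    return "hello"


def attacker_payload() -> str:
    return "pwned"


# Hypothetical allow-list of legitimate targets for this call site.
ALLOWED_TARGETS = {greet}


def checked_indirect_call(target):
    """Call target only if it is an approved destination (toy CFI check)."""
    if target not in ALLOWED_TARGETS:
        raise ControlFlowViolation(f"blocked call to {target.__name__}")
    return target()


print(checked_indirect_call(greet))
try:
    checked_indirect_call(attacker_payload)
except ControlFlowViolation as err:
    print(err)
```

The point of the sketch is only the shape of the check: before jumping through a function pointer, verify that the destination is one the program was meant to reach.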
I know a lot of people are going to take this opportunity to yell at Microsoft for this and, let’s face it, Microsoft is not known for having the highest standards for quality control. I mean, after all, they do make Windows… BUT sometimes things happen, especially when you are working on a very complex project. Sometimes things don’t go as planned. The Microsoft engineer who submitted the changes has been doing Linux kernel development since at least 2006, well before starting at Microsoft, so this is not a knock on him either. In fact, he’s been working on patches to clean it all up.
Zijlstra said “but given the current state of things, this just isn’t ready.” And he says that they’re going to be disabling it for now, but they’ll try again in the next cycle. So that means the code is still going to be in the kernel source, but it’s not going to be actually included in the stable kernel build. It’s a shame because this would provide some significant performance benefits, but hopefully we’ll get those in 6.14 once this is taken care of.
There is another oddity with this, though. AMD engineer Borislav Petkov noted that the Linux x86_64 maintainers had not signed off on the change, which is a bit weird. How did it even get in if no one signed off on it? There is a discussion to be had about disabling the code versus reverting it, but I think the more important discussion would be to figure out how it got in without anyone giving an Ack.
Oh yea, for those who don’t know, Ack means Acknowledgement, which is just a way of saying you sign off on some code. When I first heard someone say this … what? … the discussion was normal dev talk and then all of a sudden one of them said “hey can you ack me” … what? …
This is why open source development, especially development on the Linux kernel, is so fascinating to me. This news sounds terrible and I already saw some people throw shade at Microsoft for it, but at the same time it’s also an example of why Open Source is great, because you can see all the people from various companies working together to find and solve issues. This could have been a mess, but it was found and addressed before release, so it’s basically a nothing burger. On the flip side, the openness brings these kinds of things into the limelight, whereas in proprietary development no one would have ever heard about this happening. At the same time, you could argue that if it wasn’t open, no one would have found the issue and been able to address it as fast as it was. So yea, that’s why this stuff is so interesting to me.
Anyway, it’s basically a nothing burger and I hope it’s all addressed for 6.14 because the performance improvements are quite promising. And there’s still a lot of stuff to look forward to with 6.13. I’ll be covering that in the next episode of This Week in Linux. If you haven’t subscribed then do that now because it’s one of the best Linux shows ever made and just because I happen to be the researcher, writer, host, editor, and well creator of the show, that doesn’t mean I’m biased. You’ll like it.
In fact, you don’t have to wait for the next episode, here’s a playlist to check out all the latest episodes of This Week in Linux so if you’re new to my content, don’t wait, jump on that TWIL ride. . . And you too, can become a TWILLER … lol facepalm