Mac OS X on VMware ESXi: ATI Radeon passthrough

Lately I've been quite involved in a virtualization project: running Mac OS X and Windows as workstations on top of VMware's "bare-metal" hypervisor ESXi on my Mac Pro. It requires a good knowledge of virtualization and VMware products like ESXi and vSphere, serious sysadmin skills, and lots of perseverance.

I've finally managed to boot a Mac OS X 10.6.8 virtual machine on top of ESXi, on my Mac Pro, with a proper amount of RAM (12 GB) and the graphics card in passthrough mode. That required a manual tweak of the vmx file.
The VM wouldn't boot when configured with both the graphics card in passthrough and more than 2 GB of RAM. I had to add these two lines to the vmx file:

pciHole.start = "1200"
pciHole.end = "2200"
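If you would rather script that edit than open the file by hand, here is a minimal sketch. It assumes you run it from a shell that can reach the vmx file, with the VM powered off. The function name add_pci_hole and the path in the usage comment are made up for the example; only the two pciHole lines come from the setup above.

```shell
#!/bin/sh
# Append the two pciHole settings to a .vmx file unless they are already there.
# add_pci_hole is a hypothetical helper name, not a VMware tool.
add_pci_hole() {
    vmx=$1
    # Each key is added only once, so running the function twice is harmless.
    grep -q '^pciHole\.start' "$vmx" || echo 'pciHole.start = "1200"' >> "$vmx"
    grep -q '^pciHole\.end' "$vmx" || echo 'pciHole.end = "2200"' >> "$vmx"
}

# Usage (hypothetical path):
#   add_pci_hole /vmfs/volumes/datastore1/osx/osx.vmx
```

Keep a copy of the original vmx file before touching it.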

Then I was able to boot my Mac OS X VM with 12 GB RAM and the graphics card in passthrough. Great. I'm still lacking passthrough for the USB keyboard and mouse, meaning I need a remote computer with vSphere Client to control my VM using the embedded console. But the VM uses the physical ATI Radeon and the physical screen, and in theory it could use full GPU power.

It looks like things are working OK, but it'll take time and many more tests to make sure everything is really working. For example, I was not able to launch 3D FPS games like Left 4 Dead and Left 4 Dead 2 in the VM. The game would crash on launch.


Fix a stuck Steam client on Mac OS X

From time to time, the startup of my Steam client on Mac OS X (10.6.8) is incredibly slow. And sometimes, it won't even launch successfully, getting stuck with a Beach Ball of Death.
A quick diagnosis comes from the powerful utility dtruss:

$ sudo dtruss -p <PID of steam process>
...
__semwait_signal(0x14D03, 0x4D03, 0x1)		 = -1 Err#60
__semwait_signal(0x17C03, 0x3F03, 0x1)		 = -1 Err#60
__semwait_signal(0xC03, 0x0, 0x1)		 = -1 Err#60
semop(0x2000F, 0xB5464C98, 0x1)		 = -1 Err#35
__semwait_signal(0xC03, 0x0, 0x1)		 = -1 Err#60
__semwait_signal(0x4D03, 0x14D03, 0x1)		 = -1 Err#60
...

If you see a LOT of errors on __semwait_signal and semop lines, you can fix your client quite easily. I must say, it might have some side effects, but I've never seen any.
First, kill the Steam client (right-click on its icon in the Dock, choose "Force Quit"), then list semaphores:

$ ipcs -s
IPC status from <running system> as of Fri Nov 30 21:28:29 CET 2012
T     ID     KEY        MODE       OWNER    GROUP
Semaphores:
s 131072 0xe93c17d9 --ra-------   patpro   patpro
s 131073 0xc0ec4f17 --ra-ra-ra-   patpro   patpro
s 196610 0xb9e1e4e1 --ra-ra-ra-   patpro   patpro
s 131075 0x697a55e6 --ra-ra-ra-   patpro   patpro
s 131076 0x2e726ce1 --ra-ra-ra-   patpro   patpro
s 196613 0xa9ae61d6 --ra-ra-ra-   patpro   patpro
s 131078 0x1a661f70 --ra-------   patpro   patpro
s 196615 0x36dbd757 --ra-------   patpro   patpro
s 196616 0x44433b26 --ra-ra-ra-   patpro   patpro
s 196617 0x3cea9ea0 --ra-ra-ra-   patpro   patpro
s 196618 0xec712fa7 --ra-ra-ra-   patpro   patpro

If your Steam client is not running and you still see a full list of semaphores, you might want to remove them:

$ for SEM in $(ipcs -s | awk '/^s / {print $2}'); do ipcrm -s $SEM; done
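The one-liner above removes every semaphore that ipcs lists. A slightly more careful variant could look like the sketch below: it refuses to run while a Steam process is still around, and only touches semaphores owned by the current user. The function names are made up, and matching processes on the string "steam" is an assumption; check the real process name with ps on your system.

```shell
#!/bin/sh
# Print the IDs of semaphores owned by user $1, reading `ipcs -s` output on stdin.
# Relies on the column layout shown above: T, ID, KEY, MODE, OWNER, GROUP.
sem_ids_for_owner() {
    awk -v owner="$1" '$1 == "s" && $5 == owner { print $2 }'
}

# Remove the current user's semaphores, but only if no Steam process is found.
clean_semaphores() {
    if ps -A -o comm= | grep -qi steam; then
        echo "Steam still seems to be running, not touching semaphores" >&2
        return 1
    fi
    for sem in $(ipcs -s | sem_ids_for_owner "$(id -un)"); do
        ipcrm -s "$sem"
    done
}

# Usage:
#   clean_semaphores
```

Other applications owned by the same user may also hold semaphores, so the warning about side effects still applies.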

Then, your Steam client should launch faster (well, at a normal speed), and it shouldn't get stuck.
Use at your own risk.


Escaping the Apple ecosystem: part 0

Years ago, I understood that the evolution of the Apple ecosystem would eventually become an ethical and practical problem for me and for other longtime Mac OS X power users. I will not give details here, but it's closely related to constant patent fights, proprietary app stores and their underlying business model, lack of professional hardware and software, lack of openness, iOS convergence, etc.
The fact is, I'm deeply addicted to Mac OS X (10.6 branch). It sports great features, built on great APIs, and is very nice to use (unless you are a Linux control freak upset by the quite limited window manager of Mac OS X).
From my standpoint it has critical features that I would miss a lot on other OSes, some of which I use more than ten times a day. For example, the ⌘-k combo that lets me connect to WebDAV, AFP, NFS, and CIFS shares, to VNC servers… So convenient, so irreplaceable. I could find dozens of different little (or big) things that make me stick with Mac OS X. Some of them are quite huge: it's not Windows, it runs my software (Adobe's CS5, Valve's L4D2, my beloved text editor BBEdit and many more), it's UNIX under the hood - and I must admit, the hood is constantly open.
I do understand of course that Apple is right about consumer products: they have a very good business model, and the recent rumor about a switch to ARM CPUs makes so much sense. But I'm no regular consumer. I spend 10 to 18 hours a day in front of various computers, and have neither a smartphone nor a Facebook account.
Unless I'm ready to give up much control to Apple, there's no way I'm moving to iOS OSX 10.8 / 10.9. That's why I'm studying a path to escape the Apple ecosystem. I want it to be a slow process, allowing me to progressively switch from Mac OS X to something else.

Step zero

I think of step 0 as the core of my project, as the main idea. And that would be to keep on using Mac OS X. Huge step, huh? No kidding.
I'm currently evaluating various solutions that would allow me to:

  • keep Mac OS X as my main OS
  • use Windows, FreeBSD, and Linux distributions as alternative OSes at the same time
  • enjoy full hardware power (use full GPU power, not emulated, not virtualized)
  • never reboot (I'm used to 15-50 days uptime, rebooting to change OSes is not an option)

Hence, I would be able to slowly switch my habits from Mac OS X to other OSes. Some software I need (Adobe's for example) will stay on Mac OS X until I buy a new version running on another OS, or until I find a nice alternative. That will take time.

In theory my step 0 would require at least:

  • a bare metal hypervisor running on my Mac Pro
  • a Mac OS X virtual machine capable of natively using USB ports, GPU, and SATA
  • a Windows virtual machine capable of natively using USB ports and GPU
  • optional: a Linux VM capable of natively using USB ports and GPU

I've got plenty of power (2.8 GHz quad core Xeon), and plenty of RAM (24 GB). But believe it or not, that's not enough. For example, it's not possible to share the GPU between two active virtual machines. If you want direct I/O ("passthru" mode) for your graphics card, you need one card per active VM. That's not a real problem: my Mac Pro has a few empty PCI slots, so I can buy a fanless ATI Radeon or two as dedicated graphics cards for VMs.

As far as I know, VMware ESXi 5.1 is the only bare metal hypervisor supporting the Mac Pro model 5,1 (mine). And VMware is also the only one supporting Mac OS X 10.6 (server) virtual machines on top of a bare metal hypervisor (on top of Apple hardware). Fine.
VMware has "DirectPath I/O", which binds a PCI device directly to a VM. For example, you can create a virtual machine and make it use the PCI GPU directly, so that you would have proper video power inside your VM.
Unfortunately this passthrough mode has severe limitations. The biggest ones are:

  • You can no longer create snapshots of your VM
  • You can no longer suspend your VM

I've installed VMware ESXi 5.1 on a spare SATA hdd in my Mac Pro. I've created a Windows virtual machine, and hooked the PCI graphics card to this VM using DirectPath I/O. It worked great. As soon as I installed the ATI drivers, the display plugged into the Mac Pro was reclaimed by the Windows VM. I was not able to dedicate USB ports to the VM, so for now it's kind of useless. Trying to launch a Valve game (Half-Life 2: Lost Coast) eventually fails. Probably a software configuration problem.
The same setup for a Mac OS X VM won't boot. I'm able to boot a Mac OS X virtual machine, using my very own SATA hdds in raw device mapping (i.e. it boots my regular Mac Pro OS, from its physical hdds, as a VM). But if I try to use the ATI card via DirectPath I/O, it won't boot.

I'm stuck at step zero, with important questions asked, and not answered:

Is it possible to use GPU card passthrough with a Mac OS X guest? (also here)
Is it possible to configure USB passthrough on a Mac Pro with ESXi 5.1?

Feel free to help (and to correct my English). I really need those issues to be solved in order to move forward with this project.

Helpful links:
fan speed of the ATI card with ESXi 5.1
Raw Device Mapping of local SATA disks on ESXi
ESXi as a Desktop with VMDirectPath I/O


Spamhaus’ ZEN blacklist efficiency

At work, I've been using Spamhaus' Zen blacklist for many years now. For a huge organization the amount of daily checks makes it impossible to rely on the free Spamhaus service. So we pay for a local copy of the blacklist, rsynced every 20 minutes. It allows faster checks too. When you use Spamhaus' blacklists as a paid service, the question is: how can you rate your return on investment? In an attempt to answer this question, I've gathered 2 years and 9 months worth of mail server log files (12 GB bzip2) and extracted some data.

I use greylist, blacklist, whitelist, antispam/antivirus, and recipient-based filtering. So it makes things quite complicated when I need clear statistics about what is going on. The MX server accepts around 1.3 million messages a month for internal delivery, but many more come knocking at the door.
The main purpose of blacklisting is to limit the amount of emails going through expensive filters. Greylist and blacklist are cheap filters: they are fast, and they cost very little CPU, memory, and network resources. In comparison, antispam and antivirus filters are expensive: they are slow, and have a huge CPU, memory and network usage. I do before-queue content filtering. It means the MX server will scan for spam and viruses before the email is accepted. So the whole filtering process must take place during the SMTP session, and that's a pretty hard thing to do. To make sure that spam and virus filters are available for fast analysis, you must block as many bad emails as possible with cheap (and fast) filters.

So here is the deal: using a paid RBL (Spamhaus' Zen, here) is only relevant if expensive filters cannot cope with the traffic. Below, the gnuplot output for MX log files between 20100101 and 20120928. It shows in red the number of hits in the zen RBL, in green the number of emails coming out of Amavisd-new as "clean", and in blue the number of spam messages blocked by Amavisd-new.

2 years and 9 months of data: daily spam count in blue ("Blocked Spam" in amavisd-new logs), daily clean count ("Passed") in green, daily blacklist hits in red (blocked using zen.dnsbl-local in Postfix logs).

It shows, mainly, that hits in the blacklist have plummeted, while levels of spam and clean emails out of Amavisd-new are fairly constant. So while the blacklist is blocking at least 5 times fewer incoming SMTP transactions, the amount of emails reaching the antispam does not change. Spamhaus' zen blacklist efficiency is good (no increase in spam detection), but it becomes less useful every day.
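For readers who want to build the same kind of daily series from their own logs, here is a rough sketch. It assumes classic syslog lines starting with "Mon DD HH:MM:SS", and it simply counts the lines that contain a given string, such as the zen.dnsbl-local marker mentioned in the caption above. The function name and the file names in the usage comment are made up for the example.

```shell
#!/bin/sh
# Count, per day, the log lines on stdin that contain the literal string $1.
# The day key is built from the first two fields (month and day of month),
# so a log spanning several years needs the year added by other means.
daily_count() {
    awk -v pat="$1" 'index($0, pat) { n[$1 " " $2]++ }
        END { for (d in n) print d, n[d] }' | sort
}

# Usage (hypothetical file names):
#   bzcat mail.log.*.bz2 | daily_count zen.dnsbl-local > zen_hits.dat
```

The output is sorted alphabetically, not chronologically, so it needs a proper date column before being fed to gnuplot.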

Below, the gnuplot output for the same time period showing the number of lines in Postfix logs:

total number of lines in postfix logs, daily basis


It's good enough to stand for the evolution of the number of SMTP sessions, and it's very similar to the curve of blacklist hits. So we can conclude that an external factor is responsible for the drop in incoming unwanted SMTP transactions. The war on botnets really took off in 2010, so maybe we have an explanation here.

Let's go back to the main idea of this post: is the zen blacklist worth its yearly fee? Based on my experience, yes it is, or at least, it was. From 2008 to 2011, it was clearly an asset. Without it, before-queue content filtering (b-qcf) would have been absolutely unusable. By dramatically reducing the load on the antivirus and antispam, the zen RBL allowed me to handle about 1,300,000 clean email messages per month on a single MX server with b-qcf (on a 6-year-old Apple Xserve).
In 2012, the zen blacklist still blocks a good daily amount of spam before it reaches the real antispam. But it's clear that if the trend continues, I will not renew the Spamhaus subscription in September 2013. At this rate, this blacklist will no longer be worth its cost by the end of 2012. Now that big vendors like Microsoft have embraced the war on botnets, I'm pretty confident that I won't need the zen RBL any longer.


L4D2: hit a key to stop team kill

L4D2 is a very nice game: loads of zombies and weapons in coop or versus challenges, nice maps… But as with every popular game around, one day or another you'll find yourself playing with assholes. The kind of players who join your game in the middle of a map, shoot everybody, kill themselves, and leave, forcing you to replay the map. Of course it's always in expert mode, the hardest one, and you can't do anything because it's so fast to kill your character when playing expert. It happened to me 3 or 4 times in just 2 days. So I designed a key binding that helps stop "team kill", giving you some time to ban and/or kick the killer without ruining the game.
It requires a server running SourceMod, and the authorization to send sm_cvar commands to this server.

Locate your autoexec.cfg file somewhere in left 4 dead 2/left4dead2/cfg/, or create one if it does not exist yet. Open it in your favorite text editor, and paste the following code:

bind "o" "sm_cvar mp_friendlyfire 1;wait;sm_cvar survivor_friendly_fire_factor_expert 0.5;wait;sm_cvar survivor_friendly_fire_factor_hard 0.3;wait;sm_cvar survivor_friendly_fire_factor_normal 0.1"
bind "p" "sm_cvar mp_friendlyfire 0;wait;sm_cvar survivor_friendly_fire_factor_expert 0;wait;sm_cvar survivor_friendly_fire_factor_hard 0;wait;sm_cvar survivor_friendly_fire_factor_normal 0"

Replace "o" and "p" with the keys you want to use. In this example, pressing the "p" key turns off all friendly fire damage, and the "o" key turns it back on.

When a team killer joins your game and starts shooting your teammates (and you), you might be fast enough to press the "p" key, rendering the killer mostly harmless, before he does too much damage. Feel free to ban/kick the f*cker.

Enjoy.


A script to list service ACLs on Mac OS X 10.5

I personally don't think it's a good thing to blog in English when you're French, unless you are very fluent and your target audience reads English. Today, my audience is the worldwide crowd of Mac OS X Server sysadmins. So, while I'm not fluent, I'm going to write my first post in English.

Background

There is something quite messy in the Service Access Control Lists (SACLs) on Mac OS X 10.5: you just can't display the full list of users & groups of a SACL from the command line.
Basically, you can do this:

$ dscl . -read /Groups/com.apple.access_ssh
AppleMetaNodeLocation: /Local/Default
GeneratedUID: A7E16606-3C52-42B9-852E-D197C7598EA8
NestedGroups: 955F946A-7C9D-4D3E-B286-E16003380282 ABCDEFAB-CDEF-ABCD-EFAB-CD...
PrimaryGroupID: 101
RealName:
 Remote Login Group
RecordName: com.apple.access_ssh
RecordType: dsRecTypeStandard:Groups

As you can see, this SACL group com.apple.access_ssh has no direct members, only nested groups (NestedGroups key). So, in order to list users, you have to read the content of each nested group. But groups can only be read by name, while NestedGroups only gives you GUIDs. So the first step is to find out the groups' names.
At this stage, you have no way to know if the target group is local or if it sits on a remote Open Directory server, so you must use the /Search path:

$ dscl /Search -search /Groups GeneratedUID 955F946A-7C9D-4D3E-B286-E16003380282
myadmins		GeneratedUID = (
    "955F946A-7C9D-4D3E-B286-E16003380282"
)

The second step is to list users of the group:

$ dscl /Search -read /Groups/myadmins GroupMembership
GroupMembership: admin01 admin02 user01 user02 ldapuser01

But guess what: this group might have more than just users, maybe its NestedGroups key is not empty! So at this point, you must also check the NestedGroups value, and recursively follow each group GUID, until you find only users.
Think "huge groups", think "handfuls of nested groups", and watch your fingers as you're going through dscl torments. You've figured it out: Mac OS X lacks a good command line tool for following a SACL tree of users and groups.
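To make the recursion concrete, here is a minimal sketch of the walk described above. It is not the code of any existing script: the helper names are hypothetical, the parsing assumes dscl output shaped like the listings above, and the list of visited groups (SEEN) is a simple way to stop on circular references. Names containing spaces are not handled.

```shell
#!/bin/bash
# Sketch only: all function names below are made up for this example.

# Print the name(s) of the group whose GeneratedUID is $1.
group_name_for_guid() {
    dscl /Search -search /Groups GeneratedUID "$1" |
        awk '$2 == "GeneratedUID" { print $1 }'
}

# Print the values of key $2 for group $1, one per line.
# Output that does not start with "<key>:" is ignored.
group_key_values() {
    dscl /Search -read "/Groups/$1" "$2" 2>/dev/null |
        awk -v key="$2:" '$1 == key { found = 1; first = 2 }
            found { for (i = first; i <= NF; i++) print $i; first = 1 }'
}
group_users()        { group_key_values "$1" GroupMembership; }
group_nested_guids() { group_key_values "$1" NestedGroups; }

# Recursively print "g depth name" for group $1, "u depth name" for its
# users, then do the same for every nested group.
SEEN=" "
walk_group() {
    local group=$1 depth=${2:-1} user guid child
    # Skip groups already visited, so group 1 -> group 2 -> group 1 stops.
    case "$SEEN" in *" $group "*) return 0 ;; esac
    SEEN="$SEEN$group "
    echo "g $depth $group"
    for user in $(group_users "$group"); do
        echo "u $((depth + 1)) $user"
    done
    for guid in $(group_nested_guids "$group"); do
        for child in $(group_name_for_guid "$guid"); do
            walk_group "$child" $((depth + 1))
        done
    done
}

# Usage:
#   walk_group com.apple.access_ssh
```

Keeping the three dscl lookups in their own small functions makes it easy to try the recursion with fake data before pointing it at a real directory.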

Here comes getsacls.sh

I won't promise you a killer command line tool with foolproof error and recursion handling, but I still believe I've designed a usable piece of shell script. Even if it looks like it's the worst code I've ever written (which is not true, I've made things way uglier).
The source code is too long and messy to be copy-pasted here, so just follow this link to download the getsacls.sh script.

How to get getsacls.sh:
Just download the latest version from here.

How to install getsacls.sh:
Simply copy to your Mac OS X 10.5 server (or managed client). Somewhere in your $PATH should be fine. Then chmod +x the script, so that it can be executed.

How to configure getsacls.sh:
Default values should be OK, but if you really want to change something, open the script in your favorite editor, and find the "FEW USER TUNABLE MISCS" section. Edit at your own risk.

How to use getsacls.sh:
It's simple, you just have to launch it. It will then proceed with the parsing of every SACL on your local system.
DO NOT use the sh command to launch this script. getsacls.sh uses special escape sequences and command options that sh will not recognize. Just run:

$ getsacls.sh

If you want to parse only some SACLs, you can provide each SACL name at the command line:

$ getsacls.sh com.apple.access_ssh com.apple.access_loginwindow

Still, you should only use SACL names that exist on your local system.

The default output is "fancy", it uses bold, indentation, and a beach-ball cursor. If you want the "no fancy" mode, you can either edit the corresponding "tunable misc variable" or define FANCY=NO at launch time:

$ FANCY=NO getsacls.sh com.apple.access_ssh

This "no fancy" mode allows for later parsing.
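As an example of such later parsing, here is a small sketch that extracts the unique user names from the "no fancy" output. It assumes the column layout visible in the sample output further down this post: type (g or u) in the first column, depth in the second, name in the third. The function name is made up.

```shell
#!/bin/sh
# Print the unique user names found in "no fancy" output read from stdin.
# Lines that do not start with "u" (groups, titles, separators) are ignored.
sacl_users() {
    awk '$1 == "u" { print $3 }' | sort -u
}

# Usage:
#   FANCY=NO getsacls.sh com.apple.access_ssh | sacl_users
```

A user reachable through several groups is printed only once.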

Caveats/bugs:
The script will not handle circular references. If your SACL uses nested groups in a circular way (group 1 -> group 2 -> group 1), the script will not stop.
When finding two or more similar users or groups (for example the local admin group and the Open Directory admin group), it will use only one of them, and that should be the local one.
The script uses SQLite3 as a backend, because bash is not good with arrays, and because I'm not good with Perl/Python/Ruby.

Sample "fancy" output:

com.apple.access_ssh
--------------------------------
   myadmins	/LDAPv3/192.168.128.34	955F946A-7C9D-4D3E-B286-...
     admin01	/Local/Default	9A7917D1-D8E7-49D6-8211-...
     admin02	/Local/Default	40D516A2-4D02-4C92-9505-...
     ldapuser01	/LDAPv3/ldap.example.com	ldapuser01_OUT_OF_OD
     ldapuser02	/LDAPv3/ldap.example.com	ldapuser02_OUT_OF_OD
     ldapuser03	/LDAPv3/ldap.example.com	ldapuser03_OUT_OF_OD
     user01	/LDAPv3/192.168.128.34	49EF9C64-D98B-11D8-BCFA-...
   admin	/Local/Default	ABCDEFAB-CDEF-ABCD-EFAB-...
     root	/Local/Default	FFFFEEEE-DDDD-CCCC-BBBB-...
     admin01	/Local/Default	9A7917D1-D8E7-49D6-8211-...
     admin02	/Local/Default	40D516A2-4D02-4C92-9505-...
     user01	/LDAPv3/192.168.128.34	49EF9C64-D98B-11D8-BCFA-...
================================

Sample "no fancy" output:

com.apple.access_ssh
--------------------------------
g 1 myadmins /LDAPv3/192.168.128.34 955F946A-7C9D-4D3E-B286-...
u 2 admin01 /Local/Default 9A7917D1-D8E7-49D6-8211-...
u 2 admin02 /Local/Default 40D516A2-4D02-4C92-9505-...
u 2 ldapuser01 /LDAPv3/ldap.example.com ldapuser01_OUT_OF_OD
u 2 ldapuser02 /LDAPv3/ldap.example.com ldapuser02_OUT_OF_OD
u 2 ldapuser03 /LDAPv3/ldap.example.com ldapuser03_OUT_OF_OD
u 2 user01 /LDAPv3/192.168.128.34 49EF9C64-D98B-11D8-BCFA-...
g 1 admin /Local/Default ABCDEFAB-CDEF-ABCD-EFAB-...
u 2 root /Local/Default FFFFEEEE-DDDD-CCCC-BBBB-...
u 2 admin01 /Local/Default 9A7917D1-D8E7-49D6-8211-...
u 2 admin02 /Local/Default 40D516A2-4D02-4C92-9505-...
u 2 user01 /LDAPv3/192.168.128.34 49EF9C64-D98B-11D8-BCFA-...
================================

Current version:
As of now, current version of getsacls.sh is 407 ($Id: getsacls.sh 407 2009-07-09 09:36:26Z patpro $). Next revisions will be listed here.

Update: $Id: getsacls.sh 409 2009-07-09 14:30:01Z patpro $
I've added some error handling for a rare case: when a user account lives on an LDAP server distinct from the Open Directory server, the GroupMembership field is not updated on the OD if the user account is destroyed on the LDAP. So according to the GroupMembership the user is still here, but according to the LDAP the user is nowhere to be found.

Update: $Id: getsacls.sh 412 2009-07-23 20:24:54Z patpro $
I'm forcing LC_NUMERIC in the beachball function, so that sleep 0.05 runs as expected even for people not using the dot as a decimal separator. Some cleanup.

Update: $Id: getsacls.sh 414 2009-08-03 10:33:30Z patpro $
Some cleanup and English corrections. Added some delay to the beachball rotation so it's more enjoyable.

Feel free to comment, and to correct my English ;)
