Cloning a Git Repository Piece by Piece
Posted: 2021-04-24
A while back, I ran into a problem downloading a vendor's Board Support Package (BSP) from their public GitLab instance. This git repository included the source code for Android and Linux, which is what I was interested in. It also included the toolchain package used by the vendor to compile that source code, as well as some prebuilt firmware binaries. In all, there were about 11 GiB of files.
The problem was that every time I tried to clone this repository, my HTTP connection was reset after exactly 1 GiB had been transferred:
$ git clone https://git.example.com/project/big-repo.git
Cloning into 'big-repo'...
remote: Enumerating objects: 600022, done.
remote: Counting objects: 100% (600022/600022), done.
remote: Compressing objects: 100% (390985/390985), done.
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
No matter what machine I tried, connecting over IPv4 or IPv6, I could not overcome this limit. It was not a timeout, as the disconnection happened at the same number of bytes downloaded regardless of the connection speed. There must have been a misconfigured firewall between me and their GitLab server. In any case, in order to clone this repository, I would have to divide it into pieces.
Recovering from failure
The first problem to solve is that git clone deletes the local copy of the repository when it fails. This prevents examining the partial clone to see what went wrong. This problem is relatively simple; we can split git clone into a series of separate commands, one for each step in the clone process. Then, if any step fails, we can immediately retry that step instead of starting over at the beginning. For example, the command:
git clone https://git.example.com/project/big-repo.git
becomes:
mkdir big-repo
pushd big-repo
git init
git remote add origin https://git.example.com/project/big-repo.git
git fetch origin
git checkout --track origin/HEAD
popd # Optional, only here for equivalence
Now, unsurprisingly, the git fetch command fails in exactly the same way as git clone.
Going back to the beginning
git fetch is already designed to incrementally fetch new commits, so the simplest way to partition the repository is by its commit history. By default, git fetch downloads all branches and tags, but we can limit the download to a single branch or tag by specifying it on the command line.
For example, to download only the history up through the first release tag, assuming such a tag exists, we could run:
git fetch origin v0.1
GitLab provides a list of available tags in its web interface. If there are no tags, the same concept would work with an older release or feature branch. Unfortunately, the repository in question had no tags, and contained only a single branch with about 400 commits.
Is it possible to fetch only part of a branch? The answer is "maybe". Git has a few server-side options controlling what clients are allowed to fetch, in order of increasing permissiveness:
- uploadpack.allowTipSHA1InWant: Git allows servers to hide branches. This option allows clients to fetch hidden branches, if they know exactly what commit each branch points to.
- uploadpack.allowReachableSHA1InWant: This option allows fetching just part of the history (that's what we want!), but not anything that has been made unreachable by a force push.
- uploadpack.allowAnySHA1InWant: This option allows fetching "any object at all", even objects that are not commits.
By default, all of these options are set to false. Thankfully, GitLab has the most permissive option, uploadpack.allowAnySHA1InWant, enabled, so we can find the hash of the oldest commit in the web interface and fetch only it:
git fetch origin f1e2fe1208cf # Fetch the first commit
git fetch origin --negotiation-tip=f1e2fe1208cf # Fetch the remaining commits
Using the --negotiation-tip option here tells the server that we already have the history up through commit f1e2fe1208cf, so the server does not need to send that part.
It is also possible to incrementally download the history of a branch, starting from the newest end, using the --depth and --deepen options to git fetch:
git fetch origin --depth=10
git fetch origin --deepen=10
git fetch origin --deepen=10 # Repeat until nothing new is fetched
In this case, neither option was sufficient, because the root commit was over the 1 GiB limit by itself. We need some way to download just part of that commit.
Fetching trees
Since GitLab has uploadpack.allowAnySHA1InWant enabled, we can fetch "tree" objects in addition to commits. Tree objects are what git uses to represent the directory structure. They contain a list of entries, each with a name, a mode, and a reference to some object. Entries for files reference "blob" objects, and entries for subdirectories reference other tree objects. For example, here is a pretty-printed version of the tree object for the Android source directory:
$ git cat-file -p f2a048d9eff3c0263dfb3acbeb2987f3a9f0f6eb
120000 blob 543755010469814f7f2cb4551169f2dc7828dcf8 Android.bp
100644 blob 6c8f7952e9cc029a8cf834762ae4fcbf9d4ab746 Makefile
040000 tree 19e61df5f2f7cbfa6892cd8f43b4c2c9bba15636 abi
040000 tree 9e9fa1147dfca7b69d8842472078cff3a4a5264e art
040000 tree b3140f26c00d2d9242bea0615da837bf2079fb07 bionic
040000 tree c5158a134ce4179cdc6eba482b4ced14910f0680 bootable
120000 blob 1dfe7549a7983ca0968c74188da99c5c941e42d6 bootstrap.bash
100755 blob c763b9b060b9acd1797d6785ea90a578f29d0bb0 build.sh
040000 tree 140f15a5a3fbaedc09710f61155de4f0fd137835 build
040000 tree 4652dfc37674ee79d09e32118640d2d8c1ad65e7 cts
040000 tree e6907482727e9555241c09ec5781f8e1c47c479e dalvik
040000 tree 34938f981ab6197f5a3a10290df4b191508120d3 developers
040000 tree 98b0d55fd1070cb78dba01b248c0531a2d70eaaa development
040000 tree b2130017ab6dcf6a5e5728d62ce87bd0abdc9547 device
040000 tree bb449224fba6960c4b370a60a7aaf35e75807c73 docs
040000 tree 50304feae93616932253329be1f8ccdd80d80e63 external
040000 tree 6b4c7e1e07f1bf2ad1c6fbbb0c516c697d80398c frameworks
040000 tree e271f84b9d51ce6a39aebf5fc7bea79d4d14b681 hardware
040000 tree 093b48e71014c5382d501669a1163202f727a8a7 kernel
040000 tree 2492878d074d91d73fb4aa349f596e82b358af28 libcore
040000 tree ad4e5f443784eb23aa1744c22f84354f8c6c442c libnativehelper
040000 tree d61c51939fe8155a2526990f9410cd7260f442d0 ndk
040000 tree b6c09b927416b63d375eed3e54c465b7f6074712 packages
040000 tree e95b4103d65cfcec2894318df77a77a1734d1353 pdk
040000 tree 3964cc4d96505825adeba51481efb0a8f7dce7e6 platform_testing
040000 tree 6e41e803329a9336e3d0f40f3fc21bd7dde4312f prebuilts
040000 tree de549ff8c45b0d5b9612ddae6b1e9aa2cbed3cb1 sdk
040000 tree 41e6db157d7febf76a51c76ebea12c6593e0bb62 shortcut-fe
040000 tree eee0f75c0dbd76b76ef4f10e08769f83aa7c0742 system
040000 tree b8c5f1dfd434fb571036e1b6cb30b352b55d2d76 toolchain
040000 tree acdfa1bcb1923b081ab195daa23871ad358d4a1a tools
040000 tree f74851a49456551f142611ea452be092e7bf621a vendor
So we could pass each of those hashes to git fetch, and each command would download just that one directory. Of course, since git fetch ensures there are no dangling object references, all of the contents of that directory would be included as well.
And that is a problem: once we have downloaded all of the subdirectories using that method, if we were to try to download the tree object shown above:
git fetch origin f2a048d9eff3c0263dfb3acbeb2987f3a9f0f6eb
git would fetch not only that one tree object, but also all of the subdirectory contents that we just fetched! And while git fetch has a --negotiation-tip option, that option only works on commits, not trees.
We have a second problem as well: we do not know the hashes of any tree objects in the repository! We know the commit hash from the web interface, but we cannot use that to guess even the root directory's tree hash.
Using the GitLab API
The solution to both problems is the GitLab API. It provides three API endpoints we need:
- First, we need the repository's internal project ID, so we can use the other API endpoints. The easy way to get this is through the search functionality in the Projects API:

$ curl -s https://git.example.com/api/v4/projects?search=big-repo | jq
[
  {
    "id": 136,
    "name": "big-repo",
    "name_with_namespace": "project / big-repo",
    ...
  }
]

- The Repositories API contains a tree endpoint which provides the all-important tree hashes. It also provides names, types, and modes, which are incidentally what we need to create our own tree objects.

$ curl -s https://git.example.com/api/v4/projects/136/repository/tree | jq
[
  {
    "id": "f2a048d9eff3c0263dfb3acbeb2987f3a9f0f6eb",
    "name": "android",
    "type": "tree",
    "path": "android",
    "mode": "040000"
  },
  ...
]

- The Repositories API also has a raw blob endpoint, which we can use to download the contents of individual files.
Using only these API endpoints, we can reimplement git fetch for tree objects. Here's how it works: we start by requesting the contents of the root directory tree. We convert the response to git's internal tree format, and write that out to disk. Then for each entry in the tree:
- If it is a blob, we download the raw contents of the file, and convert that to git's internal blob format.
- If it is a tree, we perform this process again, recursively.
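The blob half of that conversion fits in a few lines. The sketch below is mine, not code from gitlab_fetch, and the function names are invented, but the format itself is git's loose object format: a "blob <size>" header, a NUL byte, then the raw contents, zlib-compressed only when written to disk.

```python
import hashlib
import os
import zlib

def blob_object(contents: bytes):
    # A blob object is the header "blob <size>\0" followed by the raw
    # contents; the object ID is the SHA-1 of that uncompressed whole.
    obj = b"blob %d\x00" % len(contents) + contents
    return hashlib.sha1(obj).hexdigest(), obj

def write_object(git_dir, oid, obj):
    # Loose objects live at .git/objects/<first 2 hex chars>/<other 38>,
    # and only the on-disk copy is zlib-compressed.
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(obj))

# The empty blob has a well-known object ID:
oid, obj = blob_object(b"")
print(oid)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

An object written this way is indistinguishable from one unpacked by git itself, which is what makes the piecewise approach work.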
While git's internal blob and tree formats are simple enough, there are some gotchas to be aware of:
- Trees are sorted, but in an unusual order. Notice how build comes after build.sh in the tree printed above? That is because trees have a forward slash (/) implicitly appended to their name when being sorted, and that slash, at \x2f, comes after the full stop (\x2e).
- Objects must be compressed in the zlib format before being written to disk, but the object ID is the SHA-1 hash of the uncompressed contents.
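Both gotchas show up in the tree-building step, sketched below with helper names of my own. One extra detail not visible in the pretty-printed listing: git stores a directory's mode as "40000", without the leading zero that git cat-file and the GitLab API display.

```python
import hashlib

def tree_sort_key(name, is_tree):
    # Git sorts tree entries as if directory names had a trailing "/",
    # which is why "build.sh" (where "." is \x2e) sorts before
    # "build/" (where "/" is \x2f).
    return name.encode() + (b"/" if is_tree else b"")

def tree_object(entries):
    # Build a git tree object from (mode, name, hex_sha) entries.
    # Each serialized entry is: mode, space, name, NUL, 20 raw SHA-1
    # bytes. Modes are written without leading zeros ("40000").
    body = b""
    for mode, name, sha in sorted(
            entries, key=lambda e: tree_sort_key(e[1], e[0] == "040000")):
        body += b"%s %s\x00%s" % (
            mode.lstrip("0").encode(), name.encode(), bytes.fromhex(sha))
    obj = b"tree %d\x00" % len(body) + body
    return hashlib.sha1(obj).hexdigest(), obj

# The empty tree also has a well-known object ID:
print(tree_object([])[0])  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904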
This is one place where git's use of hashes really shines, because it makes the object IDs 100% deterministic. Once we have downloaded a file from GitLab and constructed its blob, we can verify its integrity by checking the blob's hash against the entry in the tree.
The result is the gitlab_fetch package, available on GitHub!
Our git fetch implementation is incredibly inefficient, because it makes a separate HTTP request for each object, but splitting the work across many HTTP requests is exactly what we need.
A better way
Fetching one file at a time using the GitLab API is so slow that I developed another method and used it to fetch the whole repository while gitlab_fetch was still running.
It turns out that --negotiation-tip not working with trees is only a client-side limitation in git fetch. The actual git smart protocol allows providing trees for both have and want lines in a git-upload-pack request. So we can implement a similar algorithm combining the GitLab API with manually-constructed git-upload-pack requests:
- Get the tree contents via the GitLab API.
- For each entry in the tree:
  - Attempt to fetch that entry using git fetch.
  - If it fails due to hitting the 1 GiB limit, recurse into that directory.
- Fetch the tree object manually (e.g. using curl or Python), providing the hashes of all entries as haves. This will return a packfile, which can be passed to git unpack-objects.
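Those manually-constructed requests use git's pkt-line framing: every line is prefixed with its own length (including the four length digits themselves) as four hex digits, and "0000" is a flush packet that ends a section. Here is a minimal sketch of just the framing; a real request also has to advertise capabilities on the first want line and parse the server's response, which this ignores.

```python
def pkt_line(payload: bytes) -> bytes:
    # The length prefix counts the whole pkt-line, including the four
    # hex digits of the prefix itself.
    return b"%04x" % (len(payload) + 4) + payload

FLUSH = b"0000"  # a flush-pkt has no payload and ends a section

def upload_pack_request(wants, haves):
    # want/have lines carry full 40-character object IDs; with
    # uploadpack.allowAnySHA1InWant these can be tree IDs, not just
    # commits, which is the whole trick.
    body = b"".join(pkt_line(b"want %s\n" % w.encode()) for w in wants)
    body += FLUSH
    body += b"".join(pkt_line(b"have %s\n" % h.encode()) for h in haves)
    body += pkt_line(b"done\n")
    return body

req = upload_pack_request(["f2a048d9eff3c0263dfb3acbeb2987f3a9f0f6eb"], [])
print(req[:4])  # b'0032' -- 4 + len("want <40 hex chars>\n") = 50 = 0x32
```

The body is POSTed to the repository's /info/refs peer endpoint, git-upload-pack; the packfile arrives in the response.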
This method also allows fetching the actual commit object, since we can tell the server that we have the top-level tree object. Constructing the commit object from the GitLab API alone would have required finding all of the bits of metadata (author, committer, date, message) and putting them together in exactly the correct format.
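To give a sense of how strict that format is, here is a sketch of the serialization (the function name and the example identity string are mine; reproducing a server-side commit ID requires matching every byte, including the Unix timestamp, UTC offset, and the trailing newline of the message):

```python
import hashlib

def commit_object(tree, parents, author, committer, message):
    # A commit object names its tree and parents, then gives the
    # author and committer lines, a blank line, and the message.
    # `author`/`committer` are full identity lines in the form
    # "Name <email> <unix timestamp> <utc offset>".
    lines = ["tree %s" % tree]
    lines += ["parent %s" % p for p in parents]
    lines += ["author %s" % author, "committer %s" % committer, "", message]
    body = "\n".join(lines).encode()
    obj = b"commit %d\x00" % len(body) + body
    return hashlib.sha1(obj).hexdigest(), obj

# Hypothetical identity, for illustration only:
ident = "Jane Doe <jane@example.com> 1619222400 +0000"
oid, obj = commit_object(
    "4b825dc642cb6eb9a060e54bf8d69288fbee4904", [], ident, ident,
    "Initial commit\n")
```

Getting every one of those bytes right from API metadata alone is possible, but telling the server we already have the tree was much less work.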
There's no code available for this method. I must admit I used vim to construct the requests, curl to send them, and dd to extract the packfiles from the responses. But it worked!
Conclusion
If you ever run into this incredibly specific scenario, with a bit of effort, it is indeed possible to clone the repository you want.