Monday, September 8, 2014

An annoying thing with Python slices

You of course know that Python slices are awesome. With a = 'ABCDEFG':

a[:3] -> 'ABC'
a[2:5] -> 'CDE'

And more interestingly:

a[-3:] -> 'EFG'


a[6:4:-1] -> 'GF'

But you can see that the reverse slice is starting to strain the fence-post convention we are familiar with. Python uses zero-based, inclusive-exclusive indexing. The forward slice a[n:m] corresponds to the C loop for (i = n; i < m; i++). When you reverse it, the slice goes for (i = m - 1; i > n - 1; i--).
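The loop correspondence is easy to check. This is only a sketch: the string a = 'ABCDEFG' is inferred from the examples above, and n = 2, m = 5 are arbitrary values picked for illustration.

```python
a = 'ABCDEFG'   # assumed value, inferred from the slice examples above
n, m = 2, 5     # arbitrary bounds, for illustration only

# Forward loop: i = n, n+1, ..., m-1
forward = ''.join(a[i] for i in range(n, m))

# Reverse loop: i = m-1, m-2, ..., n
backward = ''.join(a[i] for i in range(m - 1, n - 1, -1))

print(forward, a[n:m])                    # CDE CDE
print(backward, a[m - 1:n - 1:-1])        # EDC EDC
```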

As you can imagine this starts to get ugly and at one point it gets to be wrong:

Say, as is often the case, you are not taking static, pre-determined slices but slices determined at runtime. Say you are taking slices over the half-open range [n, m).

The forward slice is a[n:m]
The backward slice is a[m-1:n-1:-1] right? Because of the fence posts?

Well yes, except what happens when n = 0? The forward slice is fine but the reverse slice resolves to a[m-1:-1:-1]

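This is where Python becomes a little too clever. As you will recall from our earlier examples, negative indices indicate offsets from the end of the object, so a stop value of -1 means "the last element" rather than "one before index 0". The slice then starts at m-1 and runs backward toward an index that lies at or beyond it, so it comes back empty.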

The correct slice is a[m-1:None:-1] or a[m-1::-1] and the logic for this is cumbersome:

a[m-1:n-1 if n > 0 else None:-1]

The simpler way is to do a[n:m][::-1].
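Here is a small sketch of the pitfall and both fixes. The string a = 'ABCDEFG' is again inferred from the earlier examples, and reverse_slice is a hypothetical helper name, not anything built in.

```python
a = 'ABCDEFG'   # assumed value, inferred from the slice examples above


def reverse_slice(seq, n, m):
    """Return the elements of seq[n:m] in reverse order (hypothetical helper)."""
    return seq[n:m][::-1]


n, m = 0, 3

# Naive reverse slice: the stop of -1 is read as "last element", so this is empty.
naive = a[m - 1:n - 1:-1]

# Explicit fix: use None as the stop when n is 0.
explicit = a[m - 1:(n - 1 if n > 0 else None):-1]

# Simple fix: slice forward, then reverse.
simple = reverse_slice(a, n, m)

print(repr(naive), repr(explicit), repr(simple))   # '' 'CBA' 'CBA'
```

The forward-then-reverse version does make an extra copy of the slice, which is usually a fair price for not having to think about the n = 0 case.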

Saturday, September 6, 2014

Mac OS + 'cat' + 'sed' + \n = half-assed

You guys all know how Mac OS (Darwin) does everything JUST a little differently from the other *nixes. It's close enough to draw you in, and different enough to stab you in the back. Today's case: sed and \n.

I needed to cat some files together but I needed a newline between them. I asked my colleague Wan-Ping for a command that would do this and she suggested:

cat 1.fa 2.fa 3.fa | sed 's/^>/\n>/g'

So I did this and the sucker added the character 'n' wherever I expected a newline.

It turns out that Macs are special little snowflakes and need a special little command:

cat chr*.fa | sed 's/^>/\'$'\n>/g' > hg38.fa

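The magic sauce is the $'\n': the shell expands this to a literal newline before sed ever sees it, and the backslash in front of it tells sed that the newline is part of the replacement text. The BSD sed that ships with Mac OS does not treat \n in the replacement as a newline, which is why the first command inserted an 'n'.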
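If the quoting gets too painful, the same substitution can be done in a way that does not depend on which sed is installed. This is only a sketch of an alternative, not what I actually ran: newline_before_headers is a made-up name, and the sample text is invented for illustration.

```python
import re


def newline_before_headers(text):
    """Insert a newline before every line that starts with '>' (hypothetical helper)."""
    # re.MULTILINE makes ^ match at the start of every line, like sed's ^
    return re.sub(r'^>', '\n>', text, flags=re.MULTILINE)


sample = '>chr1\nACGT\n>chr2\nTTGA\n'   # invented FASTA-like text
result = newline_before_headers(sample)
print(repr(result))   # '\n>chr1\nACGT\n\n>chr2\nTTGA\n'
```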


Wednesday, September 3, 2014

Fixing the big mess with git and case insensitive filesystems

Mac OS X by default uses a case-insensitive file system. But Mac OS, as with a lot of other things, makes a half-assed job of this. In addition to causing various bits of confusion when creating directories, it also leads to a potentially messy situation with git. This is how things happen:

1. You create a directory in your source tree called, say, Plugins with a capital "P".
2. After a few commits you decide that it's better to change this to a lower case "p": plugins
3. When you go to commit this rename (perhaps with a few other changes you implemented) git throws a hissy fit.

After a bit of searching on Stack Overflow it turns out that this is all related to Mac OS's case-insensitivity.

The cleanest fix I found on Stack Overflow was:

git mv Plugins temp00
git mv temp00 plugins
git commit

Apparently this fools git's index into recording the change. With a plain git mv Plugins plugins, the underlying file system does not recognize the difference between the two names, so git is told that nothing has changed. But something has, and git is left in some sort of halfway state that messes it up.