Solving BCACTF 1Captcha

I spent most of yesterday trying to solve the harder problems of BCACTF 2.0 and... I didn't totally fail. So I'll be making a bunch of posts explaining my solutions. First up is 1Captcha.

Premise

1Captcha is an ironically designed "I am not a robot" captcha: ironic in the sense that only robots can solve it and humans can't. The captcha shows us a pseudorandom image of either the number "1" or the letter "l", and we then have to select all the tiles that match it. Two things make this impossible for a human: the two characters differ by barely a pixel, and we only get two seconds to answer.

Normal humans just can't spot near 1-pixel differences that quickly. However, maybe a computer can (^_^).

Ideas

I'm probably not the only one who jumped straight to neural networks as a possible solution here. However, I quickly realised why this wouldn't work: the images were unlabelled, so there was nothing to train on. My second idea was to measure the "similarity" between the target image and each of the tiles, and select any tile that is similar enough.

Image similarity

To measure the distance between two images, I exploited a simple form of edge detection.

Here's a close-up of one of the tile images. For each row, we scan from left to right until we find a pixel that differs from the previous one. Since all tile images have solid backgrounds, this gives us the coordinates of the character's left edge.
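The row scan can be sketched in a few lines of Python. This is a minimal sketch, not the actual solver: it assumes the tile has already been decoded into a list of pixel rows (for example greyscale integers), and `left_edge` is a hypothetical name. Rows that are entirely background contain no change, so they are simply skipped.

```python
def left_edge(pixels):
    """Return the (x, y) of the first pixel in each row that differs from its left neighbour.

    `pixels` is assumed to be a list of rows, each a list of comparable pixel values.
    Rows made only of background produce no entry.
    """
    edge = []
    for y, row in enumerate(pixels):
        for x in range(1, len(row)):
            if row[x] != row[x - 1]:
                edge.append((x, y))
                break  # only the first change per row marks the left edge
    return edge


# Toy 4x4 tile: 0 = background, 1 = ink.
tile = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(left_edge(tile))  # [(2, 1), (1, 2)]
```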

We then normalise this by computing the horizontal change between consecutive edge pixels. For example, an edge with (x, y) coordinates [(7, 2), (5, 3), (5, 4), (6, 5)] would be normalised to [-2, 0, 1].
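The normalisation step might look like this, again with a hypothetical function name and assuming the edge is a list of (x, y) tuples ordered top to bottom, as the row scan produces. Only the x-coordinates are used, so moving a character around inside its tile leaves the result unchanged.

```python
def normalise(edge):
    """Convert (x, y) edge coordinates into the horizontal step between consecutive rows."""
    xs = [x for x, _ in edge]
    # Pair each x with the next one and keep only the difference.
    return [nxt - cur for cur, nxt in zip(xs, xs[1:])]


print(normalise([(7, 2), (5, 3), (5, 4), (6, 5)]))  # [-2, 0, 1]
```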

I assumed that tiles containing the same character would have the same normalised edge. To my surprise, this held about 90% of the time. Bingo!

Now all we need to do is find the tiles whose edge matches the target image's.
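Putting the pieces together, a tile can be selected when its normalised edge is exactly equal to the target's. The sketch below assumes exact equality is the match criterion and that the target and tiles are already available as pixel grids; fetching them from the challenge page and clicking the chosen tiles is not shown, and every name here is hypothetical. Given the roughly 90% hit rate above, it should be expected to misjudge some tiles.

```python
def left_edge(pixels):
    """First (x, y) in each row whose pixel differs from its left neighbour."""
    edge = []
    for y, row in enumerate(pixels):
        for x in range(1, len(row)):
            if row[x] != row[x - 1]:
                edge.append((x, y))
                break
    return edge


def normalise(edge):
    """Horizontal step between consecutive edge pixels."""
    xs = [x for x, _ in edge]
    return [nxt - cur for cur, nxt in zip(xs, xs[1:])]


def select_matching(target, tiles):
    """Indices of tiles whose normalised left edge equals the target's."""
    signature = normalise(left_edge(target))
    return [i for i, tile in enumerate(tiles)
            if normalise(left_edge(tile)) == signature]


# Toy 5x5 glyphs: 0 = background, 1 = ink.
target = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
shifted = [row[-1:] + row[:-1] for row in target]  # same glyph, one pixel to the right
straight = [[0, 0, 1, 0, 0] for _ in range(5)]     # a plain vertical stroke

print(select_matching(target, [shifted, straight, target]))  # [0, 2]
```

The shifted copy still matches because only the steps between edge pixels are compared, while the straight stroke has a different step pattern and is rejected.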

Solving process

The script succeeded on the second try, completing all 20 captcha stages.

=> archive/fbb0e6e7599a94db.py solver.py