Shell Foo: Merging Common Files In a Directory

Say you have two directories with a bunch of text files. How can you create a third directory that will contain all the files that are common to both input directories, such that every output file is a concatenation of both input files?
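The post is about a shell one-liner. Purely to illustrate the task, here is a minimal Python sketch. It assumes that "common" means the same file name exists in both input directories, and that the first directory's content goes first in the output. The function name merge_common_files and its parameters are made up for this example.

```python
import os
import shutil


def merge_common_files(dir_a, dir_b, out_dir):
    """Concatenate same-named files from dir_a and dir_b into out_dir.

    Returns the sorted list of file names that were merged.
    """
    os.makedirs(out_dir, exist_ok=True)
    # Keep only names that are regular files in BOTH input directories.
    common = sorted(
        name
        for name in set(os.listdir(dir_a)) & set(os.listdir(dir_b))
        if os.path.isfile(os.path.join(dir_a, name))
        and os.path.isfile(os.path.join(dir_b, name))
    )
    for name in common:
        with open(os.path.join(out_dir, name), "wb") as out:
            # dir_a's content first, then dir_b's (an assumption about order).
            for src_dir in (dir_a, dir_b):
                with open(os.path.join(src_dir, name), "rb") as src:
                    shutil.copyfileobj(src, out)
    return common
```

Files are copied in binary mode, so the sketch does not care about text encodings.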

Shell-Foo is a series of fun ways to take advantage of the powers of the shell. In the series, I highlight shell one-liners that I found useful or interesting. Most of the entries should work on bash on Linux, OS X and other UNIX-variants. Some probably work with other shells as well. Your mileage may vary.

Feel free to suggest your own Shell-Foo one-liners!

How To Use Protocol Buffers In a SCons Project, Take 1

This is the eleventh post in my SCons series. This post starts exploring ways to work with protocol buffer files in a SCons project.

Protocol buffers are a structured-data-serialization mechanism from Google. This is not a tutorial on protocol buffers. I will use the address book project example that appears in the official protocol buffers tutorial.

When using protocol buffers, you write .proto files to describe your structured data. You use the protoc compiler to generate C++ and Python code that allows you to serialize, load, and manipulate your protocol buffers data.

Out of the box, SCons does not know how to compile .proto files into C++ and Python code. The purpose of this post is to start exploring ways to integrate protocol buffers in the build process. The first iteration is based on manual “integration”.
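To make the manual step concrete, here is a rough Python sketch of invoking protoc for one file. The --proto_path, --cpp_out and --python_out options are standard protoc flags. The helper names, the output directories and the addressbook.proto path used in the note below are assumptions for illustration, not the code from the post.

```python
import os
import subprocess


def protoc_command(proto_file, cpp_out, python_out, protoc="protoc"):
    """Build the protoc argv that generates C++ and Python code for one file."""
    proto_dir = os.path.dirname(proto_file) or "."
    return [
        protoc,
        "--proto_path=" + proto_dir,    # where imports are resolved from
        "--cpp_out=" + cpp_out,         # receives <name>.pb.h and <name>.pb.cc
        "--python_out=" + python_out,   # receives <name>_pb2.py
        proto_file,
    ]


def generate(proto_file, cpp_out, python_out):
    """Run protoc; raises CalledProcessError if the compiler fails."""
    for out_dir in (cpp_out, python_out):
        os.makedirs(out_dir, exist_ok=True)
    subprocess.check_call(protoc_command(proto_file, cpp_out, python_out))
```

Calling generate("protos/addressbook.proto", "gen/cpp", "gen/py") would need protoc on the PATH; protoc_command on its own only builds the argument list.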

The final result is available on my GitHub scons-series repository.

Shell Foo: Parallelizing Multiple wget Downloads

Got a bunch of files to download? Got an open terminal session? Want to use wget to parallelize the download?

How about this:

echo http://dl.whatever.com/dl/file{1..1000} | xargs -n 1 -P 16 wget -q
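In the one-liner, bash expands {1..1000} into 1000 URLs, xargs -n 1 passes one URL to each wget invocation, -P 16 keeps up to 16 of them running at once, and wget -q keeps them quiet. For comparison only, a similar bounded-parallelism idea can be sketched in Python with a thread pool. The names fetch and fetch_all and the dest_dir parameter are hypothetical, and the URLs are placeholders just like in the one-liner.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def fetch(url, dest_dir="."):
    """Download one URL into dest_dir, named after the last path segment."""
    filename = os.path.basename(urlsplit(url).path) or "index.html"
    target = os.path.join(dest_dir, filename)
    urlretrieve(url, target)
    return target


def fetch_all(urls, workers=16, fetch_one=fetch):
    """Apply fetch_one to every URL with at most `workers` running at once.

    Results come back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, urls))
```

Something like fetch_all("http://dl.whatever.com/dl/file%d" % i for i in range(1, 1001)) would mirror the shell version. Threads are a reasonable fit here because the work is dominated by network I/O.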

Running a Time-Limited Subprocess In Python (concurrency caveats inside!)

Tuesday, January 13, 2015

I tried to write a “simple” Python function. All it had to do was run a command line in a subprocess and enforce a timeout on that subprocess.

Turns out that this “simple” goal is not so simple. I considered multiple approaches to solve this, and ran into interesting concurrency issues.

In this post I describe a solution based on Python subprocess and various threading constructs. I demonstrate a concurrency inconsistency I discovered when testing my solution on Linux and OS X.

The conclusion, if that’s all you’re looking for, is that timer.is_alive() is not a safe way to test whether a timer has expired!

Note: My experience is based on Python 2.7 with the default subprocess module from the standard library. Python 3.3+ added a timeout argument to subprocess. You can also install python-subprocess32, which brings this joy to Python 2.4-2.7.
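For readers on Python 3.3+, here is a minimal sketch of the built-in timeout support. This is not the threading-based solution the post describes; run_with_timeout is a name made up for this example.

```python
import subprocess


def run_with_timeout(argv, timeout_seconds):
    """Run argv and wait up to timeout_seconds.

    Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(argv)
    try:
        proc.communicate(timeout=timeout_seconds)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        # communicate() does not kill the child on timeout, so do it here.
        proc.kill()
        proc.communicate()  # reap the killed process
        return proc.returncode, True
```

The explicit kill() matters: when the timeout expires, the child keeps running unless the caller terminates it.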

Supporting the SCons Help Command and Quiet Flag

This is the tenth post in my SCons series. The topic of this post is adding support for the silent / quiet flag, and the help command.

In previous episodes I’ve added various enhancements. Some of them print progress information (like processing modules, and two-pass processing). This makes my enhancements quite chatty, which isn’t a problem by itself. The problem is that these messages are also printed in SCons “silent mode” (scons -s), and this episode fixes that.

On the way, I also add a custom help message. Just because it’s fun.

The final result is available on my GitHub scons-series repository.

Quick and Dirty Personal Social Analytics With Google App Engine

I want to track basic metrics of my main social thingies – The Ostrich Facebook page, The Ostrich Twitter account, and The Ostrich Google+ page.

My short-term goal is to have simple graphs of some basic metrics over time:

  • Facebook page likes and shares, posts, post likes and shares.
  • Twitter followers (and following), tweets, favorites, retweets.
  • Google+ followers, views, +1’s, shares.

I’m not sure why I want this data, but if I do find a good reason in the future, I’d better start tracking it now!

In this post, I describe how I went from the idea of collecting the data to a deployed data-collection app on Google App Engine in under 5 hours.

Automating Module Discovery in a SCons Project

Thursday, December 25, 2014

This is the ninth post in my SCons series. The topic of this post is automating module discovery.

Up until now, the module directories were hardcoded in site_scons/site_config.py. This was a natural choice, because you needed to specify the modules in the correct order. In the previous episode, I implemented a solution that supports arbitrary module order, so it now makes more sense to discover modules automatically than to list them by hand.

As a developer in the project, you know that when you create a new module you need to create a SConscript that describes how to build it. It makes sense for the build system to locate all modules to build by looking for SConscript files recursively from the project base directory.

In this episode, I describe how I implemented module auto-discovery by walking the project directory tree.

My implementation also correctly handles common caveats:

  • Avoid walking the build directory.
  • Skip “hidden” directories (like .git, and .whatever from your favorite IDE metadata).

In addition, my implementation provides several useful features:

  • Ability to limit recursion depth.
  • Support for “stop marker files”. If a stop marker file (e.g. .noscons) exists in a project directory, the directory (and all sub-directories) will be skipped.
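
The behaviors listed above can be sketched roughly as follows. This is a simplified illustration, not the implementation from the series: the function name discover_modules, its parameters and the default build directory name are assumptions. The .noscons marker and the SConscript file name come from the text.

```python
import os


def discover_modules(base_dir, build_dir="build", max_depth=None,
                     stop_marker=".noscons", script_name="SConscript"):
    """Return sorted paths (relative to base_dir) of directories that
    contain a script_name file."""
    modules = []
    for dirpath, dirnames, filenames in os.walk(base_dir):
        rel = os.path.relpath(dirpath, base_dir)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if stop_marker in filenames:
            dirnames[:] = []  # skip this directory and everything below it
            continue
        if script_name in filenames:
            modules.append(rel)
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # recursion depth limit reached
            continue
        # Prune hidden directories and the build directory in place,
        # so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if not d.startswith(".")
            and os.path.normpath(os.path.join(rel, d)) != os.path.normpath(build_dir)
        ]
    return sorted(modules)
```

In this sketch the build directory is matched by its path relative to the project base, so a nested directory that merely shares the name is still walked.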

The episode builds on top of the previous episode. The final result is available on my GitHub scons-series repository.

Supporting Arbitrary Modules Order In SCons

Thursday, December 18, 2014

This is the eighth post in my SCons series. In this post, I describe how to support arbitrary module order.

In an earlier episode, I presented the multi-module C++ SCons project. In that episode, I explained that modules need to be specified in dependency order.

This restriction can be annoying, and painfully limiting, once your project “gets serious”.

I promised a better solution, and now I provide one 🙂. In a nutshell, the solution is based on a two-pass approach. In the first pass, all library-targets are processed and collected across all modules. In the second pass, all program-targets are processed, using the libraries already collected.
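To show why two passes remove the ordering constraint, here is a toy Python model of the idea. It does not use the SCons API. The dictionary layout, the two_pass_build name and the lib<name>.a naming are all assumptions made for illustration.

```python
def two_pass_build(modules):
    """Resolve program dependencies regardless of module order.

    Each module is a dict with optional keys:
      "libs":  list of library names the module defines
      "progs": dict mapping a program name to the library names it links
    Returns a dict mapping each program to its resolved library files.
    """
    # Pass 1: collect every library target across all modules.
    libraries = {}
    for module in modules:
        for name in module.get("libs", ()):
            libraries[name] = "lib%s.a" % name

    # Pass 2: process program targets; every library is already known,
    # so a program may appear before the module that defines its library.
    programs = {}
    for module in modules:
        for name, deps in module.get("progs", {}).items():
            programs[name] = [libraries[dep] for dep in deps]
    return programs
```

For example, two_pass_build([{"progs": {"app": ["core"]}}, {"libs": ["core"]}]) resolves app even though the module defining core comes second.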

The rest of the post goes into further detail about the solution. The result, as usual, is available on my GitHub scons-series repository. It builds on top of the SCons shortcuts enhancement, in case you need to refresh your memory 🙂.

Manipulating Python os.walk Recursion

The os.walk function in Python is a powerful function. It generates the file names and sub-directory names in a directory tree by walking the tree. For each directory in the tree, it yields a 3-tuple (dirpath, dirnames, filenames).

It is not well known that you can modify dirnames in place in the body of the os.walk() loop (in the default top-down mode) to manipulate the recursion!

I’ve seen programmers avoid os.walk() and hack their own version of it using recursive calls to os.listdir(), with various path manipulations along the way. Rarely was the programmer unfamiliar with os.walk(). More often than not, the reason was that the programmer wanted more control over the recursion. Had she known that os.walk() already allows this, she would probably have used it and saved time and sweat!
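A minimal sketch of the technique, assuming the goal is to skip hidden directories (walk_visible is a name made up for this example):

```python
import os


def walk_visible(top):
    """Yield every directory path under top, skipping hidden directories."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Slice assignment changes the very list os.walk will recurse into.
        # Rebinding the name (dirnames = [...]) would have no effect.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        yield dirpath
```

Sorting dirnames in the same assignment also makes the traversal order deterministic. The trick only works in the default top-down mode, because with topdown=False the sub-directories have already been visited by the time their parent is yielded.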

This specific feature is well documented in the Python os.walk docs. Seeing how under-used it is, I wanted to highlight it here, hoping it will serve someone out there 🙂.

Right-click Hashes and Python’s ASCII command-line

Thursday, December 11, 2014

This post is a guest post by Gil Dollberg.

A while ago I wrote a Python script that calculates MD5 and SHA1 hashes on a file with a right click. Here’s the script that calculates the MD5 and the script that writes the .reg file. What you probably want to download is just the reg file: double click, install, and you’re set. Note the pythonw.exe caveat below though…
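The linked scripts are not reproduced here. As a rough idea of the hashing part only, a sketch with the standard hashlib module could look like this (file_hashes and chunk_size are names chosen for this example, and the right-click registry integration is not covered):

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return (md5_hex, sha1_hex) for the file at path."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Both digests are computed in a single pass over the file.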
