This is the ninth post in my SCons series. The topic of this post is automating module discovery.
Up until now, the module directories were hardcoded in site_scons/site_config.py. This was a natural choice, because the modules had to be specified in the correct order. In the previous episode, I implemented a solution that supports arbitrary module order, so it now makes more sense to discover modules automatically than to list them manually.
As a developer in the project, you know that when you create a new module you need to create a SConscript that describes how to build it. It makes sense for the build system to locate all modules by looking for SConscript files recursively from the project base directory.
In this episode, I describe how I implemented module auto-discovery by walking the project directory tree.
My implementation also correctly handles common caveats:
- Avoid walking the build directory.
- Skip “hidden” directories (like .git, and .whatever from your favorite IDE metadata).
In addition, my implementation provides several useful features:
- Ability to limit recursion depth.
- Support for “stop marker files”. If a stop marker file (e.g. .noscons) exists in a project directory, the directory (and all sub-directories) will be skipped.
The episode builds on top of the previous episode. The final result is available on my GitHub scons-series repository.
As a reminder, the (seemingly silly) C++ project is a simple address book program. Refer to a previous post if you need more details.
Basic naive implementation
You may recall that my SCons multi-module framework uses the modules() function in site_scons/site_config.py to obtain the module directories to process. Here’s a first iteration of a modules() function that automates module discovery:
    import os

    def modules():
        """Generate modules to build (directories with a SConscript file)."""
        for dirpath, dirnames, filenames in os.walk('.'):
            if 'SConscript' in filenames:
                yield dirpath
This is a little too naive, for multiple reasons:
- The build directory is also in the project tree. This implementation also walks the build directory, which contains links to the modules. The result is that every module is yielded multiple times, confusing the system.
- The project directory might also contain hidden directories, like .git and .metadata, or other .whatever directories from various tools. Walking these is redundant, and might even result in phantom modules.
- For every flavor we iterate over the modules twice. This means that a single scons run may call modules() four times, with the same results every time. It’s a shame to waste time on file-system operations.
- Once the project becomes complex, it may contain deep “sub-projects”. Some of those may be C/C++ projects, and some may be website projects. There’s no point in walking a deep sub-tree that has no SConscript files.
Improvements to the naive implementation
Some of the improvements take advantage of on-the-fly os.walk recursion manipulation. I use it here to prune the walked directory tree to skip sub-trees. If you’re not familiar with this feature, read this post I wrote about it.
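For readers who have not used this feature, here is a minimal, self-contained sketch of the pruning trick. The names walk_pruned and demo, and the throwaway directory layout, are made up for illustration and are not part of the framework. The key point is that in top-down mode os.walk consults the same dirnames list it hands you, so editing that list in place controls where the walk goes next.

```python
import os
import tempfile


def walk_pruned(top, skip_names):
    """Return the directories walked under `top` (relative to it),
    never descending into directories named in `skip_names`."""
    visited = []
    for dirpath, dirnames, filenames in os.walk(top, topdown=True):
        # Slice assignment edits the very list os.walk consults next,
        # so the removed sub-directories are never visited.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_names)
        visited.append(os.path.relpath(dirpath, top))
    return visited


def demo():
    """Build a throwaway tree and walk it while skipping 'build'."""
    top = tempfile.mkdtemp()
    for sub in ('AddressBook', 'build/debug/AddressBook', 'Writer'):
        os.makedirs(os.path.join(top, *sub.split('/')))
    return walk_pruned(top, skip_names={'build'})
```

Note that rebinding the name (dirnames = []) would have no effect, because os.walk keeps its own reference to the original list. Only in-place changes are seen.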
Skipping the build directory and other hidden directories
By checking if dirpath is the build directory, or a hidden directory, I can prune os.walk to skip it:
    def modules():
        """Generate modules to build (directories with a SConscript file)."""
        for dirpath, dirnames, filenames in os.walk('.'):
            if '.' == dirpath:
                dirpath = ''
            if (os.path.normpath(_BUILD_BASE) == os.path.normpath(dirpath)
                    or os.path.basename(dirpath).startswith('.')):
                # Build dir or hidden dir: prune the sub-tree and skip it
                dirnames[:] = []
            elif 'SConscript' in filenames:
                yield dirpath
Walking just once
By caching the results of the first os.walk, I can generate subsequent results from the cache instead of repeating the walk:
    _CACHED_MODULES = list()

    def modules():
        """Generate modules to build (directories with a SConscript file)."""
        if not _CACHED_MODULES:
            # Build the cache
            # ... the walk from above ...
        # Yield modules from cache
        for module in _CACHED_MODULES:
            yield module
Limiting recursion by depth
It’s simple enough to count path separators and prune the tree to limit the depth:
    MAX_DEPTH = 7  # or None to disable the limit

    def modules():
        """Generate modules to build (directories with a SConscript file)."""
        for dirpath, dirnames, filenames in os.walk('.'):
            if MAX_DEPTH and MAX_DEPTH > 0:
                depth = dirpath.count(os.path.sep)
                if depth >= MAX_DEPTH:
                    # At the limit: process this directory, but don't go deeper
                    dirnames[:] = []
            if 'SConscript' in filenames:
                yield dirpath
Note that when depth == MAX_DEPTH, the current directory is processed (because it’s at the allowed depth), but sub-directories are not even walked. This means that the condition depth > MAX_DEPTH should never happen.
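To make the depth arithmetic concrete, here is a small sketch of how separator counting maps os.walk('.') paths to depths. The helper names depth_of and may_descend are hypothetical and exist only for this illustration.

```python
import os


def depth_of(dirpath):
    """Depth of a path produced by os.walk('.'):
    '.' is 0, './Foo' is 1, './Foo/Bar' is 2."""
    return dirpath.count(os.path.sep)


def may_descend(dirpath, max_depth):
    """Return True if the walk should be allowed below `dirpath`."""
    if not max_depth:
        return True  # None (or 0) disables the limit
    return depth_of(dirpath) < max_depth
```

With a limit of 1, the walk descends from '.' into Foo, processes Foo, but never enters Foo/Bar. This is why a depth greater than the limit is never observed.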
Limiting recursion with “stop markers”
If we agree that the existence of a file named .noscons indicates that the directory should be skipped, it’s straightforward to implement:
    def modules():
        """Generate modules to build (directories with a SConscript file)."""
        for dirpath, dirnames, filenames in os.walk('.'):
            if '.noscons' in filenames:
                # Stop marker found: prune the sub-tree and skip this directory
                dirnames[:] = []
            elif 'SConscript' in filenames:
                yield dirpath
Final implementation – combining and generalizing improvements
The stand-alone improvements I presented above are clear and specific. My final implementation combines all of them, making them more generic on the way.
    import os

    # List of cached modules to save processing for second call and beyond
    _CACHED_MODULES = list()

    def modules():
        """Generate modules to build.

        Each module is a directory with a SConscript file.
        """
        if not _CACHED_MODULES:
            # Build the cache
            def build_dir_skipper(dirpath):
                """Return True if `dirpath` is the build base dir."""
                return os.path.normpath(_BUILD_BASE) == os.path.normpath(dirpath)

            def hidden_dir_skipper(dirpath):
                """Return True if `dirpath` last dir component begins with '.'"""
                last_dir = os.path.basename(dirpath)
                return last_dir.startswith('.')

            for module_path in module_dirs_generator(
                    max_depth=7, followlinks=False,
                    dir_skip_list=[build_dir_skipper, hidden_dir_skipper],
                    file_skip_list='.noscons'):
                _CACHED_MODULES.append(module_path)
        # Yield modules from cache
        for module in _CACHED_MODULES:
            yield module

    def module_dirs_generator(max_depth=None, followlinks=False,
                              dir_skip_list=None, file_skip_list=None):
        """Use os.walk to generate directories that contain a SConscript file."""
        def should_process(dirpath, filenames):
            """Return True if current directory should be processed."""
            for skip_dir_func in listify(dir_skip_list):
                if skip_dir_func(dirpath):
                    return False
            if intersection(filenames, file_skip_list):
                # print() with a single argument works on Python 2 and 3
                print('scons: |- Skipping %s (skip marker found)' % dirpath)
                return False
            return True

        top = '.'
        for dirpath, dirnames, filenames in os.walk(top, topdown=True,
                                                    followlinks=followlinks):
            # Find path relative to top
            rel_path = os.path.relpath(dirpath, top) if (dirpath != top) else ''
            if rel_path:
                if not should_process(rel_path, filenames):
                    # prevent os.walk from recursing deeper and skip
                    dirnames[:] = []
                    continue
                if max_depth:
                    # Skip too-deep directories
                    max_depth = int(max_depth)
                    assert max_depth > 0
                    # Calculate current depth relative to top path
                    depth = rel_path.count(os.path.sep) + 1
                    if depth == max_depth:
                        # prevent os.walk from recursing deeper
                        dirnames[:] = []
                    if depth > max_depth:
                        # shouldn't reach here though - shout and skip
                        print('w00t?! Should not reach here ... o_O')
                        continue
            # Yield directory with SConscript file
            if 'SConscript' in filenames:
                yield rel_path

    def listify(args):
        """Return args as a list."""
        if args:
            if isinstance(args, list):
                return args
            return [args]
        return []

    def intersection(*args):
        """Return the intersection of all iterables passed."""
        args = list(args)
        result = set(listify(args.pop(0)))
        while args and result:
            # Finish the loop either when args is consumed, or result is empty
            result.intersection_update(listify(args.pop(0)))
        return result
This code is divided between site_scons/site_config.py (the modules() function) and site_scons/site_utils.py (the rest). I chose to split it this way because I wanted site_config.py to be minimal, containing only project configuration. The other functions are utility functions that modules() happens to use.
Some notes about generalizing the improvements:
- Instead of a hardcoded .noscons stop marker, I support a list of marker names.
- Instead of hardcoded directories to skip, I support a list of skip-functions. For every directory, each function is called with the directory path. The directory (and sub-tree) is skipped if any of the functions returns True.
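As an illustration of the skip-function contract described in the notes above, here is a small self-contained sketch. hidden_dir_skipper follows the same rule as in the final implementation, while name_skipper and should_skip are hypothetical helpers I made up for this example. In the framework itself, the any-function-returns-True check is done inside should_process.

```python
import os


def name_skipper(*names):
    """Build a skip-function that matches directories by last path component."""
    def skipper(dirpath):
        return os.path.basename(dirpath) in names
    return skipper


def hidden_dir_skipper(dirpath):
    """Same rule as in the post: last path component begins with '.'."""
    return os.path.basename(dirpath).startswith('.')


def should_skip(dirpath, skippers):
    """Return True if any skip-function claims `dirpath`."""
    return any(skip(dirpath) for skip in skippers)
```

Under this sketch, adding a new rule means writing one more function that takes a directory path and returns a boolean, and appending it to the list passed as dir_skip_list.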
Demo
The default run is exactly as it was before:
    itamar@legolas sconseries (episodes/08-discovery) $ rm -r build/
    itamar@legolas sconseries (episodes/08-discovery) $ scons
    scons: Reading SConscript files ...
    scons: + Processing flavor debug ...
    scons: |- First pass: Reading module AddressBook ...
    scons: |- First pass: Reading module Writer ...
    scons: |- Second pass: Reading module AddressBook ...
    scons: |- Second pass: Reading module Writer ...
    scons: + Processing flavor release ...
    scons: |- First pass: Reading module AddressBook ...
    scons: |- First pass: Reading module Writer ...
    scons: |- Second pass: Reading module AddressBook ...
    scons: |- Second pass: Reading module Writer ...
    scons: done reading SConscript files.
    scons: Building targets ...
    ... snipped ...
    scons: done building targets.

Let’s try out the other features as well!
To do it, I created a stub module:
    itamar@legolas sconseries (episodes/08-discovery) $ mkdir -p Foo/Bar
    itamar@legolas sconseries (episodes/08-discovery) $ touch Foo/Bar/SConscript
    itamar@legolas sconseries (episodes/08-discovery) $ scons
    scons: Reading SConscript files ...
    scons: + Processing flavor debug ...
    scons: |- First pass: Reading module AddressBook ...
    scons: |- First pass: Reading module Foo/Bar ...
    scons: |- First pass: Reading module Writer ...
    scons: |- Second pass: Reading module AddressBook ...
    scons: |- Second pass: Reading module Foo/Bar ...
    scons: |- Second pass: Reading module Writer ...
    scons: + Processing flavor release ...
    scons: |- First pass: Reading module AddressBook ...
    scons: |- First pass: Reading module Foo/Bar ...
    scons: |- First pass: Reading module Writer ...
    scons: |- Second pass: Reading module AddressBook ...
    scons: |- Second pass: Reading module Foo/Bar ...
    scons: |- Second pass: Reading module Writer ...
    scons: done reading SConscript files.
    scons: Building targets ...
    scons: `.' is up to date.
    scons: done building targets.
Depth limit
Changing max_depth to 1 and running scons:

    itamar@legolas sconseries (episodes/08-discovery) $ scons
    scons: Reading SConscript files ...
    scons: + Processing flavor debug ...
    scons: |- First pass: Reading module AddressBook ...
    scons: |- First pass: Reading module Writer ...
    scons: |- Second pass: Reading module AddressBook ...
    scons: |- Second pass: Reading module Writer ...
    scons: + Processing flavor release ...
    scons: |- First pass: Reading module AddressBook ...
    scons: |- First pass: Reading module Writer ...
    scons: |- Second pass: Reading module AddressBook ...
    scons: |- Second pass: Reading module Writer ...
    scons: done reading SConscript files.
    scons: Building targets ...
    scons: `.' is up to date.
    scons: done building targets.
As expected, Foo/Bar is not processed.
Stop marker
Changing max_depth back to 7, and creating a stop marker file:

    itamar@legolas sconseries (episodes/08-discovery) $ touch Foo/Bar/.noscons
    itamar@legolas sconseries (episodes/08-discovery) $ scons
    scons: Reading SConscript files ...
    scons: + Processing flavor debug ...
    scons: |- Skipping Foo/Bar (skip marker found)
    scons: |- First pass: Reading module AddressBook ...
    scons: |- First pass: Reading module Writer ...
    scons: |- Second pass: Reading module AddressBook ...
    scons: |- Second pass: Reading module Writer ...
    scons: + Processing flavor release ...
    scons: |- First pass: Reading module AddressBook ...
    scons: |- First pass: Reading module Writer ...
    scons: |- Second pass: Reading module AddressBook ...
    scons: |- Second pass: Reading module Writer ...
    scons: done reading SConscript files.
    scons: Building targets ...
    scons: `.' is up to date.
    scons: done building targets.
As expected, Foo/Bar is skipped. Also, the fact that the skip message appears only once indicates that caching works as expected!
Summary
Once again, this episode brings no change in functionality, but makes the build framework more flexible and developer-friendly.
The automated module discovery, as described and implemented, solves the double-maintenance issue in managing modules. The module discovery functionality provides a robust, configurable module scanner that can be easily extended to cover more scenarios that I didn’t think about here.
For instance, the implementation doesn’t include these ideas, but they can be easily added:
- Taking the value for max_depth from a command line flag instead of a hardcoded value.
- Maintaining a list of modules that should not be processed and built by default, unless a specific command line flag is passed. This can be useful, for example, if you maintain a collection of codelabs in the main project tree, but don’t want to build them by default.
I’ll leave these as an exercise for the dedicated reader 😉. If you implement them, please share back!
The final result is available on my GitHub scons-series repository. Feel free to use / fork / modify. If you do, I’d appreciate it if you share back improvements.
See the scons tag for more in my SCons series. Upcoming episodes that may interest you include supporting SCons help / quiet, and propagating required libraries.