Coding Style

A brief description of things we strive at, more or less unsuccessfully.

1. Packaging

We use the classical GNU Autoconf/Automake approach, for tutorial see e.g. Learning the GNU development tools or the AutoBook.

2. Modules

Invenio started as a set of pretty independent modules developed by independent people with independent styles. This was even more pronounced by the original use of many different languages (e.g. Python, PHP, Perl). Now the Invenio code base is striving to use Python everywhere, except in speed-critical parts when a compiled language such as Common Lisp may come to the rescue in the near future.

When modifying an existing module, we propose to strictly continue using whatever coding style the module was originally written into. When writing new modules, we propose to stick to the below-mentioned standards.

The code integration across modules is happening, but is slow. Therefore, don't be surprised to see that there is a lot of room to refactor.

3. Python

We aim at following recommendations from PEP 8, although the existing code surely do not fulfil them here and there. The code indentation is done via spaces only, please do not use tabs. One tab counts as four spaces. Emacs users can look into our Emacs Tips wiki page for inspiration.

All the Python code should be extensively documented via docstrings, so you can always run pydoc file.py to peruse the file's documentation in one simple go. We follow the epytext docstring markup, which generates nice HTML source code documentation.

Do not forget to run pylint on your code to check for errors like uninitialized variables and to improve its quality and conformance to the coding standard. If you develop in Emacs, run M-x pylint RET on your buffers frequently. Read and implement pylint suggestions. (Note that using lambda and friends may lead to false pylint warnings. You can switch them off by putting block comments of the form ``# pylint: disable=C0301''.)

Do not forget to run pychecker on your code either. It is another source code checker that catches some situations better and some situations worse than pylint. If you develop in Emacs, run C-c C-w (M-x py-pychecker-run RET) on your buffers frequently.

You can check the kwalitee of your code by running ``python modules/miscutil/lib/kwalitee.py --check-all *.py'' on your files. This will run some basic error checking, warning checking, indentation checking, but also compliance to PEP 8. You can also check the code kwalitee stats across all the modules by running ``make kwalitee-check'' in the main source directory.

Do not hardcode magic constants in your code. Every magic string or a number should be put into accompanying file_config.py with symbol name beginning by cfg_modulename_*.

Clearly separate interfaces from implementation. Document your interfaces. Do not expose to other modules anything that does not have to be exposed. Apply principle of least information.

Create as few new library files as possible. Do not create many nested files in nested modules; rather put all the lib files in one dir with bibindex_foo and bibindex_bar names.

Use imperative/functional paradigm rather then OO. If you do use OO, then stick to as simple class hierarchy as possible. Recall that method calls and exception handling in Python are quite expensive.

Use rather the good old foo_bar naming convention for symbols (both variables and function names) instead of fooBar CaMelCaSe convention. (Except for Class names where UppercaseSymbolNames are to be used.)

Pay special attention to name your symbols descriptively. Your code is going to be read and work with by others and its symbols should be self-understandable without any comments and without studying other parts of the code. For example, use proper English words, not abbreviations that can be misspelled in many a way; use words that go in pair (e.g. create/destroy, start/stop; never create/stop); use self-understandable symbol names (e.g. list_of_file_extensions rather than list2); never misname symbols (e.g. score_list should hold the list of scores and nothing else - if in the course of development you change the semantics of what the symbol holds then change the symbol name too). Do not be afraid to use long descriptive names; good editors such as Emacs can tab-complete symbols for you.

When hacking module A, pay close attention to ressemble existing coding convention in A, even if it is legacy-weird and even if we use a different technique elsewhere. (Unless the whole module A is going to be refactored, of course.)

Speed-critical parts should be profiled with pyprof or our built-in web profiler (&profile=t).

The code should be well tested before committed. Testing is an integral part of the development process. Test along as you program. The testing process should be automatized via our unit test and regression test suite infrastructures. Please read the test suite strategy to know more.

Python promotes writing clear, readable, easily maintainable code. Write it as such. Recall Albert Einstein's ``Everything should be made as simple as possible, but not simpler''. Things should be neither overengineered nor oversimplified.

Recall principles Unix is built upon. As summarized by Eric S. Reymond's TAOUP:

  • Rule of Modularity: Write simple parts connected by clean interfaces.
  • Rule of Clarity: Clarity is better than cleverness.
  • Rule of Composition: Design programs to be connected with other programs.
  • Rule of Separation: Separate policy from mechanism; separate interfaces from engines.
  • Rule of Simplicity: Design for simplicity; add complexity only where you must.
  • Rule of Parsimony: Write a big program only when it is clear by demonstration that nothing else will do.
  • Rule of Transparency: Design for visibility to make inspection and debugging easier.
  • Rule of Robustness: Robustness is the child of transparency and simplicity.
  • Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust.
  • Rule of Least Surprise: In interface design, always do the least surprising thing.
  • Rule of Silence: When a program has nothing surprising to say, it should say nothing.
  • Rule of Repair: Repair what you can -- but when you must fail, fail noisily and as soon as possible.
  • Rule of Economy: Programmer time is expensive; conserve it in preference to machine time.
  • Rule of Generation: Avoid hand-hacking; write programs to write programs when you can.
  • Rule of Optimization: Prototype before polishing. Get it working before you optimize it.
  • Rule of Diversity: Distrust all claims for one true way.
  • Rule of Extensibility: Design for the future, because it will be here sooner than you think.
or the golden rule that says it all: ``keep it simple''.

Think of security and robustness from the start. Follow secure programming guidelines.

For more hints, thoughts, and other ruminations on programming, see our CDS Invenio wiki, notably Git Workflow and Invenio QA.

3. MySQL

Table naming policy is, roughly and briefly:

  • "foo": table names in lowercase, without prefix, used by me for WebSearch
  • "foo_bar": underscores represent M:N relationship between "foo" and "bar", to tie the two tables together
  • "bib*": many tables to hold the metadata and relationships between them
  • "idx*": idx is the table name prefix used by BibIndex
  • "rnk*": rnk is the table name prefix used by BibRank
  • "fmt*": fmt is the table name prefix used by BibFormat
  • "sbm*": sbm is the table name prefix used by WebSubmit
  • "sch*": sch is the table name prefix used by BibSched
  • "acc*": acc is the table name prefix used by WebAccess
  • "bsk*": acc is the table name prefix used by WebBasket
  • "msg*": acc is the table name prefix used by WebMessage
  • "cls*": acc is the table name prefix used by BibClassify
  • "sta*": acc is the table name prefix used by WebStat
  • "jrn*": acc is the table name prefix used by WebJournal
  • "collection*": many tables to describe collections and search interface pages
  • "user*" : many tables to describe personal features (baskets, alerts)
  • "hst*": tables related to historical versions of metadata and fulltext files
- end of file -