This module provides support for importing data in other formats (handled by extension modules) as nodes to be added to a store.
With various node properties, Lethe can be used to organize your browser bookmarks or the bibliography of your article. With this module and node importer extensions, you can add new entries to such a site directly from the browser files or BibTeX databases.
Why the name ‘node importers’? Support for importing whole wikis with their histories should be implemented in the future, and this interface won’t be appropriate for that. Here, complete nodes are imported, with no previous history.
To implement your own importer extension, add a module to the lethe.ext.node_import package (and update __all__ there) with a subclass of Importer or one of its abstract subclasses defined in this module. To use the importers, use get_importer, types or, for complex tasks, ImporterRegistry.
Base class for node importers.
An instance handles imports into a single store; it doesn’t save the nodes.
Use get_importer instead of instantiating these objects.
A string or a compiled regular expression describing the names of files that this importer supports. The string is the file name without directory names, the regular expression is matched on the whole path given.
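The matching rule described above can be sketched as follows. This is only an illustration: the helper name `pattern_matches` is hypothetical, and the use of `search` (rather than `match` or `fullmatch`) for regular expressions is an assumption, since the text only says the expression is matched on the whole path.

```python
import os
import re


def pattern_matches(pattern, path):
    """Return True if `path` is supported according to `pattern`.

    Hypothetical helper, not part of the documented API.
    """
    if isinstance(pattern, str):
        # A plain string is compared with the file name only,
        # ignoring any directory names in the path.
        return os.path.basename(path) == pattern
    # A compiled regular expression is tested against the whole path.
    # Assumption: `search` is used; the real code may use `match`.
    return pattern.search(path) is not None


by_name = pattern_matches("places.sqlite", "/home/user/.mozilla/places.sqlite")
by_regex = pattern_matches(re.compile(r"\.rec$"), "/home/user/bookmarks.rec")
no_match = pattern_matches("places.sqlite", "/home/user/places.sqlite.bak")
```

With a string pattern, the directory part of the path never influences the result; a regular expression can take it into account.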
Iterate new uncommitted nodes for entries of the import file at path.
lethe.datastore.Store instance to which the imported nodes belong.
A human-readable name for this import format.
An importer that can import nodes from a string.
Subclasses must implement nodes_from_string.
Iterate new uncommitted nodes for entries in the string.
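A string-based importer extension might look roughly like the sketch below. It does not use the real lethe classes: `Importer` and `StringImporter` here are simplified stand-ins (the name `StringImporter`, the constructor signature, and the UTF-8 file reading are assumptions), `LineImporter` is a hypothetical extension, and plain dicts stand in for nodes.

```python
from abc import ABC, abstractmethod


class Importer(ABC):
    """Stand-in for the documented base class, not the real implementation."""

    def __init__(self, store):
        # Store to which the imported nodes belong.
        self.store = store

    @abstractmethod
    def nodes_from_path(self, path):
        """Iterate new uncommitted nodes for entries of the file at path."""


class StringImporter(Importer):
    """Stand-in for the string-based abstract subclass (name is assumed)."""

    def nodes_from_path(self, path):
        # Assumption: the file is read as UTF-8 text and passed on whole.
        with open(path, encoding="utf-8") as f:
            return self.nodes_from_string(f.read())

    @abstractmethod
    def nodes_from_string(self, string):
        """Iterate new uncommitted nodes for entries in the string."""


class LineImporter(StringImporter):
    """Hypothetical extension: one node per non-empty line."""

    def nodes_from_string(self, string):
        for line in string.splitlines():
            if line.strip():
                # A dict stands in for a node; nothing is saved to the store.
                yield {"title": line.strip(), "store": self.store}


importer = LineImporter(store=None)
titles = [n["title"] for n in importer.nodes_from_string("first\n\nsecond\n")]
```

The subclass only has to implement `nodes_from_string`; the nodes it yields stay uncommitted, so saving them is left to the caller.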
An importer that imports nodes from an XML document.
Subclasses must implement nodes_from_xml.
Iterate new uncommitted nodes for entries in an XML document.
Parameters: tree – an lxml.etree.ElementTree instance representing the document with entries to import
An importer supporting SQLite database files as input.
Subclasses must implement nodes_from_db.
Manage node importer extensions.
Add all importers from lethe.ext.node_import.
Register an importer.
Return an Importer instance for use with the file at path.
This function does not import the nodes; to do that, call Importer.nodes_from_path or a method of its subclass on the returned object.
Equivalent to ImporterRegistry.get_importer on the default registry, which provides all lethe.ext.node_import extensions.
Equivalent to ImporterRegistry.types on the default registry, which provides all lethe.ext.node_import extensions.
The lethe.ext.node_import package contains modules implementing specific importers. Don’t import them directly: lethe.node_import.get_importer should find an appropriate importer for the requested file.
Node importer for Mozilla places.sqlite.
See <https://developer.mozilla.org/en-US/docs/The_Places_database> for documentation of the database format used. Only bookmarks are imported, history and annotations are not used. Favicons are not imported.
Each bookmark is imported into a single node. Metadata is represented using props and descriptions are stored as node text.
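The one-node-per-bookmark mapping could be sketched like this. The example builds a tiny in-memory database with only the columns it needs; a real places.sqlite has many more columns and tables. The function name `iter_bookmarks` and the dict-based node are hypothetical, and the query (bookmarks are rows of `moz_bookmarks` with `type = 1` whose `fk` points at `moz_places.id`) reflects the Places schema as generally documented, not necessarily the query this module runs. Descriptions are left out of the sketch.

```python
import sqlite3

# Minimal stand-in for places.sqlite, reduced to the columns used below.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE moz_bookmarks (
        id INTEGER PRIMARY KEY, type INTEGER, fk INTEGER,
        parent INTEGER, title TEXT
    );
    INSERT INTO moz_places VALUES (1, 'https://www.gnu.org/');
    INSERT INTO moz_bookmarks VALUES (1, 2, NULL, 0, 'Root');  -- a folder
    INSERT INTO moz_bookmarks VALUES (2, 1, 1, 1, 'GNU');      -- a bookmark
""")


def iter_bookmarks(conn):
    """Yield one dict per bookmark; folders and separators are skipped."""
    query = """
        SELECT b.title, p.url
        FROM moz_bookmarks AS b
        JOIN moz_places AS p ON p.id = b.fk
        WHERE b.type = 1
        ORDER BY b.id
    """
    for title, url in conn.execute(query):
        # Metadata goes into props; the node text stays empty here.
        yield {"text": "", "props": {"title": title, "url": url}}


result = list(iter_bookmarks(conn))
```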
The functionality of this module is limited to what the author needed. With some understanding of the format used, it could be made more generic and useful for other ways of using bookmarks.
Node importer for places.sqlite.
Node importer for a custom recfile record type.
The input syntax is the same as supported by GNU recutils.
Bookmarks are read from a record set compatible with the following type:
    %rec: Bookmark
    %key: URL
    %mandatory: Title Folder
    %unique: Title
    %unique: Folder
    %type: Visited date
    %type: Tag line
    %type: Language line
    %sort: Title
Records like this one can be imported:
    URL: https://www.gnu.org/philosophy/bsd.html
    Title: BSD License Problem - GNU Project - Free Software Foundation (FSF)
    Folder: Root
    Tag: bsd
    Tag: free software
    Tag: licensing
    Visited: 2012-08-30 14:10:00.669158
For simplicity, the parser skips record descriptors and uses all records that have the URL field. The Description field is used for node text. All unknown fields are imported as properties with lowercased keys. Visited dates are supported only in the format shown above and are assumed to be UTC.
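The parsing rules above can be illustrated with a small sketch. It is not the module's parser: `parse_records` and `to_node` are hypothetical names, dicts stand in for nodes, and multi-line values (recfile continuation lines starting with `+`) are not handled. It assumes records are separated by blank lines and that repeated fields such as Tag are collected into lists.

```python
from datetime import datetime, timezone


def parse_records(text):
    """Yield a dict of field name -> list of values for each usable record."""
    for block in text.split("\n\n"):
        record = {}
        for line in block.splitlines():
            # Skip comments and record descriptor fields such as %rec:.
            if line.startswith("#") or line.startswith("%"):
                continue
            name, sep, value = line.partition(":")
            if sep:
                record.setdefault(name.strip(), []).append(value.strip())
        # Only records that have the URL field are used.
        if "URL" in record:
            yield record


def to_node(record):
    """Turn a parsed record into a dict standing in for a node."""
    props = {}
    text = ""
    for name, values in record.items():
        if name == "Description":
            text = "\n".join(values)
        elif name == "Visited":
            # Only this date format is supported; values are taken as UTC.
            props["visited"] = [
                datetime.strptime(v, "%Y-%m-%d %H:%M:%S.%f").replace(
                    tzinfo=timezone.utc
                )
                for v in values
            ]
        else:
            # All other fields become properties with lowercased keys.
            props[name.lower()] = values
    return {"text": text, "props": props}


SAMPLE = """\
%rec: Bookmark
%key: URL

URL: https://www.gnu.org/philosophy/bsd.html
Title: BSD License Problem
Folder: Root
Tag: bsd
Tag: free software
Visited: 2012-08-30 14:10:00.669158
Description: An example description.
"""

nodes = [to_node(r) for r in parse_records(SAMPLE)]
```

Because descriptor lines start with `%`, the descriptor block yields no URL field and is dropped without any special handling.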
This code should use GNU recutils Python bindings instead of the custom parser that it has now.
Importer for recfiles.