mostscriptorium

The details usually matter... Sean Upton's bucket-o-bits.

Folder UI: GSoC project update (6/26)

written by admin, on Jun 27, 2009 4:26:00 AM.

For the past few weeks, I have been working on a GSoC project for folder listings and faceted folder navigation in Plone. Here are some updates on how things are going.

Big picture stuff

  • I am focusing on faceted navigation of folder listings, since the "dashboard" concept that was part of my original plan seems less useful for a general audience after careful consideration (and good feedback).
  • Faceted search/browsing of folder listings is a big part of making folders with many items reasonable and pleasing to use. This is the primary goal of my project and the majority of its active development work.
  • My work (disclaimer: bits and pieces of disjointed work, this is still early) can be found in plone.app.folderui in Plone svn.
  • Over the past few weeks, I have been working on back-end work for how facets and queries are structured and executed.
  • Over the next few weeks, I expect to be focused on building a user-interface for faceted search including a new listings view and viewlets for facets and for a query summary to be displayed at the top of the listings. The goal here is to have something simple be end-to-end functional by early July.

What has been done

  • Interfaces and components for representing individual search filters and composite queries in an implementation-neutral way.
  • A functional HTML/JavaScript mockup of multi-selection within a single facet (svn co this to test; YMMV, requires jQuery, tested in Firefox 3).
  • Interfaces for persistent/configuration components that are used to render facets and control behavior (query/filter generation).
  • Code for dealing with date ranges; relative date ranges like "past" and "this month" can be created as factories for applied date ranges relative to datetime.now() (or any other datetime). Date ranges are transformed into query filter components with an adapter (includes doctest).
  • A multi-adapter that wraps an iterable mapping interface around LazyMap catalog results and the catalog providing them; keys are RIDs, (lazy or iterated) values are brains -- this allows access to catalog record ids for every query to eventually use in efficient calculation and caching of set intersections as part of the faceted search system. None of the benefits of lazy sequences are lost. (includes unit test).
  • A query runner takes the implementation-neutral query specifications mentioned above, and transforms them into AdvancedQuery composite query objects for purposes of evaluation against portal_catalog. This query runner component returns an iterable result mapping that can either be used via iteration (itervalues()) or lazy evaluation (values()), and returns catalog record ids for a query via keys(). This component has not been integration tested, so it needs more work over the next few days.
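
The relative date range idea above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the code in plone.app.folderui: the names DateRange, RelativeRange, past, this_month, and apply_range are hypothetical, and the real components are wired up with interfaces and adapters rather than plain functions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass(frozen=True)
class DateRange:
    """An applied (absolute) range; None means that end is open."""
    start: Optional[datetime]
    end: Optional[datetime]

    def contains(self, value: datetime) -> bool:
        after_start = self.start is None or value >= self.start
        before_end = self.end is None or value < self.end
        return after_start and before_end


# A relative range is just a factory: give it a reference datetime
# and it returns an applied DateRange.
RelativeRange = Callable[[datetime], DateRange]


def past(now: datetime) -> DateRange:
    """Everything strictly before the reference datetime."""
    return DateRange(start=None, end=now)


def this_month(now: datetime) -> DateRange:
    """From the first instant of now's month up to the next month."""
    start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return DateRange(start=start, end=end)


def apply_range(factory: RelativeRange,
                now: Optional[datetime] = None) -> DateRange:
    """Apply a relative range to datetime.now() or any other datetime."""
    return factory(now if now is not None else datetime.now())
```

Passing an explicit reference datetime, rather than always calling datetime.now(), is what keeps ranges like these easy to doctest.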

Next on the radar for the coming 7-10 days:

  • Integration tests of the query runner for some simple queries mimicking faceted search selections against sample content and catalog results.
  • A configuration backend (CMF tool called portal_facets) to persist facet configuration (what facets show up where and for whom) and provide a registry of available facets queried for a context of user-role and path. A configuration UI for this would be an eventual goal.
  • An enhanced listing view and viewlets / UI for each facet.
  • A component to construct a query component from URL querystring (other eventual state representations for the whole of the viewed query may in the future include JSON for JavaScript enhancements). A viewlet will be created to display the information on the current query above the listings/results.
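
To make the querystring idea concrete, here is a minimal sketch using only the Python standard library. The function names and the choice of facet names as querystring keys are my assumptions for illustration; the actual component would produce the implementation-neutral query objects described earlier rather than a plain dict.

```python
from urllib.parse import parse_qs


def filters_from_querystring(querystring: str) -> dict:
    """Map each facet name to the sorted, de-duplicated values selected."""
    parsed = parse_qs(querystring, keep_blank_values=False)
    return {name: sorted(set(values)) for name, values in parsed.items()}


def summarize(filters: dict) -> str:
    """Build a one-line summary suitable for display above the listing."""
    parts = [
        f"{name}: {' or '.join(values)}"
        for name, values in sorted(filters.items())
    ]
    return "; ".join(parts) if parts else "all items"
```

Repeating a key in the querystring (Subject=sports&Subject=news) is treated here as multi-selection within a single facet.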

As the UI starts to come together, even in crude form, I will try to blog more with screenshots and possibly a screencast. Once folks can start seeing the results of this work, I would love to get UI feedback.

Plone GSoC: re-thinking folder navigation, part one

written by admin, on Jun 5, 2009 6:56:00 AM.

A core part of my "large folders" GSoC project is about finding a needle in a bigger haystack. Two enhancements to the navigation of folders in Plone would greatly help in systems and folders containing large numbers of items:

  1. Faceted navigation: filtering item listings by criteria, continually narrowing to what you want. This is search-driven, but feels like browsing.
  2. A folder dashboard: faceted navigation requires a starting point, and a dashboard might be a place you first land when you click into a folder.

I propose replacing folder listings with this dynamic duo of a dashboard and robust faceted navigation for the contents of a folder and its subfolders. This post describes with three mockups what I think this could look like. I am very interested in feedback.

The starting point: a dashboard

A dashboard is a central place displaying key information, links, and navigation for digging into that information. Put another way, a dashboard is a visual container for a variety of general-purpose widgets that in sum provide a good general user experience. The dashboard for a folder may be configurable (that is not a key consideration yet).

What matters is that the dashboard presents some, but possibly not all, of the information a user needs to see, while providing the most important links, information, and tools to a user trying to manage and find content in a folder.

What might a dashboard for a folder in Plone look like?

Dashboard mockup

Right-click and select "view image" to see at full size.

This dashboard centralizes widgets for:

  • Text search of full text or title, including or excluding subfolders.
  • A way to immediately jump to an item by typing in its unambiguous id.
  • A summary of folder, subfolder statistics and quantitative information on what is contained.
  • A listing of N most recent items.
  • A listing of "My most recent items" (by owner)
  • A view of folder metadata ("content") if applicable.
  • A tag cloud for keywords and/or tags, if configured/applicable.
  • Multiple entries into faceted navigation to filter into items:
    • From the text search.
    • From summary of item type/status quantities.
    • From a tag cloud.
    • From a navigational link labeled "find content" at the top.
    • By clicking to view more results from the "most recent" items widget.

Digging deeper: Faceted listings

Faceted navigation for a folder requires the ability to view resulting listings of items and have filters ("facets") for narrowing that list down. Both listings and the filters must be present in the UI on most screens. I propose a two-column layout within the view of the folder object:

Listing mockup

Facets, expanded

If expanded, the navigation column above might look something like this (annotated mockup):

Filter mockup

Some of these facets/filters (many, in fact) would be implemented as catalog queries, but need not necessarily be exclusively dependent on the catalog. This may require the ability to perform ordered, heterogeneous set intersections on a common set of unique ids (I would propose using five.intid ids).
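
As a rough sketch of what an ordered, heterogeneous set intersection could mean here: take one ordered result sequence as primary, and keep only the ids that every other source (catalog or not) also contains. The function name and the use of plain integers for ids are assumptions for illustration only.

```python
from typing import Iterable, List, Sequence


def ordered_intersection(primary: Sequence[int],
                         *others: Iterable[int]) -> List[int]:
    """Keep ids from primary, in order, that appear in every other source."""
    # Each other source may be a set, list, or iterator; normalize to sets
    # once so membership tests are cheap.
    allowed = [set(source) for source in others]
    return [uid for uid in primary if all(uid in ids for ids in allowed)]


catalog_hits = [42, 7, 19, 3]    # ordered, e.g. by modification date
tag_hits = {3, 7, 42}            # from a non-catalog source
owner_hits = iter([42, 3, 99])   # any iterable of ids will do
narrowed = ordered_intersection(catalog_hits, tag_hits, owner_hits)
```

The ordering of the primary sequence survives, which matters when the listing is sorted.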

I am suggesting that the default view for folders be a combination of a dashboard and a faceted listing view. This does not address what to do with the folder contents tab. I will share my thoughts on that in a later post; they include keeping folder contents as a secondary way to navigate the folder, but adding a small "quick box" above the folder_contents listing table that includes a search prompt and summary information with hyperlinks to the dashboard and faceted navigation.

More to come including thoughts on implementation soon. Comments welcome!

Plone GSoC: giving large folders a better experience

written by admin, on Jun 1, 2009 9:23:00 PM.

I am participating in Google Summer of Code for the Plone Foundation this year, working on a project to enhance the experience of interacting with folders in Plone when a large number of items are put into them. Currently, keeping folders with thousands of items yields limitations worth fixing.

One of the key changes to facilitate this better experience is faceted navigation to replace/complement current folder contents/listings views. Another key enhancement would be a folder "dashboard" to replace the folder listing with a more reasonable initial view into a folder that potentially has many items.

More here soon! I will be posting updates here on my weblog periodically to describe what I am trying to accomplish and how, and will use this as one venue for community feedback. This week, I expect to post some mockups and ideas; your feedback is welcome in comments.

This blog ported to Zine!

written by admin, on Jun 1, 2009 9:07:00 PM.

I finally managed to get motivated enough to transition from Wordpress to Zine. So far, a very good experience setting up and importing. I set up Zine in a virtualenv, installed, and modified the wsgi application file to append the virtualenv site-packages directory to the site path. After that, I just had to figure out editing the blog_url variable in the configuration editor to get things to work as I expected over https for admin, http for public access.

Reading most trade news (is a total waste of time)...

written by admin, on Mar 24, 2009 2:58:32 AM.

Every once in a while I wonder why I subscribe to certain RSS feeds. Today I'm wondering just that about Editor and Publisher Technology News feed [rss]. A rough guess is that 19 of 20 items posted are just regurgitations of vendor press releases.

Despite the challenges of our business, reading this stuff gives you the impression that there is only news about vendors doing this or that. Yawn. This archetype is just metonymy for trade press writers who treat writing about the industry like they are writing sports news. Vendors battle; consolidation looms, and only the best make the playoffs. Documenting executive moves among vendors is sort of like telling who won the all-star slam-dunk contest. Only no vendor is going to get anyone real (other than its own staff) as excited about what they do as the vast quantity of so-called news would make you think. If we want real news, let's hear about what innovative tech work is happening at newspapers by their staff and collaborators!

Revisiting virtualenv path tricks

written by admin, on Sep 8, 2008 7:20:20 AM.

I wrote here about adjusting the virtualenv activation script to support adding source eggs/packages to the PYTHONPATH. I no longer advocate this approach, and have found a nicer alternative. Here are the downsides of the old approach:

  • PYTHONPATH environment variable is limited to 1024 bytes on Unix, 256 bytes on Win32. This is poorly suited to any reasonable sized collection of development packages.
  • Every time you add or rename a package (or source directory in which it is maintained), you end up needing to re-activate the virtualenv environment to get the desired effect in the PYTHONPATH variable.
  • My bash snippet is decidedly Unix-only; I want to support cross-platform development, occasionally requiring some Win32 Python development. I'm too lazy to port my bash snippet into NT/DOS batch file code.

So... what to do? I decided to use .pth files, but I did not want to hardcode each path. Here's what to do:

  1. Create a $VIRTUAL_ENV/src directory. Put packages or folders containing (namespace) packages in this directory.
  2. Create a source.pth file in your virtualenv site-packages directory containing the following single line: import source_path
  3. Add a python file source_path.py to your site-packages directory containing the following:
    
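    import os
    import sys

    # Locate the environment root: prefer the active virtualenv, then an
    # explicit PYTHON_BASEPATH override, then the current directory.
    if 'VIRTUAL_ENV' in os.environ:
        BASEPATH = os.environ['VIRTUAL_ENV']
    else:
        BASEPATH = os.environ.get('PYTHON_BASEPATH', '')
        if not BASEPATH:
            BASEPATH = os.path.abspath('')

    BASEPATH = os.path.join(BASEPATH, 'src')

    # Only scan when src exists, so a missing directory does not raise
    # an error each time the interpreter starts.
    if os.path.isdir(BASEPATH):
        fullpath = lambda rel: os.path.join(BASEPATH, rel)
        issrcdir = lambda rel: os.path.isdir(fullpath(rel))
        srcdirs = [fullpath(rel) for rel in os.listdir(BASEPATH)
                   if issrcdir(rel)]

        # Add src itself, then each of its immediate subdirectories.
        sys.path.append(BASEPATH)
        for dirpath in srcdirs:
            sys.path.append(dirpath)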
    
    

How this works, benefits:

  • Automatically scans the src directory each time the python interpreter is started. No need to re-activate.
  • No need to explicitly list many hard-coded absolute paths in a .pth file.
  • Shares nicely: you can commit this technique to a subversion repository, and multiple developers can use this technique and these instructions without customizing to their path environment.
  • Has no length limits like PYTHONPATH.
  • Tested across platforms: unix and win32.

The regressive GOP Palin "Executive Experience" narrative

written by admin, on Sep 3, 2008 3:28:05 AM.

It strikes me as odd (watching cable news RNC coverage) that Gingrich and others keep repeating this new party line that Palin has "executive experience" -- something Obama apparently lacks, they say. On this metric, Mr. McCain is greener than the over-watered grass at your local golf course. This experience narrative is an infinitely regressive diversion -- most important policy and many significant changes must happen through legislation. On the legislation and positions that matter, Obama is right. Apparently McCain could challenge Reagan over Beirut, but couldn't challenge Bush on Iraq -- give me a break on this "maverick" facade.

Tweaking virtualenv activation PYTHONPATH

written by admin, on May 21, 2008 4:01:16 AM.

A technique I've been using and finding useful is creating a directory, within the root of a virtualenv application environment, for source packages I'm developing. In doing this, I need each namespace package I add to that source directory to be added to the PYTHONPATH automatically; doing this was as simple as adding a few lines of bash code at the end of the activation script:

    SRCBASE=/home/sean/code/pyocm/src
    PYTHONPATH=$PYTHONPATH
    for i in `find $SRCBASE -maxdepth 1 -type d | sed -e "s/[.][\/]//"`; do
        CODEDIR=`echo $i | sed -e "s/[\/][.]//"`
        PYTHONPATH=$PYTHONPATH:`echo -n $CODEDIR`
    done
    export PYTHONPATH=$PYTHONPATH

This saves me from adding these to the local site-packages, keeping development packages separate from dependencies I've installed from elsewhere (e.g. eggs from Cheeseshop).

Refine your search!

written by admin, on Mar 31, 2008 3:19:00 PM.

In Zope/Plone applications, one often asks the catalog for an intersection of results from several indexes. For example, suppose I want all items where all of the following are true:

  1. Subject field is “sports”

  2. SearchableText matches the keyword query “stadium disab*”

  3. Path is equal to '/foo/bar'

There are three typical ways to achieve this in a user-interface: (1) an advanced search form/page, (2) collections (saved searches), and (3) drill-down filters within faceted navigation.

This writeup is interested in the third case – faceted search: why it is useful, why it is difficult, and how something effective might be implemented in Plone.

What facets are:

  1. Aspects of a set of search results, usually a sub-set of results.

  2. Often, clickable links to refine your search within a larger set of results.

With faceted search navigation, filtering existing search results is the name of the game. If I search for a term (often a full text search), it may be helpful to ask me whether I want to view only known subsets of the search results I am presented. For example, in a search for “Lincoln,” it may be helpful to see clickable filters beside the results that let me choose to search only the Automotive category, or the History category. It may also be helpful to limit my search to items published within the last year.

Faceted search is often used in a variety of web applications, but is especially useful for large collections of news and information, product reviews, and local search (entertainment guides, business directories). The more structured your metadata, the more likely faceted navigation is helpful to you. With free services like Calais popping up, getting better metadata on even unstructured content is easier to automate.

Unlike navigating using taxonomies within controlled vocabularies, there is no distinct linear hierarchy to this. Applied in Plone, most applications choosing to use this pattern will treat each index as a facet and rely on the portal_catalog machinery to return the results. It is a useful oversimplification to say such a navigation strategy is the iterative set-intersection of multiple results by end-users. The same is true for faceted navigation of search results, or simple collection navigation.

What might this look like? Figure 1 shows possible click behavior and permutations of three different search criteria.

Figure 1

Figure 1 - faceted navigation results in a need for result set intersections. We might want to cache these...


Faceted navigation is more like browsing than searching, most of the time. Clickable filters are easy. But with the ability to hyperlink comes the ability to abuse it. And with good metadata often comes large vocabularies of possible choices. Two good UI guidelines apply here:

  1. Do not show links to facets/filters that have no results, even if the filter in question is a popular term in your vocabulary of possible choices. If it is not germane to the results already presented to a user, omit the useless link from your navigation.

  2. Related: show the number of results in each facet link in the navigation. If you are searching automobiles, and have a facet for color, if the results page the user is already on lists seven red cars, thirty-two blue cars, and three black cars, show those numbers within the respective links. If there are zero purple cars in the result sets, just omit the link labeled “purple” per #1.
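
Both guidelines boil down to one operation per facet value: intersect the current result set with the set of items carrying that value, show the size, and drop anything that comes out empty. Here is a minimal sketch, assuming result sets and facet values are already available as plain Python sets of record ids (the function name and data layout are hypothetical):

```python
from typing import Dict, Set


def facet_link_counts(results: Set[int],
                      facet_values: Dict[str, Set[int]]) -> Dict[str, int]:
    """Count current results per facet value, omitting values with none."""
    counts = {}
    for label, ids in facet_values.items():
        matching = len(results & ids)
        if matching:  # guideline 1: no link for an empty facet
            counts[label] = matching  # guideline 2: show the number
    return counts


# The automobile example: 42 cars in the current results.
cars = set(range(1, 43))
color = {
    "red": set(range(1, 8)),
    "blue": set(range(8, 40)),
    "black": set(range(40, 43)),
    "purple": {100, 101},  # in the vocabulary, but not in these results
}
links = facet_link_counts(cars, color)
```

Note that this runs one intersection per facet value, which is where the cost discussed below comes from.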

Figure 2

Figure 2 - faceted navigation. This particular example is an application some folks I work with built using Django and PostgreSQL, with stored procedures used to offload some of the work of creating link set counts from the application.

Both of these common-sense user interface constraints pose challenges. Running a single search result that intersects several indexes is one thing (not a problem), but it turns out getting those counts for all the various permutations of facet choices is tricky, and expensive. Getting result counts means querying the catalog, and if you have dozens of clickable facet links on your pages, and you are filtering through, say 100,000 results, you have a problem. Actually you have dozens of problems, each as big as the result set they contain. There's one good reason one might need a memory watchdog.

To add insult to injury, what if all this expense tied up resources on your application server, sharing such burden with the threads rendering your pages? Such collapsing of burdens is hard to escape given the way Zope is designed (index operations happen in the same thread and on the same hardware as the rest of the application's execution) – a possible solution is an external catalog, much like Nuxeo used with CPS via NXLucene, and possibly like what will soon be possible with Plone and Solr (collective.solr and collective.indexing).

This isn't meant to sound bleak. I built a system like this that scaled okay with 30,000 items (throwing hardware at the problem). Trying to make the same system work for nearly 200,000 items (local yellow pages listings) never made it into production. I write this because I've seen what does (to some extent) and does not work, and have ideas on how the situation can be improved. And if those do not work, we can steal^W borrow ideas from Solr [more].

This problem does not have easy answers, but there are ways to improve this, including caching the metadata for each facet permutation (read: set intersections) using an out-of-band cache-warmer (possibly an external thread talking to a cache that is not thread-local?). Asynchronous invalidation notifications to such a cache could happen within writes to Zope-based content (read: event subscribers and a message queue). Such solutions help partition the work away from an in-process bottleneck inside a Plone-based system.
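
To give the caching idea some shape, here is a deliberately simplified, in-process sketch. It assumes a lookup callable that returns the record ids for one (facet, value) pair; the class and method names are hypothetical, and a real version would live outside the rendering threads and receive its invalidations through event subscribers and a queue rather than direct method calls.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Filter = Tuple[str, str]  # (facet name, selected value)


class IntersectionCache:
    """Cache result-id intersections keyed by the set of applied filters."""

    def __init__(self, lookup: Callable[[str, str], Iterable[int]]):
        self._lookup = lookup
        self._cache: Dict[FrozenSet[Filter], FrozenSet[int]] = {}

    def get(self, filters: Iterable[Filter]) -> FrozenSet[int]:
        # A frozenset key makes the order of applied filters irrelevant.
        key = frozenset(filters)
        if key not in self._cache:
            sets = [frozenset(self._lookup(n, v)) for n, v in key]
            # With no filters there is nothing to intersect in this sketch.
            result = sets[0].intersection(*sets[1:]) if sets else frozenset()
            self._cache[key] = result
        return self._cache[key]

    def warm(self, permutations: Iterable[Iterable[Filter]]) -> None:
        """Pre-compute entries, as an out-of-band warmer might."""
        for filters in permutations:
            self.get(filters)

    def invalidate(self, facet_name: str) -> None:
        """Drop every cached entry that involves the given facet."""
        stale = [key for key in self._cache
                 if any(name == facet_name for name, _ in key)]
        for key in stale:
            del self._cache[key]
```

Invalidating by facet name is coarse; it trades some unnecessary recomputation for not having to track which items moved between which sets.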

In a follow-up post, I will detail some more specific ideas I have about addressing this area. There is more potential than down-side; we just need to be careful of the "gotchas."

Faceted search screenshot

written by admin, on Mar 31, 2008 2:51:23 PM.