Drupal and XML: Looking Forward

In preparation for my trip to Washington D.C. next month, I've begun to develop a module that integrates the CAP XML format (Common Alerting Protocol) with drupal's node, location, google map, category, and CCK modules. Put plainly, the CAP format seeks:

"[to standardize] the content of alerts and notifications across all hazards, including law enforcement and public safety as well as natural hazards such as severe weather, fires, earthquakes, and tsunami. Systems using CAP have shown that a single authoritative and secure alert message can quickly launch Internet messages, news feeds, television text captions, highway sign messages, and synthesized voice over automated telephone calls or radio broadcasts."

Having spent about 8 hours researching and experimenting with drupal, XML, and the CAP alert format, I've come to a few conclusions:

  1. The generic container vs. category paradigm represented by taxonomies and categories is so 2002. I've got one word for you, just one word... RDF, OWL, and the Semantic Web. Okay, that was three... (I just had to sneak in a reference to The Graduate.)
  2. The overall cost of developing and adopting a robust, full-featured XML-to-drupal translator is less than the cost of sitting on our hands. Here's one reason why.
  3. Drupal already has a powerful foundation for a magic XML/RDF translation machine. And I'll get to that later...
  4. If we're serious about XML, we should not bother supporting php 4.x. Plainly, PHP 5 is faster, stronger, and it cuts the time it takes to code XML applications by a factor of 20.
  5. The greatest development challenges we'll face are 1) automating the process of generating relational-database-friendly schemas (this is an early problem I'm running into with complex XML formats like CAP), 2) effectively dealing with the myriad flavors of XML -- which are more numerous than the sects of Christianity, and 3) enabling humans to have "just the right amount" of control over how XML fields map to the drupal database, and how various relationships are handled -- without making the module needlessly time consuming, frustrating, and annoying.
  6. A drupal with solid XML support would be, for all practical purposes, the ultimate 3rd party service-to-website integration tool. Talk about dropping a bomb on our competition....
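To make points 4 and 5 a little more concrete, here is a rough sketch of the first step any translator has to take: flattening a CAP alert into the handful of fields a node would store. It is written in Python with the standard library purely for illustration; it is not drupal code. The sample alert is made up, the element names and namespace are taken from the CAP 1.1 format, and the function name `parse_cap_alert` is my own invention.

```python
import xml.etree.ElementTree as ET

# CAP 1.1 namespace; the sample alert below is invented for illustration.
CAP_NS = {"cap": "urn:oasis:names:tc:emergency:cap:1.1"}

SAMPLE_ALERT = """<alert xmlns="urn:oasis:names:tc:emergency:cap:1.1">
  <identifier>EXAMPLE-0001</identifier>
  <sender>alerts@example.org</sender>
  <sent>2006-06-01T12:00:00-05:00</sent>
  <status>Test</status>
  <msgType>Alert</msgType>
  <scope>Public</scope>
  <info>
    <category>Met</category>
    <event>Severe Thunderstorm</event>
    <urgency>Expected</urgency>
    <severity>Severe</severity>
    <certainty>Likely</certainty>
    <effective>2006-06-01T12:00:00-05:00</effective>
    <expires>2006-06-01T18:00:00-05:00</expires>
    <web>http://example.org/alerts/0001</web>
    <area>
      <areaDesc>Example County</areaDesc>
      <geocode><valueName>FIPS6</valueName><value>000000</value></geocode>
    </area>
  </info>
</alert>"""


def parse_cap_alert(xml_text):
    """Flatten the first <info> block of a CAP alert into a plain dict."""
    root = ET.fromstring(xml_text)
    info = root.find("cap:info", CAP_NS)
    if info is None:
        raise ValueError("alert has no <info> block")

    def text(node, tag):
        # Missing optional elements come back as an empty string.
        return node.findtext(f"cap:{tag}", default="", namespaces=CAP_NS).strip()

    return {
        "identifier": text(root, "identifier"),
        "event": text(info, "event"),
        "severity": text(info, "severity"),
        "effective": text(info, "effective"),
        "expires": text(info, "expires"),
        "web": text(info, "web"),
        # One alert can cover several areas, so geocodes are a list.
        "geocodes": [
            g.findtext("cap:value", default="", namespaces=CAP_NS)
            for g in info.findall("cap:area/cap:geocode", CAP_NS)
        ],
    }


alert = parse_cap_alert(SAMPLE_ALERT)
print(alert["event"], alert["severity"], alert["geocodes"])
```

Notice that even this toy version already runs into challenge 1: the geocodes are a list, which means a second table (or a multi-value field) on the database side.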

Battleplan for a drupal XML toolkit

From the start, I can say that I don't foresee an XML toolkit making its way into the drupal core any time soon. This is firstly because we'd want to build it for PHP5, and PHP5+ only (naysayers can try PHP5's SimpleXML functions, but until then, I refuse to listen to their nays). Secondly, a great many of the challenges have already been solved by other modules. Namely:

Modules to Integrate

  • CCK: to handle run-of-the-mill fields and data (e.g. book abstract, reviews, links... any data that is unique, and doesn't need to be related to other nodes with data). The main challenge here is finding a way to interact with CCK that a) isn't prohibitively ugly, and b) will remain reasonably stable with CCK's ongoing development.
  • Category, Relationship?, Event?, (CiviCRM?) [oh where oh where would it end?] -- Category module is by far the most suitable existing workhorse for this fairy tale toolkit's relationship/organization function. However, by "most suitable" I really mean there's no alternative that is less ghastly. I will discuss category.module's shortcomings for this particular challenge later. Relationship.module is an extremely advanced and promising development. I spent about 2 hours working with it today, and it's by far the closest drupal has gotten to getting in bed with the semantic web. As advertised, it supports an organizational system that maps not only which words and numbers relate nodes together, but also HOW they relate and WHAT makes up the relationship. However, in the short term (which, to my horror, is the timeframe in which I will be taking on this challenge), it simply won't work. Number one, it's built against php4, not php5. It's also not stable with 4.7 (or php5?). Either way, I'm sad I won't get an opportunity to use it. Don't take this as me talking trash. On the contrary, I wet my pants when I began to see the possibilities it afforded. And I'm happy to report that I'm not a chronic pant wetter. Now, finally, CiviCRM -- I won't be messing with this one yet. Mapping out relationships between people is not part of my three-week race to build this sucker (or the prerelease demo version) -- but I will keep it in mind.
  • Location/Gmap: By enabling location-specific XML data to take advantage of preexisting location-related modules, we could ensure ourselves a place among the "cool kids" at this year's, and early next year's, tech conferences. (And we won't survive unless we begin to make our location functionality practical... in terms of future worth and present demand, location-based relationships are to AJAX as reality is to fantasy :-) ).
  • Views -- All the cool kids are talking about model, view, controller -- I'm not sure which of these the views module fits into, because it appears to do all three. Nevertheless, it offers the myriad presentational options that drupal will need to venture into the complex world of XML. Not to mention, it's nicely supported. Thank you, Earl!

Modules to Build

  • XML Receiver -- A two-part module, similar to the aggregator module in that it handles defining new XML feeds, either from a static file or from the web. More importantly, however, it needs to be able to read an XML schema and build a CCK node type from it. Frankly, I don't know whether CCK is up to the task, but I'll find out! Once the node type is built from CCK, it updates content on cron runs like the aggregator.
  • XML Broadcast -- Translates XML-based nodes into a) email alerts, via the subscriptions module, and b) RSS feeds, and c) enables easy import/export of user-defined schemas -- effectively, I want to open source IA implementations in drupal. Why stop at code?
  • XML Translator -- This module, or directory of modules, or giant library of incs will handle the integration of the data into category, location, (views?) or what have you. The module would be unwieldy if it fully relied on CCK, category, and views. I think we're at a good point to discuss why:
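As a rough illustration of what the XML Receiver's "build a node type" step might look like, here is a minimal sketch that walks a sample document and infers a flat list of field definitions. It's Python for illustration only, it works from a sample document rather than a real XML schema, and the output format (`type`, `multiple`) is a hypothetical stand-in, not CCK's actual field definition structure.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A made-up, CAP-like sample; a real receiver would read a schema or a feed.
SAMPLE = """<alert>
  <identifier>EXAMPLE-0001</identifier>
  <info>
    <event>Flood</event>
    <area><areaDesc>County A</areaDesc></area>
    <area><areaDesc>County B</areaDesc></area>
  </info>
</alert>"""


def infer_fields(element, prefix="", repeated=False):
    """Return {field_name: {"type": ..., "multiple": ...}} for every leaf element.

    Nested elements are flattened into underscore-joined names. A field is
    marked multiple if it, or any ancestor, repeats among its siblings.
    """
    fields = {}
    counts = Counter(child.tag for child in element)
    for child in element:
        name = prefix + child.tag
        multiple = repeated or counts[child.tag] > 1
        if len(child) > 0:
            # Container element: recurse and carry the "repeated" flag down.
            fields.update(infer_fields(child, name + "_", multiple))
        else:
            # Leaf element: treat everything as text in this sketch.
            fields[name] = {"type": "text", "multiple": multiple}
    return fields


for field_name, spec in infer_fields(ET.fromstring(SAMPLE)).items():
    print(field_name, spec)
```

The interesting part is the `multiple` flag: every field that comes back marked as multiple is a place where a flat node type stops being enough and a relationship (category, location, or a separate table) has to take over.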

Weak Links Within the Strong Links

Now, these weak links in views, CCK, and category are based upon my particular circumstance. Put simply, I'm attempting to build a resource which provides Homeland Security, weather, earthquake, and amber alerts by location, in web-based, email, and RSS formats. The task is daunting: 50 states, each with hundreds of counties (counties are the main building block of this resource). Using FIPS geocode info, this translates to:

  • 158,554 locations, each with
  • a minimum of 4 flavors of alerts -- weather, terrorism, earthquakes, and probably amber alerts
  • each alert has 5 levels of severity, in addition to 10 or more types of alerts...
  • alerts come with a "begin" and "end" time and date -- I will hook this into scheduling module
  • a weblink, which I plan to use as the unique key for the alert. there's more -- I think this example of scale makes a point, however...

A Roast for some of my favorite modules

Category Module

Drupal offers no solution to this kind of task. The best plan I've come up with using existing category.module functionality is to define state and county as containers, to associate lat/long with a nid by using FIPS geocodes in a separate table, and to auto-generate redundant categories for each of the 158,554 supported locations... dizzy, sick, and vomiting yet? Yeah... that was my reaction to having over 1,000,000 categories too!
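For the curious, here is the back-of-envelope arithmetic, as a tiny Python snippet. It assumes one redundant category per location for each alert flavor and each severity level, using the counts listed earlier (158,554 locations, 4 flavors, 5 severity levels); the "10 or more types of alerts" are left out, so this is a floor, not a forecast.

```python
LOCATIONS = 158_554      # locations from the FIPS geocode data
ALERT_FLAVORS = 4        # weather, terrorism, earthquakes, amber alerts
SEVERITY_LEVELS = 5      # levels of severity per alert

# One category per location and flavor.
per_flavor = LOCATIONS * ALERT_FLAVORS
# One category per location, flavor, and severity level.
per_flavor_and_severity = per_flavor * SEVERITY_LEVELS

print(f"{per_flavor:,}")               # 634,216
print(f"{per_flavor_and_severity:,}")  # 3,171,080
```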

So first of all, let's talk about categories. There is a need for the ability to create a set of reusable categories that can be associated with multiple containers, but limit their domain to their respective containers. For example, I create a severity container that contains the following categories: low, medium, high, severe, danger. Maybe we could use the highly technical name "reusable category" for this beast. I mark this as a category that is to be associated with all content within a "county" container (we need to be able to not only specify the text associated with the container, but begin to classify the containers themselves). This mark would also carry a special instruction to limit a view of those various categories to THAT container; thus, clicking on the category "severe" wouldn't return severe alerts for all 158,554 locations. Having this capability alone brings the number of categories down from several million to maybe... 50? 40?
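Here is a minimal sketch of the "reusable category" idea, in Python for illustration. The severity terms are defined once, and the lookup is always scoped to a container (a county, identified by a FIPS code). The `Alert` class, the `alerts_in_container` function, and the sample data are all hypothetical; nothing here reflects category.module's actual data model.

```python
from dataclasses import dataclass

# Defined once and shared by every county container.
SEVERITIES = ("low", "medium", "high", "severe", "danger")


@dataclass(frozen=True)
class Alert:
    county_fips: str   # the container this alert belongs to
    flavor: str        # e.g. "weather", "earthquake"
    severity: str      # one of SEVERITIES
    web: str           # weblink, used as the alert's unique key


def alerts_in_container(alerts, county_fips, severity):
    """Return alerts with the given severity, limited to one county."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return [
        a for a in alerts
        if a.county_fips == county_fips and a.severity == severity
    ]


# Made-up sample data: two "severe" alerts in two different counties.
ALERTS = [
    Alert("00001", "weather", "severe", "http://example.org/alerts/1"),
    Alert("00002", "weather", "severe", "http://example.org/alerts/2"),
    Alert("00001", "earthquake", "low", "http://example.org/alerts/3"),
]

# Clicking "severe" inside county 00001 returns one alert, not two.
print(alerts_in_container(ALERTS, "00001", "severe"))
```

The point of the design is that the number of category terms no longer depends on the number of locations: five severity terms serve every county, and the container does the narrowing.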

Now, arguably, I could define these fields via CCK using the text.module. While I "could" do that, I'd lose the ability to have that info available via RSS and mail alerts. Moreover, it's just stupid to classify content using a dead select field. I'm weird in that I think that CCK is best left to dealing with content and fields, and categories are best left to deal with CATEGORIES, classifications, and groupings. Glad we had a chance to talk this distinction through.

Views Module

One statement: foreach($variables as $variable){create a new view, filtered by and based upon $variable}. Then tack on an API that enables modules to take advantage of these dynamically generated views.
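Spelled out a bit more, the wish looks something like the sketch below. It's Python for illustration, and everything in it (the `build_views` and `apply_view` names, the dict-based view definition, the paths) is hypothetical; it is not the views module's API, just the shape of the thing I'm asking for.

```python
def build_views(field, values, base_path="alerts"):
    """Create one view definition per value, each filtered on that value."""
    views = {}
    for value in values:
        name = f"{field}_{value}"
        views[name] = {
            "path": f"{base_path}/{field}/{value}",
            "filters": {field: value},
        }
    return views


def apply_view(view, rows):
    """The 'API' half: let any module run rows through a generated view."""
    return [
        row for row in rows
        if all(row.get(key) == wanted for key, wanted in view["filters"].items())
    ]


# Made-up rows standing in for alert nodes.
ROWS = [
    {"title": "Flood warning", "severity": "severe"},
    {"title": "Light rain", "severity": "low"},
]

views = build_views("severity", ["low", "severe"])
print(views["severity_severe"]["path"])
print(apply_view(views["severity_severe"], ROWS))
```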

...could you also help me move some heavy objects? I need to be picked up from the airport too. Oh, also, I hate to ask this, but could you loan me a hundred dollars? I'll pay you back next week!

Conclusion

All of this was written down hastily, while I was enjoying my end of day, wind-it-up array of beers. This was a dump of notions, ideas, and sheer wind thrust upon my poor readers. That said, I hope some of it makes sense to at least someone. Mostly because I'm an idiot, and my readers tend to know more than I do. And I love it when I'm corrected, or "re-educated".

Generally, however, I want to leave you with the assertion that XML is the CMS market's most wildly untapped "killer app". It's the gateway to near-universal support of 3rd party ASPs. It is a universal language that can connect a website with a myriad of services, directories, libraries, and applications. Most importantly, however, very smart people have been championing the idea of a "semantic web" for 4+ years. It hasn't happened, because the technology wasn't practical. I'm convinced that drupal, more so than any other project, is in a position to make the semantic web practical. Its many modules all offer very specific solutions that, with the right sort of genius and coordination, could make it come together -- perhaps with alarming speed. I insist we are only missing the last few pieces of the puzzle. Food for thought.