Sunday, January 27, 2008

Mark Proctor was kind enough to post a link to my Code Camp V3.0 presentation on the Drools blog, but I need to clarify something, so I sent him the following email:


Just a clarification on your blog post:

I was quoting the Drools blog:

not my success story :)

Giving the presentation taught me humility - there is so much that I still have to learn about Drools, and I need to develop a deeper understanding of the core, the examples, etc.

I was also very fortunate to get some meaningful feedback on my presentation in a blog comment left by woolfel:

I saw your presentation on the Drools blog and thought I'd drop a note. I also left a comment on Mark's blog entry. Slides 52-53 of the presentation make statements that could be misinterpreted and could lead users to think RETE scales linearly with respect to ruleset size. I hope that wasn't your intent, since that is completely wrong.

What affects scalability is the RETE topology. The number of rules often does not have any effect on performance. It depends on the actual rule and how many nodes it potentially involves. I have several dozen entries on my blog explaining it in detail.

I've had debates with many people in the past about this common misunderstanding. I've seen JRules and Blaze consultants with 3-5 years of experience make these incorrect claims and give RETE a bad name.

If you want an invite to my blog and to learn exactly what affects RETE performance, send me an email (snipped). In case you're wondering, I'm the author of Jamocha (an open source rule engine), and I contributed code to Drools, which Mark ported to Drools 3.
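To make woolfel's point about topology a little more concrete, here is a toy sketch in Python. It is an illustration under simplifying assumptions, not how Drools, Jess or Jamocha actually build their networks: a condition is reduced to a string label, and only the alpha layer (single-fact tests) is modeled, with identical tests shared as one node. `AlphaNode`, `ToyNetwork` and `build` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class AlphaNode:
    condition: str                               # a single-fact test, reduced to a label
    rules: list = field(default_factory=list)    # rules that depend on this test

class ToyNetwork:
    """Toy alpha layer: an identical condition test is shared by every rule
    that uses it, so each distinct test becomes exactly one node."""

    def __init__(self):
        self.alpha = {}  # condition label -> shared AlphaNode

    def add_rule(self, name, conditions):
        for cond in conditions:
            node = self.alpha.get(cond)
            if node is None:            # first rule to use this test creates the node
                node = AlphaNode(cond)
                self.alpha[cond] = node
            node.rules.append(name)     # later rules just attach to the existing node

    def node_count(self):
        # In this model each inserted fact would be checked once per node,
        # so the node count stands in for per-fact matching work.
        return len(self.alpha)

def build(rules):
    net = ToyNetwork()
    for name, conditions in rules.items():
        net.add_rule(name, conditions)
    return net

# 200 rules that all reuse the same 5 condition tests
shared = {f"rule{i}": [f"cond{j}" for j in range(5)] for i in range(200)}
# 200 rules whose 5 condition tests are all distinct
distinct = {f"rule{i}": [f"cond{i}_{j}" for j in range(5)] for i in range(200)}

print(build(shared).node_count())    # 5
print(build(distinct).node_count())  # 1000
```

Both rulesets contain exactly 200 rules, yet one network has 5 nodes and the other has 1000, which is the sense in which the rule count alone says little about performance.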

To which I replied via email and comment:

Thanks for your help in clarifying my understanding of the performance impact of ruleset size.

I believe the most recent update of the presentation may have renumbered the slides you referenced (perhaps slides 58-60 are now the ones you meant?)

Certainly I am still learning about Drools, so I appreciate your feedback.


1 comment:

woolfel said...

In the updated PDF, those are the pages you mentioned. The RETE wall limitation is an artifact of a given implementation. In the past, there were implementations of RETE that kept track of all the variables globally. If the implementation uses late binding, the issue with large datasets isn't applicable. For example, Jess, Drools and Jamocha use late binding. Think of it another way: if we have 200 rules and 5 bindings per rule, that means a potential of 1000 bindings to manage. If the design actively manages the bindings globally, we can see that the space requirement grows rapidly as the dataset grows.

With late binding, that issue doesn't exist: since the binding is local to a node, it doesn't increase memory consumption. Hope that helps explain the RETE wall issue.
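woolfel's arithmetic (200 rules times 5 bindings per rule gives 1000 bindings) can be turned into a rough memory model. The sketch below is a deliberate simplification and not a description of any engine's internals: the hypothetical `eager_global_bindings` assumes an eager design that keeps a candidate binding for every (rule, variable, fact) combination, while the hypothetical `late_local_bindings` assumes each node keeps bindings only for the facts that pass its own test. The facts and node tests are made-up inputs.

```python
from collections import defaultdict

def eager_global_bindings(rules, facts):
    """Hypothetical eager design: one global table with an entry for every
    (rule, variable) pair, each holding a candidate binding for every fact."""
    table = defaultdict(list)
    for rule, variables in rules.items():
        for var in variables:
            for fact in facts:
                table[(rule, var)].append(fact)
    return sum(len(candidates) for candidates in table.values())

def late_local_bindings(node_tests, facts):
    """Hypothetical late-binding design: each node binds a fact only if it
    passes that node's own test, and stores the binding in its own memory."""
    memories = {node: [f for f in facts if test(f)] for node, test in node_tests.items()}
    return sum(len(memory) for memory in memories.values())

# 200 rules with 5 variables each, matching the numbers in the comment
rules = {f"rule{i}": [f"?v{j}" for j in range(5)] for i in range(200)}
facts = list(range(100))   # hypothetical working memory of 100 integer facts

# hypothetical node tests; the number of rules plays no part here
node_tests = {
    "is_even": lambda x: x % 2 == 0,
    "over_90": lambda x: x > 90,
}

print(eager_global_bindings(rules, facts[:1]))  # 1000
print(eager_global_bindings(rules, facts))      # 100000
print(late_local_bindings(node_tests, facts))   # 59
```

With a single fact the eager model reproduces the 1000 bindings from the comment, and with 100 facts it is already at 100,000; the late-binding count depends only on how many facts actually pass each node's test.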