#ORF09 Making Parallelism Available to Rule Developers Presentation

This entry is part 30 of 30 in the series October Rules Fest 2009

Charles Forgy is talking about how to make rule engines parallel. He has talked before about doing things inside the rule engine to make it faster.

Why?

  • Processor speeds are hitting a plateau
  • Vendors are creating multi-core machines
  • To tackle bigger problems, rule engines need to adapt

Internal Parallelism

  • That was the first thing to do
  • Complex but possible
  • There are limits (Amdahl’s Law): the parallelism applies to the match and conflict resolution algorithms, so the gain is not going to be as high as we would hope

Goals of Parallel OPSJ

  • Increase the rate of change of working memory
    • Enable each rule to do more
    • Allow rules to execute in parallel
  • In short, make parallelism visible to the rule developer

“It is easy to make a parallel rule language. It is not easy to make a parallel rule language that is usable”

There are different approaches to this.

  • Allowing a rule to be defined such that all instances of the rule can fire simultaneously. This has to be done with care to avoid unwanted consequences, which puts tremendous pressure on the rule developer; alternatively, the rule language has to be extended to let the rule engine deal with it (a rough sketch of the idea follows)
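
To make the first option concrete, here is a minimal Java sketch (the names and API are hypothetical, not OPSJ's) of firing all activations of a "parallel-safe" rule at once:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: fire every activation of a "parallel-safe" rule at once.
// This only works if the actions are independent (no conflicting writes to
// working memory), which is exactly the burden placed on the rule developer.
final class ParallelFiring {
    interface Activation { void fire(); }

    static void fireAll(List<Activation> activations) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (Activation a : activations) {
            pool.submit(a::fire);          // each rule instance fires on its own thread
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```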

Using multiple knowledge sources:

  • Need to allow communication between the Knowledge Sources
  • Need to allow the Knowledge Source to “Pause” instead of exiting (see the sketch after this list)
  • Requires the use of “Probes” (special type of rule) that know about 2 Knowledge Sources
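
A rough sketch of the “pause” idea (my own illustration; the actual OPSJ mechanism was not shown): a knowledge source blocks on an incoming message queue when it runs out of work, instead of exiting, so a probe or another knowledge source can wake it up later.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a knowledge source that "pauses" (blocks on its inbox)
// instead of exiting, so probes and other knowledge sources can feed it facts.
final class KnowledgeSource implements Runnable {
    private final BlockingQueue<Object> inbox = new LinkedBlockingQueue<>();

    void send(Object fact) { inbox.add(fact); }    // called by probes / other sources

    @Override public void run() {
        try {
            while (true) {
                Object fact = inbox.take();        // "pause": block until work arrives
                process(fact);                     // match and fire rules on the new fact
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();    // exit only when explicitly interrupted
        }
    }

    private void process(Object fact) { System.out.println("processing " + fact); }
}
```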

There are also multiple levels of Working Memory, which come into play on:

  • Insert operations
  • Send packet operations
  • Conditions

Using multiple threads in a Single Knowledge Source

  • How do we change the architecture to support this?
  • We want to avoid changes to shared data, but this is a challenge since in a traditional KBS everything is global
  • Every thread needs access to 2 levels of working memory: one for its main work, the other accessed in a very controlled manner (a sketch of this idea follows)
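
A minimal sketch of the two-level idea as I understood it (a hypothetical structure, not OPSJ's actual design): each thread does its main work against a private level and touches the shared level only through controlled operations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical two-level working memory: a private per-thread level for each
// thread's main work, plus a shared level accessed only in a controlled manner.
final class TwoLevelMemory {
    private final List<Object> shared = Collections.synchronizedList(new ArrayList<>());
    private final ThreadLocal<List<Object>> local = ThreadLocal.withInitial(ArrayList::new);

    void assertLocal(Object fact) { local.get().add(fact); }   // cheap, uncontended

    void publish(Object fact) { shared.add(fact); }            // controlled shared write

    List<Object> snapshotShared() {                            // controlled shared read
        synchronized (shared) { return new ArrayList<>(shared); }
    }
}
```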

Next step is parallelism over networks

  • Except for shared memory, everything should support a distributed architecture
  • Network latency will obviously be there, but the approach might still be advantageous

Java is not as fast as C/C++

  • Modern processors are very complex and Java does not take proper advantage of them, so instructions per cycle are very low
  • He wishes someone would rewrite the code generators to take advantage of this.

#ORF09 Distributed Programming with Agents Presentation

This entry is part 29 of 30 in the series October Rules Fest 2009

Mark Proctor thinks that agent computing is coming. He draws a parallel to the enterprise integration patterns (camel.apache.org).

The same person who wrote those patterns is now working on Conversation Patterns to standardize that topic, and that is what is going to standardize agent computing.

An Agent:

  • Can act in an environment
  • Can communicate directly with other agents
  • Is driven by a set of tendencies
  • Is capable of ____________

Agents in Multi-Agent System (MAS) are autonomous, reactive, pro-active and social.

Agents in MAS must

  • move
  • communicate
  • coordinate
  • negotiate

Not all agents have to be mobile; those that are not are stationary agents. But there are advantages to mobile agents.

Agent to agent communication can use Shared Memory or Message passing. Agent communication models are needed and many standards (KQML, FIPA, etc.) have been worked on, many based on Speech Acts. He covered some aspects of that topic. Mark does not really like KQML and prefers FIPA.
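
For a flavor of what a speech-act-based message looks like, here is a rough Java sketch modeled on the shape of a FIPA ACL message (fields simplified; real ACL messages also carry ontology, protocol, conversation id, and more):

```java
// Rough sketch of a FIPA-ACL-style agent message: a performative (speech act)
// plus sender, receiver, and content.
final class AclSketch {
    enum Performative { INFORM, REQUEST, AGREE, REFUSE, QUERY_IF }

    record AclMessage(Performative performative, String sender,
                      String receiver, String content) {}

    public static void main(String[] args) {
        AclMessage msg = new AclMessage(Performative.REQUEST,
                "broker-agent", "pricing-agent", "(quote item-42)");
        System.out.println(msg.performative() + " from " + msg.sender());
    }
}
```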

Agent computing needs coordination.

The presentation packed in a lot of information on the topic of agents. Mark then talked a bit about his intentions for agent support in Drools.

#ORF09 Complex Event Processing Models Presentation

This entry is part 28 of 30 in the series October Rules Fest 2009

Charles Young is talking about CEP.

He skims through the first couple of slides because CEP has been covered to some degree in earlier presentations.

CEP requires:

  • Integration with Event Clouds
  • Event Pattern Recognition
  • Event Aggregation
  • Performance & Scalability
  • Tooling

CEP agent functions:

  • Sense
  • Deliver
  • Adapt in
  • Detect
  • Reify
  • Act
  • Adapt out
  • Relay

These are not the whole picture since we need to support multiple sources, etc.

The bulk of CEP technologies today are event stream processing engines.

Two approaches:

  • Stream-oriented approach
  • Set-based approach

Some engines use Set-based operators:

  • Stream to view
  • View to view
  • View to stream

There are a variety of ways to implement a “Select and Consume” strategy, and there is a lot of complexity around that. Context, in this setting, is about the various ways to handle timestamps, temporal windows, and template instantiation.
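
As a toy illustration of one such context mechanism (my own sketch, not tied to any engine), a time-based sliding window keeps only the events whose timestamps fall within the last N milliseconds and consumes the rest:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy time-based sliding window: events that fall out of the window are
// consumed (evicted) as newer events arrive.
final class SlidingWindow {
    record Event(String name, long timestampMillis) {}

    private final Deque<Event> window = new ArrayDeque<>();
    private final long spanMillis;

    SlidingWindow(long spanMillis) { this.spanMillis = spanMillis; }

    void onEvent(Event e) {
        window.addLast(e);
        // Evict everything older than (newest timestamp - window span).
        while (!window.isEmpty()
                && window.peekFirst().timestampMillis() < e.timestampMillis() - spanMillis) {
            window.removeFirst();
        }
    }

    public static void main(String[] args) {
        SlidingWindow w = new SlidingWindow(1000);
        w.onEvent(new Event("a", 0));
        w.onEvent(new Event("b", 500));
        w.onEvent(new Event("c", 1600));     // evicts "a" and "b" (older than 600)
        System.out.println(w.window.size()); // 1
    }
}
```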

There are many “Event Pattern Languages”, but the details are not necessarily important, as long as the language supports what you are trying to do. (He lists some).

He then linked Rete to CEP in some detail. Rete can help optimize the filtering portion of event stream processing.

Other issues:

  • Event semantics (immutability)
  • Temporal logic
  • Selection and Consumption
  • and 2 more I did not have time to write down

Why use Rete in CEP?

  • Stream reasoning
  • Bridging the gaps
    • e.g. Event Processing to Business Processes
    • Event Processing to Analytics

He then covered different models for using Rete in CEP:

  • Rete before Event Stream processing
  • Rete after Event Stream Processing
  • Hybrid agents
    • Event Stream Processing injecting events in between the alpha network and beta network of the Rete
    • Event Stream Processing injecting at the end of the Rete network

We need to bring Event Processing Networks closer to the Enterprise Service Bus.

#ORF09 CLIPS Implementation of RETE Presentation

This entry is part 27 of 30 in the series October Rules Fest 2009

Gary Riley is talking about his implementation of RETE in CLIPS.

A couple of years ago, benchmarks for rule engines (Manners and Waltz) showed that CLIPS was not doing well. He wanted to improve speed, although that may not have been the only issue.

He started adding different improvements, although he noted that none of them were “new”, since they had all been done elsewhere before. These changes improved performance a great deal.

CLIPS and Java engines can’t be compared easily since they have two different structures. Comparing a C implementation to a Java object-oriented implementation is not necessarily “fair”, since in C you are working closer to the machine and don’t have the overhead.

The biggest improvement was to hash the Alpha memories. The hash table is statically sized, which assumes that the user can adapt the size to the needs of the application.

He then hashed the Beta memories as well. In this case, the Beta memories are dynamically resized; it is much harder for a user to figure out how to size these memories. It is possible to turn hashing off. In some benchmarks hashing made no difference, but there are use cases showing that Beta memory hashing does provide performance improvements.
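
To see why hashing helps, here is a toy sketch (mine, not the CLIPS code): partial matches are bucketed by the value of the join variable, so a join probes one bucket instead of scanning the whole memory. Alpha memory hashing is the same idea applied to the single-fact memories.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy hashed Beta memory: partial matches are indexed by the join-key value,
// so joining a new fact is a hash probe instead of a scan of every match.
final class HashedBetaMemory {
    private final Map<Object, List<Object[]>> buckets = new HashMap<>();

    void add(Object joinKey, Object[] partialMatch) {
        buckets.computeIfAbsent(joinKey, k -> new ArrayList<>()).add(partialMatch);
    }

    List<Object[]> matchesFor(Object joinKey) {      // O(1) probe vs O(n) scan
        return buckets.getOrDefault(joinKey, List.of());
    }
}
```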

Salience (priority) has advantages in some cases, but it is costly when the agenda is implemented as a linked list and an activation must be inserted somewhere in the middle. So he created Salience Groups, which are basically buckets that hold activations with the same salience.
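
A minimal sketch of the salience-group idea (a hypothetical structure, not Gary's implementation): one bucket per salience value, kept in a sorted map, so inserting an activation never walks a long linked list.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

// Toy agenda with salience groups: a FIFO bucket per salience value, sorted so
// the highest salience fires first; inserting an activation is O(log groups).
final class SalienceAgenda {
    private final TreeMap<Integer, Deque<String>> groups =
            new TreeMap<>(Comparator.reverseOrder());

    void add(int salience, String activation) {
        groups.computeIfAbsent(salience, s -> new ArrayDeque<>()).addLast(activation);
    }

    String next() {
        Map.Entry<Integer, Deque<String>> top = groups.firstEntry();
        if (top == null) return null;                 // agenda empty
        String activation = top.getValue().pollFirst();
        if (top.getValue().isEmpty()) groups.remove(top.getKey());
        return activation;
    }
}
```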

He then talked about how the “Not” conditional element was implemented.

Asymmetric retracts depart from the traditional approach, which handles assertions and retractions symmetrically. Asymmetric retracts work slightly differently; he referred to tree-based removal and a bottom-up approach for removing partial matches.

He also changed the “Exists” conditional element. It is now implemented as a single join: it keeps just one partial match, so you don’t need to maintain a count.

He then talked about other miscellaneous improvements to the code.

Benchmarks (specifically Manners and Waltz) are useful but they should not be the only thing that people look at.

His conclusions: it is still good to code in C; open source is good; and we need more benchmarks (additional ones to test other parts of the system, not different ones).

#ORF09 October Rules Fest Think Tank – Part II

This entry is part 26 of 30 in the series October Rules Fest 2009

As you may have seen from my previous post trying to capture what was discussed at the Thursday Think Tank of the October Rules Fest 2009, it was somewhat of a mess…

I think Charles Young’s blog post summarizing the whole day captures many people’s thoughts on what happened (see the last paragraph): http://geekswithblogs.net/cyoung/archive/2009/10/30/135855.aspx

To highlight some of his points, next year’s Think Tank should:

  • Be better planned, with specific topics of discussions to address
  • Be moderated
  • Reduce the number of panellists

I give all the credit to James Owen, who organized this conference and whose intentions were probably along those lines, but it didn’t exactly turn out that way.

#ORF09 October Rules Fest Think Tank

This entry is part 25 of 30 in the series October Rules Fest 2009

The panel includes: Dr. Charles Forgy (CF), Mark Proctor (MP), Gary Riley (GR), Carlos Serrano-Morales (CSM), Carole-Ann Matignon (CAM), Dr. Jacob Feldman (JF), Dr. Rick Hicks (RH), Paul Vincent (PV) and Jason Morris (JM).

I will do my best to log the “gist” of what was discussed by each person in turn. This will be a LONG post… I realize that blogging a session like that (of that length and technicality) is difficult (hey, it was my first), so this might not be the best post ever…

——————–

The question: What is needed in rule-based systems now and in the future?

MP: He likes to bounce ideas off other experts to see what they are thinking about. He is talking about features that could be implemented in the next couple of years.

RH: The difficulty of building these systems and to maintaining them over time. That is why he is focusing on building rules and verifications.

PV: Would not want to drive any car that HE designed. 🙂 But feels that you can offload SOME of the work to end users.

CSM: Decouple the work that a software developer would do from the decisions. We should not expect BAs to understand the underlying technology, but to understand the business logic. BAs don’t think about syntax. You can create templates by pairing BAs with experts, and then have a bit more control over what the BAs can do.

JM: The challenge is to bridge the gap between the techies and the BAs.

CAM: It is a challenge, but the fact that you can decouple the decisions from the technical stuff is a good thing. BAs take time to get used to being able to change things in production. As they get more familiar and start to take ownership you can expose them to more and more capabilities.

JF: Making things easy for BAs is important. For example, a questionnaire for insurance could theoretically require thousands of questions, but you can’t ask customers to answer that many. He wants to generate rules based on historical data to improve decisions.

CAM: Business worry about generated rules because they have to worry about compliance with regulations.

CSM: On trying to learn from data: 99% of what you will learn, we already know. Your data reflects the trends of today, but things are going to change. Automatically generated rules can be useful, but you need to know when to stop using them.

PV: On what is needed: it is an interesting indicator to see a UML vendor sponsoring a rules conference. It shows that rules are being recognized as an important component of software development.

CAM: We need a standard for business rules; it would help the adoption of the technology. We need to collaborate on this to bring it to the next level. It is a slow and long process.

PV: Can we devise higher-level models for things like decisions, such as PRR? There are challenges. Even the Rule Interchange Format from the W3C has them.

CY: The PRR guys are getting it right. The problem with the W3C seems to be its belief in a complex approach to rule exchange.

CM: A rule standard for the modeling of rules is an oxymoron, because you want your business users to change the rules, not go back to a model.

CY: Excellent point. The question is more: where are these standards actually going to be useful in practice?

JF: Let people express relationships and rules. Don’t try to impose something. Can we build a system that doesn’t use objects?

PV: Is anyone in the audience interested in using Rule Standards for modeling or interchange?

Audience (Rolando Hernandez): Question for CAM. Users don’t necessarily want to change rules directly. Where is your model of the rules?

CAM: Although you can change a rule on the fly it doesn’t mean you have to. You need to go through a process before a change goes to production. The technology allows you to do that.

CSM: The changes that the business makes are made in an environment that is safe and designed for that, with governance and management.

Audience (Adam Mollenkopf): (Sorry Adam, I missed your questions…)

PV: No issues from customers about having multiple rule engines as part of a solution.

Audience (David Holz): Do you think there is a distinction between what programmers are looking for in a rule engine and what the business side wants out of one?

GR: He finds that on the surface the tools may not seem to have evolved all that much (a text editor and CLIPS to debug). Tools seem to be going more toward the business users. Each rule is now an item, which is understandable from a versioning point of view, but it creates a separation between the rules, whereas we could see them as a whole before. Definitely 2 audiences.

PV: “Eat your own dog food”. You need to use your own tools to make things work. As a side note, there is a text mode in Tibco.

CF: I second GR’s statement. A fancy GUI is unpleasant to use.

JM: I “third” that. There are 2 distinct purposes.

CSM: Definitely 2 points of view. But if you want to give control to the business, then you need to have the business view. The business views ease communication with the developers, since you have a common point of reference. No translation.

(conversation between JM and CSM edited for shortness… The post will be long enough as it is…)

CAM: You’re pointing out the difference between the BRE and the BRMS. One is focused on the algorithms; the other is focused on the governance and management piece.

Audience: IBM was apparently disappointed to learn that most implementations are not using Inference. Is the cause a lack of evangelisation?

PV: Tibco has observed something similar in the CEP world.

CAM: Execution and management separation again. Inference engines are not necessarily the best way to solve problems. The key value is to allow business users to manage millions of rules. Give the power to the users; empower them.

CF: There are very active research projects to implement rules in the DB, so the DB world does recognize the fact that triggers are too limited.

CSM: Not surprised by the IBM numbers.

Audience (Thomas Cooper): Experts can draw on their experience and what they know, and can resolve problems that they have never seen. Do you think we can get to that point of recognizing patterns?

CAM: Predictive models are very good at finding patterns in data. So a combination of technologies will be required to fit the needs of the business.

CSM: When you want to adapt, you realize that you are not “coping” as well as before. The difficulty is detecting patterns in what has happened before. The industry is providing a solution to a piece of the puzzle. We will need to close the loop in the offering.

JF: (Goes to the whiteboard, draws a picture and explains.) He wants a way to mark each decision as a good decision or a bad decision; can we then use that information to help make new decisions?

JM: This will be familiar if you know case-based reasoning: looking at previous cases. In almost all situations it requires human interaction. What if you plugged in a rule engine instead of the human interaction? It resembles the diagram that JF drew on the board.

Audience: It exists in a product from Mindbox.

[Some of this conversation was very technical and they spoke quickly and I was unable to write all of it…]

MP: The semantics are the same between products. Each engine uses a similar conflict resolution algorithm for execution control. Drools added ruleflow groups, agenda groups, etc. Rule execution groups could be another way of doing this.

CF: Doing it in the engine is probably more efficient, but doing it in the data is also interesting: having the engine look at data about the application and data about the flow. Some think that is a weakness, but I think otherwise.

CF: In the 70s we used a special kind of fact called a Control Fact. The first condition of each rule is the control condition. All you need is Control Facts; you don’t need ruleflow.
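
[To illustrate CF’s control-fact idea, here is a minimal Java sketch of my own; the types and the rule are invented for this note, not taken from any engine:]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the control-fact pattern: the execution phase is an
// ordinary fact, and a rule is eligible only while the matching phase fact exists.
final class ControlFactDemo {
    record Phase(String name) {}                 // the control fact
    record Order(int id, double total) {}        // an ordinary fact

    public static void main(String[] args) {
        List<Object> wm = new ArrayList<>(List.of(new Phase("validate"), new Order(1, 250.0)));

        // Rule "flag-large-order": its first condition is the control condition.
        boolean inValidatePhase = wm.stream()
                .anyMatch(f -> f instanceof Phase p && p.name().equals("validate"));
        if (inValidatePhase) {
            wm.stream()
              .filter(f -> f instanceof Order o && o.total() > 100.0)
              .forEach(o -> System.out.println("needs approval: " + o));

            // Advancing the phase is just retract + assert of a control fact;
            // no separate ruleflow construct is needed.
            wm.removeIf(f -> f instanceof Phase);
            wm.add(new Phase("approve"));
        }
    }
}
```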

MP: I’m trying to get to a common terminology: “syntactic sugar” to let users do this declaratively. There is a disconnect in terminology. Simplify things to create design patterns with a common vocabulary.

CF: (talking about the technical implementation) It’s not difficult, you just do it.

[Long technical conversations completely edited out…]

CY: How do we make our systems smarter? How do we better compose all of this stuff (he referred to a lot of technologies) together? I think there are ways of doing it, but I can’t talk about them.

CAM: There is definitely a need for these different technologies to collaborate (not just work side by side). These kinds of collaborations will be required in the next 10 years.

[I stopped taking notes after that]

#ORF09 Business Rules in the Cloud Presentation

This entry is part 24 of 30 in the series October Rules Fest 2009

Carlos Serrano-Morales is talking about how the Internet world is going to affect the AI world.

Some transformation forces:

  • The Economy
  • Big Data

Competitive survival is driven by the ability to anticipate crisis situations as well as the ability to react swiftly to them.

There is an ever increasing amount of data with increasing complexity that we need to deal with. How do we leverage that information?

The cloud is a resource that you can use on an elastic basis: you use what you need. Cloud computing is leveraging the cloud. Cloud computing is not a fad; it is currently at the top of Gartner’s hype cycle, but it is not going to go away.

Carlos then went into some details about Cloud computing, the architecture, etc.

The implication of large data is that you will need to map the information to what you want to do, sort the information, and reduce (summarize?) the contents to cut the volume. He called that “MapReduce”, which is a programming model.
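
As a toy illustration of the map/sort/reduce model he described (my own sketch, not from the talk): map each record to a key, group by that key, and reduce each group to a summary value.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy MapReduce-style pipeline: map records to keys, shuffle/group by key,
// then reduce each group to a count.
final class MapReduceSketch {
    public static void main(String[] args) {
        List<String> decisions = List.of("approve", "deny", "approve", "refer", "approve");

        Map<String, Long> counts = decisions.stream()
                .collect(Collectors.groupingBy(d -> d,      // map + group by key
                         Collectors.counting()));           // reduce per key

        System.out.println(counts);  // e.g. {approve=3, deny=1, refer=1}
    }
}
```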

What does that mean for decision management? We need to increase our capability to predict, which requires new analytical methods to cope with large, sparse, unstructured data. Use predictive models to capture correlations. Leverage cloud infrastructure.

More challenges:

  • To increase our ability to react and adapt
  • To deploy decision services in an efficient and scalable fashion within the cloud

#ORF09 Temporal Reasoning Presentation

This entry is part 23 of 30 in the series October Rules Fest 2009

Edson Tirelli and Adam Mollenkopf are presenting on CEP. Temporal reasoning is only one of the components of CEP.

CEP is about processing a large amount of events and identifying the meaningful events out of the event cloud. CEP uses techniques such as detection of complex patterns, etc.

FedEx Custom Critical (high value, special needs) needed to create dynamic schedules to support the special needs of its customers. They get information from the vehicles, shipment information, aircraft events, traffic flow, traffic incidents, etc.

A very interesting demo showed how all of that information is combined and displayed with a GIS system, allowing tracking of trucks and shipments to make sure deliveries are done on time and within the requirements the customer asked for.

Adam then covered the architecture of the system and then elaborated on the expected benefits of doing all of this.

Edson then took over the presentation to explain how Drools Fusion supports CEP. Temporal reasoning requires:

  • A CEP enabled engine (time and events)
  • Ability to express temporal relationships
  • Requires a reference clock
  • Requires support of temporal dimension

They have implemented the 13 temporal operators that James F. Allen identified in a research paper. Basically: before, meets, overlaps, finishes, includes, starts, coincides, after, metBy, overlappedBy, etc.
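
As a rough sketch of what a few of these operators mean (my own illustration, not Drools Fusion code), for events modeled as intervals with a start and end time:

```java
// Minimal sketch of a few of Allen's interval relations over event intervals.
// Times are in arbitrary units (e.g. milliseconds since some epoch).
final class AllenRelations {
    record Interval(long start, long end) {}

    // A ends strictly before B starts.
    static boolean before(Interval a, Interval b)   { return a.end() < b.start(); }

    // A ends exactly when B starts.
    static boolean meets(Interval a, Interval b)    { return a.end() == b.start(); }

    // A starts first and the two intervals partially overlap.
    static boolean overlaps(Interval a, Interval b) {
        return a.start() < b.start() && a.end() > b.start() && a.end() < b.end();
    }

    public static void main(String[] args) {
        Interval shipment = new Interval(0, 10);
        Interval flight   = new Interval(10, 25);
        System.out.println(meets(shipment, flight));   // true
        System.out.println(before(shipment, flight));  // false
    }
}
```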

The reference clock defines the evolution of time and is required to synchronize time-sensitive operations. Clocks can be defined specifically for the domain the rules are working in.

Edson then went through some of the details of how things work behind the scenes (the temporal distance algorithm) and showed example rules.

Very interesting stuff. It was great to see a practical example of how CEP can be used, and I am impressed with how far Drools has evolved in the short time it has been working in that space.

#ORF09 Practical and Modern RBE Presentation

This entry is part 22 of 30 in the series October Rules Fest 2009

Jason Morris is talking about a project he participates in, using Jess, in partnership with the University of Sydney in Australia.

The project is SINFERS (Soil Inferencing System), which predicts new soil properties from existing properties by applying pedotransfer function (PTF) rules.

Pedotransfer functions infer new information based on existing properties. One of the challenges is that they may have multiple functions that can calculate the same value; this is a conflict resolution problem. They also encountered challenges in calculating the error on each value, and they faced a challenge of “inbreeding” (meaning a function uses the result it calculates as one of its own inputs).

He then started looking for places where automation could get rid of some of the “boring” (quoting Luke Voss’ talk) tasks, and worked on automating them. For example, they can generate Jess rules based on the information in the PTF database.

He made a humorous side note where he explained how he learned to speak “Australian”. The point of that side note was to point out that we always need to learn the language of the domain we are writing rules for so that we have a common vocabulary.

He then went into some of the details of the implementation approach he chose.

For inspiration on problem resolution, Jason suggests going back through research that was done previously on problem resolution and algorithms, so you might get inspired on how to handle your specific problem. Don’t reinvent the wheel.

As a design metaphor, he also used “Incumbent and Challenger” (Champion-Challenger was also discussed yesterday). He then described his implementation of the rules that implement the champion-challenger pattern.
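
A minimal sketch of the champion-challenger idea as it might apply to PTFs (my own illustration; the names and error measure are hypothetical): keep the current best function, and replace it only when a challenger predicts with lower error.

```java
import java.util.List;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of champion-challenger selection among several
// pedotransfer-style functions that estimate the same soil property.
final class ChampionChallenger {
    record Candidate(String name, DoubleUnaryOperator estimate, double error) {}

    static Candidate select(List<Candidate> candidates) {
        Candidate champion = candidates.get(0);
        for (Candidate challenger : candidates.subList(1, candidates.size())) {
            if (challenger.error() < champion.error()) {
                champion = challenger;   // challenger wins and becomes the champion
            }
        }
        return champion;
    }

    public static void main(String[] args) {
        var winner = select(List.of(
                new Candidate("ptf-clay", x -> 0.3 * x + 2.0, 0.12),
                new Candidate("ptf-sand", x -> 0.5 * x + 1.0, 0.08)));
        System.out.println(winner.name()); // ptf-sand (lower error)
    }
}
```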

He gave us a quick demo of SINFERS and told us that the first part of the project has now been completed; after some vacation time, they are only now starting to discuss what comes next.

#ORF09 Programming Rules Using a Spreadsheet Interface

This entry is part 21 of 30 in the series October Rules Fest 2009

Dr. Gopal Gupta and Abhilash Tiwari are talking about constraint programming in a spreadsheet interface. He refers to constraints as rules.

The motivation behind this is that non-experts should be able to program rules while leveraging existing technologies. The scope is constraint satisfaction problems (CSP).

The problems to resolve are usually NP-complete in complexity, so they are very hard to solve. As an example, resource allocation is a typical problem they need to resolve.

Spreadsheets (rows and columns of data) allow some programming with macros or formulas. The calculations are applied by replication (a formula in a cell is copied to other cells). Current spreadsheets only support arithmetic expressions, so the functionality needs to be extended to support constraints.

The tool they developed (PlanEx) uses .Net technology as an add-in to Excel. It offers formulas such as Distinct, Frequency, Present_Once, and If-Then. You can then ask the product to solve the problem and get a solution. If you don’t like the solution and there are multiple solutions, you can ask for another one.
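
To give a sense of what a constraint like Distinct means (my own sketch, not PlanEx code): it holds when no value in a range of cells repeats, as in a sudoku row. A real solver searches over candidate assignments and keeps only those that satisfy every such constraint.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy check for a Distinct-style constraint over a range of cell values:
// the constraint is satisfied only if no value appears twice.
final class DistinctConstraint {
    static boolean distinct(List<Integer> cells) {
        Set<Integer> seen = new HashSet<>();
        for (Integer v : cells) {
            if (!seen.add(v)) return false; // duplicate found: constraint violated
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(distinct(List.of(5, 3, 4, 6, 7, 8, 9, 1, 2))); // true
        System.out.println(distinct(List.of(5, 3, 5)));                   // false
    }
}
```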

He gave some examples of problems, such as preparing the schedule of university courses (courses offered by teachers in classrooms, possibly at specific times, etc.) or simply employee scheduling in a shift-based company.

They then gave us a short demo of the tool solving a sudoku puzzle as well as the course scheduling problem.