Notes: Alibaba's Qwen3.5 LLM created the summary below, which seems to be a reasonable
representation of the chat content if you need a TL;DR.
Here are my immediate comments & responses to the summary. -- John Mark
Based on the provided meeting transcript, here is a structured extraction of key questions,
comments, and a topical summary regarding Bayesian Methods and Networks.
1. Topical Summary
The discussion centered on the theoretical foundations and practical implementation of
Bayesian Networks (BN) and Influence Diagrams. The conversation moved between three main
pillars:
Philosophical Foundations: Determining whether "influence" in network arcs implies
causality or merely relevance/informational connection, referencing Leonard Savage's work
versus modern causal inference standards (Judea Pearl). There was debate over the
exclusivity of Kolmogorov probabilities versus other uncertainty quantifications.
Savage's writing on personal probabilities has nothing to say about causality. Pearl's 2000
book "Causality" started the current discussion. The topic is deep and deserves a longer
discussion. As regards Kolmogorov - the conventional axiomatic definition of probability -
alternatives such as Dempster-Shafer or fuzzy methods don't add any value for what we are
doing.
Model Construction & Interpretation: Clarifying how arrows are drawn (relevance vs.
causality), the distinction between structure validation and conditional probability updating,
and the handling of continuous nodes versus discrete variables in popular software
packages (e.g., Genie, AgenaRisk).
Hmm, many things are rolled together here. My term is a "structural prior", meaning the
judgment from which the connections in the network are drawn. The structural prior has
more importance in model validity than the numerical priors, in part since the independence
claims made are strong assumptions, and add power to the model.
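A minimal sketch of why those independence claims add power: they cut the parameter count from exponential in the number of variables to roughly linear. The numbers below are illustrative, not from the talk.

```python
# Illustrative: independence claims in the structural prior shrink the
# model. For n binary variables, the full joint needs 2**n - 1 free
# parameters; a BN needs one free parameter per parent configuration
# of each (binary) node.
def full_joint_params(n):
    return 2 ** n - 1

def bn_params(parent_counts):
    # parent_counts[i] = number of parents of binary node i
    return sum(2 ** k for k in parent_counts)

# Ten binary variables: full joint vs. a sparse structure.
print(full_joint_params(10))                      # 1023
print(bn_params([0, 1, 1, 2, 2, 2, 1, 1, 2, 1]))  # 27
```

The fewer free parameters, the stronger the claims the structure makes -- which is exactly why the structural prior matters more for validity than the numerical priors.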
Utility & Optimization: Defining the role of utility functions (risk preferences vs. NPV) and the
computational challenges of fitting Bayesian models to real data, specifically regarding
multi-objective optimization for likelihood maximization across stochastic nodes.
This talk barely touched on utility functions, except to say they are straightforward to add to
the model.
2. Key Questions Raised on Bayesian Methods
The following technical questions were posed by participants regarding methodology and
implementation:
Foundations & Probability Theory
Is the standard strictly Kolmogorov probability, or do we need to stop using alternative forms
of uncertainty quantification? (George Hazelrigg)
If by alternate forms you mean e.g. Dempster-Shafer, yes -- if they violate probability
axioms the whole rationality edifice collapses.
Is Bayes Theorem simply a matter of set theory, or is it an extension of logic requiring Cox's
Theorem and Edwin Jaynes' work? (Sheldon Bernard)
Yes, Bayes rule is just the algebra of measures over mutually exclusive events (the reason for
adopting BNs is to avoid exponential complexity with large numbers of variables). Cox's
Theorem justifies using Bayes rule for information updating.
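To make the "algebra of measures" point concrete, here is a minimal sketch of Bayes rule over mutually exclusive, exhaustive hypotheses. The hypothesis names and numbers are made up for illustration.

```python
# Bayes rule as plain algebra over mutually exclusive hypotheses:
# posterior is prior times likelihood, renormalized.
def bayes_update(prior, likelihood):
    """prior: {hypothesis: P(h)}, likelihood: {hypothesis: P(evidence|h)}."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())  # P(evidence), the normalizing constant
    return {h: p / z for h, p in unnorm.items()}

posterior = bayes_update({"fault": 0.1, "ok": 0.9},
                         {"fault": 0.8, "ok": 0.2})
# posterior["fault"] = 0.08 / 0.26, roughly 0.308
```

Nothing here needs more than measure algebra; the BN machinery only enters when the number of variables makes the joint table intractable.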
Do we have to "know" causality exists before incorporating it into the analysis, or can it be
reasonably inferred? (Jim Spanier)
In short the engineer / expert's explanation of how the system functions is by its nature
causal, in the (commonly used) sense I'm using. If there's a question whether some relation
is causal, then the science is incomplete and it is amenable to statistical investigation. On
the other hand, since we are expressing causes by probabilities, (epistemic) uncertainty
about the cause can be expressed in the probabilities.
Network Semantics & Construction
In ML, are we looking at Decision Forests as well here? (Jamie Marzonie)
You're referring to using other ML techniques like Random Forests? Any ML method that
admits of a Bayesian interpretation can be used - linear regression included. I used CART
for its first-class explainability, and for the direct mapping between the classification tree leaf
nodes and cells in the BN node's CPT.
Is "influence" a mix of correlation and causation, or could it be either? (MarcyConn)
If by "influence" we just mean a euphemism for "cause" (my claim) then I think it is clearer
not to call a correlation an influence. But as for BN structure, there's nothing wrong with
building it from correlations, just as long as we don't label it causal.
Where is the boundary between considering correlation versus weak causality when
quantifying relationships? (George Hazelrigg)
"Weak causality?" I think the point is to relegate correlation only to a value computed over
data, and not ascribe it to belief -- for semantic reasons.
If I have to know causality prior to constructing my model, will many cases not be able to
even start drawing/using the network? (Reidar Bratvold)
I think this has to do with the depth of understanding of the experts one works with. Experts
with a deep understanding of the behavior of the system at hand naturally "know its causality"
simply in their ability to explain how it works or how it fails. But I have worked with
engineers who've given me, for example, network logs of observables and have no notion of
why the system behaves as it does. "Expert systems" presume expertise.
When fitting a Bayesian Network to real data... is the fitting process itself more like a multi-
objective optimization that is quite hard to balance? (zihan ren)
I didn't get into learning network structure, since it can be problematic: one needs to solve
an exponentially complex extension of an already exponentially complex problem. It can be
useful, but my experience is that the resulting network requires careful review by experts.
Dale forwarded me a question - what about using Bayesian regression to learn from data?
Yes - I chose CART just for convenience of explanation. The entire Bayesian ML field is
applicable and deserves investigation.
Utility & Decision Logic
Is the term Utility referring to risk preferences or things like NPV? (Brian Putt)
Yes - the distinction is that utility is a function of a value measure (such as NPV) that
expresses risk preference.
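As a sketch of that distinction: a value measure like NPV is passed through a utility function that encodes risk attitude. The exponential form and the risk-tolerance parameter R below are a common textbook choice, assumed here for illustration, not something from the talk.

```python
import math

# Illustrative exponential utility: u(v) = 1 - exp(-v / R), where v is a
# value measure (e.g. NPV) and R is an assumed risk-tolerance parameter.
def exp_utility(v, R):
    return 1.0 - math.exp(-v / R)

def certainty_equivalent(lottery, R):
    """lottery: list of (probability, value) pairs; invert u at E[u]."""
    eu = sum(p * exp_utility(v, R) for p, v in lottery)
    return -R * math.log(1.0 - eu)

# A 50/50 gamble on 0 or 100: the certainty equivalent falls below the
# expected value of 50 -- which is what risk aversion means.
ce = certainty_equivalent([(0.5, 0.0), (0.5, 100.0)], R=50.0)
```

With a linear utility the certainty equivalent would equal the expected NPV; the curvature is what carries the risk preference.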
What does the data update? Is it used to validate structure or just inform the conditional
probabilities? (ferna02d)
The problem I posed is just to update the CPTs, which one would think is a straightforward,
obvious application of ML, given how it can be made to fit with the BN. Learning structure
would raise questions about learning causality -- a harder problem.
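A minimal sketch of what "just update the CPTs" can mean in the simplest case: counting child outcomes per parent configuration, with a Dirichlet-style pseudo-count so unseen cells stay non-zero. All names and data below are made up for illustration.

```python
from collections import Counter

# Illustrative CPT update from data: count child outcomes per parent
# configuration, smoothed by a pseudo-count alpha (Dirichlet prior).
def fit_cpt(records, alpha=1.0, child_states=("fail", "ok")):
    """records: list of (parent_config, child_state) pairs."""
    counts = Counter(records)
    parents = {p for p, _ in records}
    cpt = {}
    for p in parents:
        total = sum(counts[(p, s)] + alpha for s in child_states)
        cpt[p] = {s: (counts[(p, s)] + alpha) / total for s in child_states}
    return cpt

data = [("hot", "fail"), ("hot", "fail"), ("hot", "ok"), ("cool", "ok")]
cpt = fit_cpt(data)
# cpt["hot"]["fail"] = (2 + 1) / (3 + 2) = 0.6
```

The structure is taken as given throughout -- only the numbers in the tables change, which is the distinction drawn above.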
Which is the decision node in the diagram notation? (Brian Putt)
Circles are probabilities, diamonds are values, squares are decisions.
Practical Implementation & Tools
BN advocates show examples using binary/discrete nodes... their practicality is limited by
clumsiness in handling continuous nodes... Is this a significant limitation for problems with
many continuous variables? (Keith Shepherd)
In short, yes. I don't know which of the various attempts to approximate continuous
distributions fare better. This is the domain of Markov Chain Monte Carlo simulation methods -
see Stan.
Packages that do dynamic discretisation become desperately slow with more than a handful
of continuous variables. What is the industry standard workaround? (Keith Shepherd)
One approach, incidentally, is the CART classification tree algorithm I demonstrated, which
chooses a discretization over all continuous input variables based on prediction accuracy.
Perhaps it can be extended to optimize for decision EV.
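The discretization idea, reduced to a toy sketch: a single CART-style cut point on one continuous input, scored by classification accuracy. The real algorithm grows a full tree over all inputs; this just shows the scoring step, with made-up data.

```python
# Toy sketch of CART-style discretization: pick the cut point on one
# continuous variable that maximizes majority-vote accuracy.
def majority_count(ys):
    return max(ys.count(v) for v in set(ys)) if ys else 0

def best_threshold(xs, labels):
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs))[:-1]:  # candidate cut points
        left = [l for x, l in zip(xs, labels) if x <= t]
        right = [l for x, l in zip(xs, labels) if x > t]
        acc = (majority_count(left) + majority_count(right)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["low", "low", "low", "high", "high", "high"]
t, acc = best_threshold(xs, ys)  # t = 3.0, acc = 1.0
```

Swapping the accuracy score for an expected-value score over decisions would be the EV extension suggested above.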
3. Key Comments & Expert Insights
The following comments highlight specific insights or consensus reached during the discussion:
On Relevance vs. Causality: Reidar Bratvold clarified Howard's perspective that arcs in a
network represent relevance, not causality. He noted: "Including an arrow means that there
MAY be a relevance. Not including an arrow is an absolute statement that there is no
relevance between the events."
Agree that the absence of an arc is a strong condition however we call it.
On Savage's Work: The group referenced Leonard Savage's The Foundations of Statistics
and Von Neumann & Morgenstern's Theory of Games, establishing these as key texts for
understanding the "Bayesian promise" regarding inductive reasoning.
On Bayesian Probability Definition: Sheldon Bernard emphasized that "Bayesian
probability is... more accurately described as the extension of logic" rather than just set
theory, referencing Cox's Theorem.
On Causal Inference Resources: To address the difficulty of establishing causality vs.
correlation, Sheldon recommended Scott Cunningham's Causal Inference: The Mixtape and
Judea Pearl's work for understanding how modern modelers approach this.
Just to note, nothing I presented today touches on "establishing causality" - the domain of
statistics.
On Fitting & Optimization: While acknowledging that fitting data requires likelihood
maximization, the group noted that "we are trying to infer cause -- not deductive reasoning
but inductive reasoning based on plausibility" (Sheldon Bernard), accepting that we cannot
know "cause" 100%.
Since we cannot, using probability to reason about it makes sense.
On Decision Notation: MarcyConn confirmed that rectangular notation is used for decision
nodes, helping standardize how the model is read by domain experts.
On SME Elicitation: A distinction was made between Structure (elicitation) and
Probabilities (data update). The group questioned whether data validates the structure itself
or simply informs the Conditional Probabilities (CPs).
As a major take-away: using ML to bring data to bear adds rigor to influence diagrams. Finding
the balance between where judgment is best applied and where data can be applied in this
framework is key. The techniques are still in their infancy. _JM
------------------------------
John Mark Agosta
johnmark.agosta@fondata.ai
LinkedIn: https://www.linkedin.com/in/john-mark-agosta/
Find me at https://medium.com/@johnmark54
------------------------------