Research Publications

Journal Articles

Lim, B. G., Dayta, D., Tiu, B. R., Tan, R. R., Garces, L. P. D., & Ikeda, K. (2026). Dynamic Factor Analysis of Price Movements in the Philippine Stock Exchange. Financial Innovation, 12(4).

The intricate dynamics of stock markets have led to extensive research on models that are able to effectively explain their inherent complexities. This study leverages the econometrics literature to explore the dynamic factor model as an interpretable model with sufficient predictive capabilities for capturing essential market phenomena. Although the model has been extensively applied for predictive purposes, this study focuses on analyzing the extracted loadings and common factors as an alternative framework for understanding stock price dynamics. The results reveal novel insights into traditional market theories when applied to the Philippine Stock Exchange using the Kalman method and maximum likelihood estimation, with subsequent validation against the capital asset pricing model. Notably, a one-factor model extracts a common factor representing systematic or market dynamics similar to the composite index, whereas a two-factor model extracts common factors representing market trends and volatility. Furthermore, an application of the model for nowcasting the growth rates of the Philippine gross domestic product highlights the potential of the extracted common factors as viable real-time market indicators, yielding over a 34% decrease in the out-of-sample prediction error. Overall, the results underscore the value of dynamic factor analysis in gaining a deeper understanding of market price movement dynamics.

Yu, Z., Lim, B. G. S., Tan, R. R. P., Kaji, H., & Ikeda, K. (2026). A Graph-Based Heuristic for Bus Route Planning Using Taxi Origin-Destination Data. IEEE Access, 14, 13865–13880.

As the global population ages and transportation resources become increasingly constrained, mid-sized suburban aging cities face growing challenges in designing efficient and sustainable public transit systems. In response, this study introduces a comprehensive and scalable data-driven heuristic based on graph theory for bus stop selection and route planning, leveraging large-scale taxi origin-destination data. The proposed algorithm features a density-based spatial clustering module that respects pedestrian networks and physical distances to identify candidate bus stops corresponding to mobility hotspots. Subsequently, a K-medoids clustering component based on road distances further segments and assigns bus stops to bus routes while respecting road networks. Finally, a composite Hamiltonian path construction heuristic plans feasible bus routes for each segment. A case study of Susono city in Shizuoka, Japan, demonstrates the effectiveness of the proposed methodology in generating four practical, user-centric, and context-aware bus routes with high operational efficiency. Overall, the proposed algorithm provides a data-driven framework for effective and adaptive mobility planning suitable for both local authorities and bus operators.

Tiu, B. R., Dayta, D., Ong, H. J., Lim, B. G., & Ikeda, K. (2026). A Bayesian Monte Carlo Variational Inference Estimation Procedure for Dynamic Factor Models on Stock Price Returns. IEEE Access, 14, 33730–33743.

Dynamic factor models (DFMs) provide a framework for distilling high-dimensional time series data into a small set of unobserved latent factors. Traditional statistical methods for estimating DFMs are often computationally intensive and can be inflexible when adapting to non-linear model extensions. In this work, we propose a modular Bayesian Monte Carlo variational inference (MCVI) estimation procedure for DFMs designed to prioritize flexibility and extensibility. We demonstrate that in the context of the Philippine Stock Exchange, the estimated latent factors and loadings qualitatively align with findings obtained through conventional methods. To illustrate the modularity of the framework, we extend the DFM to incorporate a diagonal BEKK-GARCH structure, which reveals non-trivial volatility clustering in the latent factors. While the adoption of a mean-field variational approximation entails certain trade-offs in posterior accuracy and computational overhead relative to specialized linear estimators, the decoupling of the model specification from the inference engine allows for the rapid integration of complex layers without requiring model-specific re-derivations. Overall, this work establishes an extensible Bayesian framework that facilitates more nuanced investigations into the dynamic and heteroskedastic nature of systematic risk in financial markets.

Lim, B. G., Lim, G. B., Tan, R. R., King, I., & Ikeda, K. (2026). Enhancing Graph Representations with Neighborhood-Contextualized Message-Passing. Transactions on Machine Learning Research.

Graph neural networks (GNNs) have become an indispensable tool for analyzing relational data. Classical GNNs are broadly classified into three variants: convolutional, attentional, and message-passing. While the standard message-passing variant is expressive, its typical pair-wise messages only consider the features of the center node and each neighboring node individually. This design fails to incorporate contextual information contained within the broader local neighborhood, potentially hindering its ability to learn meaningful relationships within the entire set of neighboring nodes—a critical limitation for complex domains like financial network anomaly detection and molecular property prediction. To address this, the paper first refines the concept of neighborhood-contextualization within GNNs, leveraging ideas from set-based aggregation methods and a key property of the attentional variant. This then serves as the basis for generalizing the message-passing variant to the proposed neighborhood-contextualized message-passing (NCMP) framework. To demonstrate its utility, a simple, mathematically grounded method to parametrize and operationalize NCMP is presented, leading to the development of the proposed Soft-Isomorphic Neighborhood-Contextualized Graph Convolution Network (SINC-GCN). Across a diverse set of synthetic and benchmark datasets, SINC-GCN strikes a highly favorable balance between expressivity and efficiency. Notably, while more complex models incur significant computational overhead, SINC-GCN delivers substantial performance gains with considerable effect sizes over baseline GNN models while maintaining a highly efficient asymptotic runtime complexity, further underscoring the distinctive utility of neighborhood-contextualization. Overall, by integrating multiset neighborhood context, the proposed NCMP framework serves as a practical and scalable path toward enhancing the graph representational power of classical GNNs.

Ong, H. J. J., Lim, B. G. S., Tan, R. R. P., & Ikeda, K. (2026). Measure Over Search: A Critical Re-evaluation of the Roles of Search and Independence Measure in LiNGAM-based Causal Discovery. IEICE Transactions on Information and Systems.

Causal discovery aims to infer cause-and-effect relationships from observational data, a crucial step beyond statistical correlation. A prominent method for this is the Linear Non-Gaussian Acyclic Model (LiNGAM), which can uniquely identify the causal structure by assuming linear relationships and non-Gaussian noise. LiNGAM-based algorithms typically depend on two key components: a search algorithm to determine the causal ordering of variables, and an independence measure to guide the search. Recent work, LiNGAM-MMI, proposed that replacing the simple greedy search with a global, shortest-path search led to superior performance, particularly when unmeasured common causes (confounders) are present. However, the claim was based on experiments that also modified the independence measure from the original baseline, making it difficult to isolate the source of the improvement. To address this, we perform extensive experiments to test whether the search algorithm was truly the driver of performance, hypothesizing that the choice of independence measure is the dominant factor. In particular, we introduce a unified beam search framework that serves as both an analytical tool to disentangle these components and a practical algorithm with a scalable performance-complexity trade-off. Our simulations comparing the kNN-based Copula Entropy (CopEnt) with the Pairwise Likelihood Ratio (PLR) establish that the independence measure is the dominant factor, to the extent that a simple greedy search with a more effective measure, PLR, outperforms a global search with a less effective one, CopEnt, on general graphs. Furthermore, we find no evidence that the benefit of a more complex search algorithm is specific to handling unmeasured confounders, suggesting it instead serves to overcome general estimation errors arising from finite data. Finally, we demonstrate that the strong performance with CopEnt reported in the previous work was an artifact of a simplistic experimental setup, as its performance advantage is reversed on more realistic and complex structures, including Erdős-Rényi (ER) and Scale-Free (SF) networks.

Lim, B. G., Tan, R. R., de Jesus, R., Garciano, L. E., Garciano, A., & Ikeda, K. (2025). Path survival reliabilities as measures of reliability for lifeline utility networks. Journal of Combinatorial Optimization, 49(57).

Lifeline utility networks have been studied extensively within the domain of network reliability due to the prevalence of natural hazards. The reliability of these networks is typically investigated through graphs that retain their structural characteristics. This paper introduces novel connectivity-based reliability measures tailored for stochastic graphs with designated source vertices and failure-probability-weighted edges. In particular, the per-vertex path survival reliability quantifies the average survival likelihood of single-source paths from a vertex to any source. A consolidated per-graph reliability measure is also presented, incorporating graph density and the shortest distance to a source as regulating elements for network comparison. To highlight the advantages of the proposed reliability measures, a theoretical discussion of their key properties is presented, along with a comparison against standard reliability measurements. The proposal is further accompanied by an efficient calculation procedure utilizing the zero-suppressed binary decision diagram, constructed through the frontier-based search, to compactly represent all single-source paths. Finally, the path survival reliabilities are calculated for a set of real-world networks and demonstrated to provide practical insights.

Lim, B. G., Lim, G. B. S., Tan, R. R., & Ikeda, K. (2025). Contextualized Messages Boost Graph Representations. Transactions on Machine Learning Research. (Reproducibility Certification).

Graph neural networks (GNNs) have gained significant attention in recent years for their ability to process data that may be represented as graphs. This has prompted several studies to explore their representational capability based on the graph isomorphism task. Notably, these works inherently assume a countable node feature representation, potentially limiting their applicability. Interestingly, only a few study GNNs with uncountable node feature representation. In the paper, a new perspective on the representational capability of GNNs is investigated across all levels—node-level, neighborhood-level, and graph-level—when the space of node feature representation is uncountable. Specifically, the injective and metric requirements of previous works are softly relaxed by employing a pseudometric distance on the space of input to create a soft-injective function such that distinct inputs may produce similar outputs if and only if the pseudometric deems the inputs to be sufficiently similar on some representation. As a consequence, a simple and computationally efficient soft-isomorphic relational graph convolution network (SIR-GCN) that emphasizes the contextualized transformation of neighborhood feature representations via anisotropic and dynamic message functions is proposed. Furthermore, a mathematical discussion on the relationship between SIR-GCN and key GNNs in literature is laid out to put the contribution into context, establishing SIR-GCN as a generalization of classical GNN methodologies. To close, experiments on synthetic and benchmark datasets demonstrate the relative superiority of SIR-GCN, outperforming comparable models in node and graph property prediction tasks.

Lim, B., Saavedra, M., Tan, R., Ikeda, K., & Yu, W. (2025). Geo-distributed multi-cloud data centre storage tiering and selection with zero-suppressed binary decision diagrams. International Journal of Cloud Computing, 14(2), 163–182.

The exponential growth of data in recent years prompted cloud providers to introduce diverse geo-distributed storage solutions for various needs. The vast amount of storage options, however, presents organisations with a challenge in determining the ideal data placement configuration. The study introduces a novel optimisation algorithm utilising the zero-suppressed binary decision diagram to select the optimal data centre, storage tiers, and cloud provider. The algorithm takes on a holistic approach that considers cost, latency, and high availability, applicable to both geo-distributed on-premise environments and public cloud providers. Furthermore, the proposed methodology leverages the recursive structure of the zero-suppressed binary decision diagram, allowing for the enumeration and ranking of all valid configurations based on total cost. Overall, the study offers flexibility for organisations in addressing specific priorities for cloud storage solutions by providing alternative near-optimal configurations.

Lim, G. B. S., Lim, B. G. S., Bandala, A. A., Jose, J. A. C., Chu, T. S. C., & Sybingco, E. (2025). AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification. IEEE Access, 13, 187383–187409.

Brain-computer interface (BCI) technology utilizing electroencephalography (EEG) marks a transformative innovation, empowering motor-impaired individuals to engage with their environment on equal footing. Despite its promising potential, developing subject-invariant and session-invariant BCI systems remains a significant challenge due to the inherent complexity and variability of neural activity across individuals and over time, compounded by EEG hardware constraints. While prior studies have sought to develop robust BCI systems, existing approaches remain ineffective in capturing the intricate spatiotemporal dependencies within multichannel EEG signals. This study addresses this gap by introducing the attentive graph-temporal convolutional network (AGTCNet), a novel graph-temporal model for motor imagery EEG (MI-EEG) classification. Specifically, AGTCNet leverages the topographic configuration of EEG electrodes as an inductive bias and integrates graph convolutional attention network (GCAT) to jointly learn expressive spatiotemporal EEG representations. The proposed model significantly outperformed existing MI-EEG classifiers, achieving state-of-the-art performance while utilizing a compact architecture, underscoring its effectiveness and practicality for BCI deployment. With a 49.87% reduction in model size, 64.65% faster inference time, and shorter input EEG signal, AGTCNet achieved a moving average accuracy of 66.82% for subject-independent classification on the BCI Competition IV Dataset 2a, which further improved to 82.88% when fine-tuned for subject-specific classification. On the EEG Motor Movement/Imagery Dataset, AGTCNet achieved moving average accuracies of 64.14% and 85.22% for 4-class and 2-class subject-independent classifications, respectively, with further improvements to 72.13% and 90.54% for subject-specific classifications. As guidance for future research, this study formally established practical model training-evaluation frameworks for subject-independent and subject-specific EEG classifications.

Ong, H. J. J., Lim, B. G. S., Tan, R. R. P., & Ikeda, K. (2025). Causal discovery in Additive Noise Models using beam search. Artificial Life and Robotics.

Causal discovery from observational data is a fundamental challenge. Greedy search algorithms like Regression with Subsequent Independence Test (RESIT), commonly used for learning Additive Noise Models (ANMs), are susceptible to making irreversible errors, especially in high-variance contexts. Such settings can be caused by unmeasured confounders or by high statistical noise from finite samples. To address this, we introduce a novel generalization of RESIT that replaces its local, greedy search with a more robust beam search, framing the task as a path search on a state-space graph. Through extensive simulation experiments, we demonstrate that structural accuracy, measured by Structural Hamming Distance (SHD) and Structural Intervention Distance (SID), consistently improves as the beam width (w) increases. Crucially, we also show that this performance gain comes at a manageable, approximately linear increase in computational cost relative to w. Furthermore, our analysis across different sample sizes shows these gains are most statistically significant in intermediate regimes (n = 250, 500). This suggests that at these sample sizes, the statistical noise is high enough to mislead the greedy search into a suboptimal ordering, an error our wider beam search corrects, while performance converges at large sample sizes (n = 1000). Our framework provides a practical, tunable algorithm that bridges the gap between fast but brittle local search methods and computationally infeasible global searches, thereby enhancing the reliability of causal discovery in complex, high-variance settings where such local errors are common.

Lim, B. G., Tan, R. R., Kawahara, J., Minato, S.-I., & Ikeda, K. (2024). A Recursive Framework for Evaluating Moments Using Zero-Suppressed Binary Decision Diagrams. IEEE Access, 12, 91886–91895.

The zero-suppressed binary decision diagram (ZDD) is a compact data structure widely used for the efficient representation of families of sparse subsets. Its inherent recursive structure also facilitates easy diagram manipulation and family operations. Practical applications generally fall under discrete optimization, such as combinatorial problems and graph theory. Given its utility, summarizing the subsets represented in the diagram using key metrics is of great value as this provides valuable insights into the characteristics of the family. The paper proposes a recursive algorithm to extract information on moments from families represented as a ZDD. Given a value for every element in the universe, the value of a subset is first formulated as the sum of the values of its elements. The moments of a family are then calculated as the mean of the exponentiated subset values, akin to the method of moments. Leveraging the structure of the ZDD, the proposed algorithm recursively traverses a given diagram for efficient moments evaluation via multinomial expansion. Its utility is then demonstrated with three classical problems (power sets, the knapsack problem, and paths in graphs), offering an orders-of-magnitude increase in computational efficiency relative to conventional methods. Overall, the proposed algorithm enhances the functionality of the ZDD by introducing an efficient family operation to uncover the distribution of subset values in a represented family.

Yu, Z., Guinto, M. C. S. G., Lim, B. G. S., Tan, R. R. P., Yoshimoto, J., Ikeda, K., Ohta, Y., & Ohta, J. (2023). Engineering a data processing pipeline for an ultra-lightweight lensless fluorescence imaging device with neuronal-cluster resolution. Artificial Life and Robotics, 28, 483–495.

In working toward the goal of uncovering the inner workings of the brain, various imaging techniques have been the subject of research. Among the prominent technologies are devices that are based on the ability of transgenic animals to signal neuronal activity through fluorescent indicators. This paper investigates the utility of an original ultra-lightweight needle-type device in fluorescence neuroimaging. A generalizable data processing pipeline is proposed to compensate for the reduced image resolution of the lensless device. In particular, a modular solution centered on baseline-induced noise reduction and principal component analysis is designed as a stand-in for physical lenses in the aggregation and quasi-reconstruction of neuronal activity. Data-driven evidence backing the identification of regions of interest is then demonstrated, establishing the relative superiority of the method over neuroscience conventions within comparable contexts.

Conference Proceedings

Lim, B. G., Liu, J., Ong, H. J., Chan, J. A., Tan, R. R., King, I., & Ikeda, K. (2025). FinSIR: Financial SIR-GCN for Market-Aware Stock Recommendation. 2025 International Joint Conference on Neural Networks (IJCNN), 1–8.

Existing works on stock price prediction have largely treated stocks in a market independently of one another. Nevertheless, recent advances in graph neural networks (GNNs) have enabled the efficient processing of diverse stock relations. This paper introduces the Financial SIR-GCN (FinSIR) for market-aware stock price prediction and recommendation. By modeling stock markets as spatio-temporal graphs, FinSIR addresses the key architectural limitation of existing graph-based models. Notably, the proposed model integrates the soft-isomorphic relational graph convolution network (SIR-GCN) with the "sandwich" structure employed in GNN for time series analysis (GNN4TS) to jointly process the two key dimensions of stock market graphs and to contextualize hidden states with both spatial and temporal stock relations. Backtesting results on the New York Stock Exchange (NYSE) and the National Association of Securities Dealers Automatic Quotation System (NASDAQ) reveal FinSIR consistently achieving up to 65% and 36% larger cumulative investment returns, respectively, compared to baseline models. Additionally, an ablation study further highlights the contribution of each FinSIR module in providing better investment recommendations. Overall, the paper incorporates recent advances in GNN and GNN4TS to provide a new perspective on graph-based solutions for improved stock price prediction and recommendation.

Ong, H. J. J., Lim, B. G. S., Tiu, B. R. C., Tan, R. R. P., & Ikeda, K. (2025). A Compression-Based Dependence Measure for Causal Discovery by Additive Noise Models. Neural Information Processing, 61–75.

In this work, we introduce a novel compression-based dependence measure (CDM) for causal discovery. Our proposed measure leverages data compression to quantify dependence, offering a new approach that is effective even with small data sizes. Through extensive simulations with general additive noise models, causal additive models, and linear non-Gaussian acyclic models, we demonstrate the relative superiority of CDM over existing methods. Additionally, we validate our approach using the cause-effect pairs benchmark dataset, where CDM shows comparable accuracy across various sample sizes. To close, we discuss the sensitivity of CDM to data scales, an issue shared by other causal discovery methods. Despite this, CDM presents a promising way to take advantage of data compression for causal discovery.

Lim, B. G., Ong, H. J., Tan, R. R., & Ikeda, K. (2024). Dynamic Principal Component Analysis for the Construction of High-Frequency Economic Indicators. Proceedings of the 4th International Conference on Advances in Computational Science and Engineering, 645–663.

Recent progress in data analysis and machine learning has enabled the efficient processing of large data; however, the public sector has yet to fully adopt these advancements. The study investigates the application of dynamic principal component analysis in offering real-time insights into various facets of an economy, potentially aiding in the informed decision-making of policymakers. In brief, dynamic principal component analysis generates dynamic principal components representing latent factors that account for the autocovariance in time series data. In examining daily data from the Philippine stock exchange, Philippine peso exchange rates, and Philippine peso to United States dollar forward rates, results demonstrate the effectiveness of the first three dynamic principal components as high-frequency indicators for business and investment conditions, economic performance, and economic outlook, respectively. Moreover, an application of the isolation forest anomaly detection algorithm validates the sensitivity of the constructed indicators to systematic economic shocks, which identified events such as the taper tantrum of 2013 and the 2020 lockdown due to the novel coronavirus pandemic, among others. Overall, the practical applicability of the proposed methodology suggests potential extensions incorporating nontraditional data sources for more comprehensive economic indicators.

Tan, R. R. P., Asuncion, A. E. C., Lim, B. G. S., Soos, M., & Ikeda, K. (2023). The Pancake Graph of Order 10 Is 4-Colorable. Proceedings of the 2023 6th International Conference on Mathematics and Statistics, 1–6.

The pancake graph has served as a model for real-world networks due to its unique recursive and symmetric properties. Defined as the Cayley graph on the symmetric group of order n generated by prefix reversals, the n-pancake graph exhibits a rapid increase in the number of vertices and edges with respect to order n. While there are considerable graph-theoretic results on the graph, findings on chromatic properties for larger n are limited. In this paper, the 10-pancake graph is established to be 4-colorable through an efficient Boolean-satisfiability-based stochastic local search framework for vertex coloring. Building on the aforementioned, a new linear bound for the chromatic number of the pancake graph is put forward. In addition, the range of possible bounds that may be obtained from the same technique is determined.

Preprints & Submitted Works

Ong, H. J. J., Lim, B. G. S., Dayta, D., Tan, R. R. P., & Ikeda, K. (2025). Towards Unsupervised Causal Representation Learning via Latent Additive Noise Model Causal Autoencoders. (Submitted to IEEE Access).

Unsupervised representation learning seeks to recover latent generative factors, yet standard methods relying on statistical independence often fail to capture causal dependencies. A central challenge is identifiability: as established in disentangled representation learning and nonlinear ICA literature, disentangling causal variables from observational data is impossible without supervision, auxiliary signals, or strong inductive biases. In this work, we propose the Latent Additive Noise Model Causal Autoencoder (LANCA) to operationalize the Additive Noise Model (ANM) as a strong inductive bias for unsupervised discovery. Theoretically, we prove that while the ANM constraint does not guarantee unique identifiability in the general mixing case, it resolves component-wise indeterminacy by restricting the admissible transformations from arbitrary diffeomorphisms to the affine class. Methodologically, arguing that the stochastic encoding inherent to VAEs obscures the structural residuals required for latent causal discovery, LANCA employs a deterministic Wasserstein Auto-Encoder (WAE) coupled with a differentiable ANM Layer. This architecture transforms residual independence from a passive assumption into an explicit optimization objective. Empirically, LANCA outperforms state-of-the-art baselines on synthetic physics benchmarks (Pendulum, Flow), and on photorealistic environments (CANDLE), where it demonstrates superior robustness to spurious correlations arising from complex background scenes.

Ong, H. J. J., Lim, B. G. S., Tan, R. R. P., & Ikeda, K. (2024). Redefining the Shortest Path Problem Formulation of the Linear Non-Gaussian Acyclic Model: Pairwise Likelihood Ratios, Prior Knowledge, and Path Enumeration. arXiv:2404.11922.

Effective causal discovery is essential for learning the causal graph from observational data. The linear non-Gaussian acyclic model (LiNGAM) operates under the assumption of a linear data generating process with non-Gaussian noise in determining the causal graph. Its assumption of unmeasured confounders being absent, however, poses practical limitations. In response, empirical research has shown that the reformulation of LiNGAM as a shortest path problem (LiNGAM-SPP) addresses this limitation. Within LiNGAM-SPP, mutual information is chosen to serve as the measure of independence. A challenge is introduced: parameter tuning is now needed due to its reliance on kNN mutual information estimators. The paper proposes a threefold enhancement to the LiNGAM-SPP framework. First, the need for parameter tuning is eliminated by using the pairwise likelihood ratio in lieu of kNN-based mutual information. This substitution is validated on a general data generating process and benchmark real-world data sets, outperforming existing methods especially when given a larger set of features. The incorporation of prior knowledge is then enabled by a node-skipping strategy implemented on the graph representation of all causal orderings to eliminate violations based on the provided input of relative orderings. Flexibility relative to existing approaches is achieved. Last among the three enhancements is the utilization of the distribution of paths in the graph representation of all causal orderings. From this, crucial properties of the true causal graph such as the presence of unmeasured confounders and sparsity may be inferred. To some extent, the expected performance of the causal discovery algorithm may be predicted. The refinements above advance the practicality and performance of LiNGAM-SPP, showcasing the potential of graph-search-based methodologies in advancing causal discovery.

Brian Godwin Lim
