Projects | Rani Basna

Data Driven Knot Selection

Sun, 27 Jun 2021 00:00:00 +0000

In implementations of functional data methods, the effect of the initial choice of an orthonormal basis has received little attention. Typically, several standard bases such as Fourier, wavelets, and splines are considered for transforming observed functional data, and a choice is made without any formal criterion indicating which basis is preferable for the initial transformation of the data into functions. To address this issue, we propose a strictly data-driven method of orthogonal basis selection. The method uses recently introduced orthogonal spline bases called splinets, obtained by efficient orthogonalization of the B-splines. The algorithm learns from the data, in machine learning style, to place knots efficiently. The optimality criterion is based on the average (per functional data point) mean square error and is used both in the learning algorithms and in comparison studies. The latter indicate efficiency gains that are particularly evident for sparse functional data and, to a lesser degree, in analyses of responses to complex physical systems.
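The idea of placing knots by minimizing the average per-curve mean square error can be illustrated with a deliberately simplified sketch. It does not use splinets or B-splines: as a stand-in, it fits piecewise-constant (order-0) approximations and adds knots greedily. The function names `average_mse` and `greedy_knots` and the toy curves are assumptions made for illustration only.

```python
def average_mse(curves, knots):
    """Average (per curve) MSE of the best piecewise-constant fit.

    `knots` are grid indices; each segment [lo, hi) is fitted by its mean.
    """
    bounds = [0] + sorted(knots) + [len(curves[0])]
    total = 0.0
    for y in curves:
        sse = 0.0
        for lo, hi in zip(bounds, bounds[1:]):
            segment = y[lo:hi]
            mean = sum(segment) / len(segment)
            sse += sum((v - mean) ** 2 for v in segment)
        total += sse / len(y)
    return total / len(curves)


def greedy_knots(curves, n_knots):
    """Add one knot at a time, each time picking the index that lowers
    the average MSE the most (a greedy heuristic, not a global optimum)."""
    n = len(curves[0])
    knots = []
    for _ in range(n_knots):
        candidates = [i for i in range(1, n) if i not in knots]
        best = min(candidates, key=lambda i: average_mse(curves, knots + [i]))
        knots.append(best)
    return sorted(knots)


# Toy functional data: two curves on a 10-point grid, both jumping at index 6.
curves = [
    [0.0] * 6 + [1.0] * 4,
    [2.0] * 6 + [5.0] * 4,
]
selected = greedy_knots(curves, n_knots=1)
error_without_knots = average_mse(curves, [])
error_with_knots = average_mse(curves, selected)
```

With the knot placed where both curves change level, the fit error drops to zero, whereas a fit without knots leaves a positive error. A spline-based version would replace the segment means with a least-squares spline fit on the candidate knot set.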

Unsupervised Feature Transformation

Sun, 27 Jun 2021 00:00:00 +0000

This package intends to convert categorical features into numerical ones. This will help in employing algorithms and methods that only accept numerical data as input. The main motivation for writing this package is to use it in clustering assignments.
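As a point of reference, the simplest way to turn categorical features into numerical ones is one-hot encoding. The sketch below is not the package's method (which is not described here); the function `one_hot_encode` and the example records are hypothetical and serve only to show what a numerical representation of categorical columns looks like.

```python
def one_hot_encode(rows, columns):
    """One-hot encode the named categorical columns of a list of dict records."""
    # Sort the levels so the output column order is deterministic.
    levels = {c: sorted({row[c] for row in rows}) for c in columns}
    encoded = []
    for row in rows:
        # Copy the non-categorical features unchanged.
        out = {k: v for k, v in row.items() if k not in columns}
        for c in columns:
            for level in levels[c]:
                out[f"{c}={level}"] = 1.0 if row[c] == level else 0.0
        encoded.append(out)
    return encoded


# Hypothetical records mixing one numerical and two categorical features.
records = [
    {"age": 34, "smoker": "yes", "sex": "F"},
    {"age": 51, "smoker": "no", "sex": "M"},
    {"age": 47, "smoker": "no", "sex": "F"},
]
numeric_records = one_hot_encode(records, columns=["smoker", "sex"])
```

Every value in `numeric_records` is a number, so the records can be passed to distance-based clustering algorithms that only accept numerical input.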

Probabilistic Graphical Models for Inference

Tue, 27 Apr 2021 00:00:00 +0000

Because clinical and epidemiological data-generating processes are typically highly heterogeneous, multiple correlations and dependencies arise between variables and between response variables. Conventional regression models have a limited capacity to capture such dependent multi-factorial relationships. In this project, we plan to leverage the powerful approach of probabilistic graphical models (PGMs), such as Bayesian networks (BNs) and hidden Markov models, for conducting statistical inference. PGMs learn joint multivariate distributions over large numbers of random variables that interact with each other. So far, we have used BNs. A Bayesian network is a PGM that models the associations between all covariates, with all variables being potentially dependent. Bayes' rule is usually employed, which allows the factorization of the joint probability distribution of the involved variables according to a directed acyclic graph (DAG). The overarching goal of these models is to estimate the conditional dependencies and interactions among variables, which under specific assumptions may reveal potential causal links between them. This flexibility gives BNs an advantage over other approaches for estimating the risk of disease with clinical relevance and allows core decision pathways to be identified.
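The factorization over a DAG can be made concrete with a minimal sketch. The three binary variables, the graph (smoking and allergy both pointing to disease), and all probabilities below are made-up assumptions for illustration; they are not taken from the project's data. The joint distribution is the product of each node's distribution given its parents.

```python
from itertools import product

# Hypothetical DAG: smoking -> disease <- allergy (all variables binary).
p_smoking = {1: 0.3, 0: 0.7}
p_allergy = {1: 0.2, 0: 0.8}
# P(disease = 1 | smoking, allergy), keyed by (smoking, allergy).
p_disease_given = {(0, 0): 0.05, (1, 0): 0.20, (0, 1): 0.30, (1, 1): 0.60}


def joint(s, a, d):
    """P(s, a, d) = P(s) * P(a) * P(d | s, a), following the DAG."""
    p_d1 = p_disease_given[(s, a)]
    p_d = p_d1 if d == 1 else 1.0 - p_d1
    return p_smoking[s] * p_allergy[a] * p_d


# The factorized joint sums to one over all eight configurations.
total = sum(joint(s, a, d) for s, a, d in product((0, 1), repeat=3))

# Marginal risk of disease, obtained by summing out the parents.
p_disease = sum(joint(s, a, 1) for s, a in product((0, 1), repeat=2))

# Posterior P(smoking = 1 | disease = 1), by enumeration.
p_smoker_given_disease = sum(joint(1, a, 1) for a in (0, 1)) / p_disease
```

Exact enumeration like this only scales to small networks; it is shown here because it makes the role of the conditional probability tables explicit.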

Shiny app presenting clustering results

Mon, 27 Apr 2020 00:00:00 +0000

Clustering is central to many data-driven application domains and has been investigated intensively in terms of algorithms and distance functions. Some of the most used traditional clustering methods are k-means and hierarchical clustering. Both methods suffer from many issues when dealing with highly heterogeneous data types. A typical challenge in clustering heterogeneous data lies in handling both numerical and categorical variables at the same time. Many commonly used approaches are limited in reflecting the correct distance that is needed to perform a good clustering. Another challenge in these classical algorithms is the definition of the distance matrix. Finding the right way to measure the distance between objects has a huge influence on the clustering performance. Together, k-means and hierarchical clustering account for the vast majority of the unsupervised statistical methods used for phenotyping in the airway disease literature. When the data set is large enough, new deep unsupervised and semi-supervised learning methods outperform traditional clustering methods in many ways. We plan to utilize the major advances in the area of deep unsupervised learning in our various phenotyping projects.
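To illustrate the difficulty of measuring distance on mixed data, the sketch below implements a Gower-style dissimilarity, a classical choice for records with both numerical and categorical variables. It is offered as an example of the problem, not as the method used in this project; the function name `gower_distance` and the patient records are hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower-style dissimilarity between two records (dicts with equal keys).

    Numerical features contribute |a - b| / range; categorical features
    contribute 0 if the values match and 1 otherwise. The result is the
    unweighted mean over all features, so it lies in [0, 1].
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            value_range = numeric_ranges[key]
            if value_range > 0:
                total += abs(a[key] - b[key]) / value_range
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


# Hypothetical patient records: two numerical features, one categorical.
patients = [
    {"age": 30, "fev1": 3.5, "smoker": "no"},
    {"age": 50, "fev1": 2.5, "smoker": "yes"},
    {"age": 70, "fev1": 1.5, "smoker": "yes"},
]
# Ranges of the numerical features, computed from the data itself.
ranges = {
    key: max(p[key] for p in patients) - min(p[key] for p in patients)
    for key in ("age", "fev1")
}
# Full pairwise distance matrix, usable by hierarchical clustering.
distance_matrix = [
    [gower_distance(p, q, ranges) for q in patients] for p in patients
]
```

The equal weighting of features is itself a modeling choice, which is one reason the definition of the distance matrix has such a strong influence on the resulting clusters.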

STELLAR

Wed, 01 May 2019 00:00:00 +0000

Phenotyping

Sat, 27 Apr 2019 00:00:00 +0000

Mean Field Games

Thu, 01 Sep 2016 00:00:00 +0000

Mean-field game theory is the study of strategic decision making in very large populations of weakly interacting individuals. Mean-field games have been an active area of research in the last decade due to their increased significance in many scientific fields. The foundations of mean-field theory go back to statistical and quantum physics. One may describe mean-field games as a type of stochastic differential game in which the interaction between the players is of mean-field type, i.e., the players are coupled via their empirical measure. The theory was proposed by Lasry and Lions and, independently, by Huang, Malhamé, and Caines. Since then, mean-field games have become a rapidly growing area of research and have been studied by many researchers. However, most of these studies were dedicated to diffusion-type games. The main purpose of this project is to extend the theory of mean-field games to the jump case in both discrete and continuous state spaces. Jump processes are a very important tool in many areas of application, specifically when modeling abrupt events appearing in real life, for instance in financial modeling (option pricing and risk management), networks (electricity and banks), and statistics (for modeling and analyzing spatial data). The project consists of two papers and one technical report:

An Epsilon Nash Equilibrium For Non-Linear Markov Games of Mean-Field-Type on Finite Spaces.
An approximate Nash equilibrium for pure jump Markov games of mean-field-type on continuous state space.