Empirical likelihood function with continuous parameters

Introduction

Currently I am working with a new type of particle detector, still in development, called AC-LGAD (also RSD). These detectors promise to provide both spatial and temporal measurements of particle hits. Today I am not interested in the details of these detectors or how to use them, but in obtaining a method to infer the hit position of a particle (and also the time, in future work) by maximizing the likelihood of the observation.

My current setup is composed of an AC-LGAD detector mounted in a scanning TCT, like this one, and can be depicted like this:

The detector (light gray square) has 4 readout pads, to each of which I connect an oscilloscope. The detector is mounted such that a pulsed laser can be shone at an arbitrary \( x_0,y_0 \) position. In this way, I shine the laser at some point and register the signals coming out of the detector.

From these signals I can get different quantities such as the amplitude, the collected charge, etc. So, for example, from pad 1 at some position I have this (you can zoom horizontally to better appreciate the details):

and something similar for each of the other 3 pads.

Repeating this for many \( x,y \) positions I can see how each of these quantities varies as a function of the impact point. So for example the amplitude of the signal seen in pad 1 (channel 1 in the oscilloscope) looks like this:

As can be seen, the amplitude is highly dependent on the impact position. There are other quantities, such as the collected charge and the time over threshold, that also have an interesting dependence on \(x\) and \(y\).

Since the production of electron-hole pairs by a particle in a silicon detector is an inherently stochastic process, all the signals (and so the amplitude, collected charge, etc.) are going to have a random nature with some distribution. So now the question is: how can I use this information to reconstruct the impact position of a particle in this detector?

Likelihood maximization

A very powerful way to estimate quantities when there are random fluctuations involved is to maximize the likelihood function. This is called the maximum likelihood estimator, and it is extremely general and has many nice properties.

So, suppose I measure some quantities \(c_i \in \mathcal{C}\), where \(i \in \{ 1,2,3,4 \} \) denotes the pad number and \( \mathcal{C}\) is the set of quantities that I consider to have a nice dependence on the impact point, for example the amplitude and/or the collected charge. If I have a single observation of each of the quantities \( c_i \), what will I say the \( x \) and \( y \) coordinates of the particle were? Here is where the likelihood function comes into play. It is defined as

\[ \mathscr{L}(c_i | x, y) := \prod_{C_i \in \mathcal{C}} f_{C_i} (c_i | x, y) \]

with \( f_{C_i}(c_i | x,y) \) the probability density function for the quantity \( C_i \) for which I have measured the value \( c_i \) at point \( (x,y) \). Note that \( x \) and \( y \) are the free variables of this function, because I don’t know the impact point of the particle. I only know the values of each of the \( c_i \)’s.
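Just to make this definition concrete, below is a minimal sketch of how such a product over pads could be evaluated in Python. The densities here are placeholder Gaussians with a made-up dependence on \( x,y \); they only stand in for the unknown \( f_{C_i} \), which is precisely what the rest of this post is about.

import numpy as np

def f_pad(c, x, y, pad):
	# Placeholder density for pad <pad>: a Gaussian with a made-up dependence on the impact point.
	mu = (x + y)/(pad + 1)
	sigma = .1*(pad + 1)
	return np.exp(-.5*((c - mu)/sigma)**2)/(sigma*(2*np.pi)**.5)

def likelihood(measured_c, x, y):
	# <measured_c> is the list [c_1, c_2, c_3, c_4] with one observed value per pad.
	return np.prod([f_pad(c, x, y, pad) for pad, c in enumerate(measured_c)])

print(likelihood([1.2, .8, .5, .3], x=1, y=2))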

According to the principle of maximum likelihood, my best choice for \( x \) and \( y \) would be the one that maximizes \( \mathscr{L} \). Okay, let’s do that!

The problem with this is that I don’t know the (analytic expression for the) functions \( f_{C_i}(c_i | x,y) \). To make it worse, \( x \) and \( y \) are continuous parameters (if they were discrete I could try to measure \( f_{C_i}(c_i | x,y) \) at each possible \( x \) and \( y \)). So now the problem is how to determine the distribution function for each \( C_i \) and its dependence on the impact position.

Construction of an experimental likelihood function

Let’s see here how to experimentally determine a likelihood function, which can later be used to determine the impact position of a particle.

Measuring a distribution function at a single point

The first step towards the construction of an experimental likelihood function is to be able to determine a single distribution function \( f_{C_i}(c_i | x_0,y_0) \) at some fixed point \(x_0,y_0\). It is possible to do this using the gaussian_kde function from the SciPy module. For this the first step is to have samples of the distribution we want to “measure” and then it is as simple as giving these samples to the gaussian_kde function to obtain an approximation of the probability density function.

Example

Below there is an MWE that uses gaussian_kde to approximate an arbitrary distribution that was sampled. The samples are artificially generated inside the computer, but in “real life” they would be the result of a physical measurement. The code:

import numpy as np
from scipy.stats import gaussian_kde
import myplotlib as mpl # https://github.com/SengerM/myplotlib

def generate_samples(x,y):
	return list(np.random.randn(99999)*(1+x)+y+x) + list((np.random.rand(9999)-.5)*3*(1+x)+y+x)

samples = generate_samples(5,3)
pdf = gaussian_kde(samples)

fig = mpl.manager.new(
	title = 'Example of gaussian_kde',
	xlabel = 'Random variable',
	ylabel = 'Probability density',
)
fig.hist(
	samples,
	density = True,
	label = 'Measured samples',
)
q = np.linspace(min(samples),max(samples))
fig.plot(
	q,
	pdf(q),
	label = 'Gaussian KDE approximation',
)
fig.show()

and the output plot looks like this:

As can be seen we have an approximation of the distribution. Of course this approximation is not perfect and will be different for each realization of the samples (the measured values). But it is better than nothing, and it actually looks really good.

Now we can repeat this process for many different \(x,y\) points. In the end we will have the distribution \( f_{C_i}(c_i | x_0,y_0) \) sampled at many different \(x_0,y_0\).
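In code this could be as simple as looping over the grid and keeping one KDE per point. Here is a minimal sketch, where measure_samples is a hypothetical placeholder for the actual measurement at each position:

import numpy as np
from scipy.stats import gaussian_kde

def measure_samples(x, y):
	# Hypothetical placeholder for the real measurement of the quantity at the point (x,y).
	return np.random.randn(9999)*(1 + x) + x + y

X_VALS = [0, 1, 2, 3]
Y_VALS = [0, 1, 2, 3]

# One KDE per sampled (x,y) point, stored as kde[x][y].
kde = {x: {y: gaussian_kde(measure_samples(x, y)) for y in Y_VALS} for x in X_VALS}

print(kde[1][2](5)) # Density of the quantity at the value 5, for the point x=1, y=2.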

From discrete \(x,y\) to continuous \(x,y\)

Once the distribution \( f_{C_i}(c_i | x,y) \) has been sampled at many \( x,y \) points, the next step is to somehow interpolate it for any intermediate \( x,y \). This is not trivial, or at least I have not found any package that solves it in a single line of code. In fact, as we will see, this does not have a unique answer and depends on the criterion we want to use.

For this we first need to define some interpolation method, and this requires “human input” in order to decide how to do it. Consider for example the following distributions:

These can be two measured distributions at two different \( x \) values (let’s ignore \( y \) for simplicity). Now let’s think of how the distribution should look for an intermediate \( x \):

As can be seen there are many different options, and depending on what we want to do any of them can be the correct answer. In my current case, however, the green one seems to be the correct answer, i.e. the distribution should somehow move and transform smoothly from one to the other instead of “disappearing here and appearing there” as in the violet drawing. Even after deciding this, there are probably many different ways (which yield slightly different results) of implementing it. Below I will share my procedure.

To perform the interpolation between the different discrete \( x,y\) points I proceeded in two steps:

  1. Interpolate the mean values \( \mu_0 \) and \( \mu_1 \) to obtain a new \( \mu \). In 1D this is very easy: Just interpolate using a straight line between the two points. In 2D (as in my case) it is not as trivial as using a plane, but fortunately the Python community has already solved this problem. Of course there is more than one approach here, but let’s keep it simple and use an easy and reasonable solution.
  2. Interpolate the profile of the density function. By this I mean the shape of the function itself. Again, here there are many different options on how to do this. I will discuss it later on.

The following drawings should help in visualizing this process. In the first drawing I show a straight-line-interpolation for the mean value \( \mu \) of the interpolated distribution:

In the second drawing (below) I show one possibility for interpolating the profile of the distribution. As can be seen the two distributions were shifted such that they share the same mean value and in this condition a weighted average is performed. Details will come later.
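To make this more tangible, here is a minimal 1D sketch of the shift-and-average idea, with synthetic samples standing in for measured ones. The weights \( 1-w \) and \( w \) are the 1D analogue of the bilinear weights given below for the 2D case:

import numpy as np
from scipy.stats import gaussian_kde

# Two "measured" distributions at x0 and x1 (synthetic samples for this sketch).
x0, x1 = 0, 1
samples_0 = np.random.randn(99999) + 2 # Mean ~ 2, width 1.
samples_1 = np.random.randn(99999)*2 + 7 # Mean ~ 7, width 2.
f0, f1 = gaussian_kde(samples_0), gaussian_kde(samples_1)
mu0, mu1 = np.mean(samples_0), np.mean(samples_1)

def interpolated_pdf(q, x):
	w = (x - x0)/(x1 - x0)
	# Step 1: straight-line interpolation of the mean value.
	mu = (1 - w)*mu0 + w*mu1
	# Step 2: shift each profile so that its mean becomes mu, then take the weighted average.
	return (1 - w)*f0(q + mu0 - mu) + w*f1(q + mu1 - mu)

q = np.linspace(-5, 15, 999)
print(interpolated_pdf(q, x=.3).sum()*(q[1]-q[0])) # Should be close to 1.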

These drawings show the case for a single parameter \( x \). For a two dimensional parameter space \( x,y \) it is the same idea. First I performed the interpolation of the mean value and then the interpolation of the function profile. This time, however, there are 4 distributions to “mix” in order to obtain the interpolated distribution (I assumed a rectangular grid of sampled points). For the interpolation of the mean value I used the scipy.interpolate.interp2d function, which performs an interpolation using splines. So:

\[ \mu=\text{scipy.interpolate.interp2d}\left(\left[\begin{matrix}x_{0} & x_{0} & x_{1} & x_{1}\end{matrix}\right],\left[\begin{matrix}y_{0} & y_{1} & y_{0} & y_{1}\end{matrix}\right],\left[\begin{matrix}\mu_{00} & \mu_{01} & \mu_{10} & \mu_{11}\end{matrix}\right]\right) (x,y) \]

where each of the sub indices denotes which of the four points is being considered:

For the interpolation of the function profile I proceeded manually. First I shifted the functions such that their mean value was \( \mu \) and then I performed an average weighted by the area of the “opposite rectangle”:

\[ f\left(q|x,y\right)=\sum_{i\in\left\{ 0,1\right\} }\sum_{j\in\left\{ 0,1\right\} }f_{ij}\left(q+\mu_{ij}-\mu\right)\frac{\left|x-x_{1-i}\right|\left|y-y_{1-j}\right|}{\left(x_{1}-x_{0}\right)\left(y_{1}-y_{0}\right)} \]

where \(f_{ij} \) is the measured distribution at point \( x_i,y_j \) and \( q \equiv c_i \) is just a relabeling.
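As a quick sanity check (assuming \( x_0 \leq x \leq x_1 \) and \( y_0 \leq y \leq y_1 \)), note that the bilinear weights sum to one, so the interpolated \( f \) integrates to one whenever each of the \( f_{ij} \) does:

\[ \sum_{i\in\left\{ 0,1\right\} }\sum_{j\in\left\{ 0,1\right\} }\frac{\left|x-x_{1-i}\right|\left|y-y_{1-j}\right|}{\left(x_{1}-x_{0}\right)\left(y_{1}-y_{0}\right)}=\frac{\left[\left(x_{1}-x\right)+\left(x-x_{0}\right)\right]\left[\left(y_{1}-y\right)+\left(y-y_{0}\right)\right]}{\left(x_{1}-x_{0}\right)\left(y_{1}-y_{0}\right)}=1 \]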

Example

The results of applying these formulae look pretty good; here is an example:

Finally, the empirical likelihood

Combining all the previous steps I am finally able to construct the “empirical likelihood” and use it for parameter estimation. From here on it is just a matter of how to implement this in a computer. Fortunately, although not trivial, this is not so hard to do in Python. The code below simulates a fake data set that represents some measured quantity at different \( x,y \) positions, then approximates the distribution at each point using a KDE, after this it produces an interpolation to get continuous parameters, and finally it builds the likelihood function and maximizes it:

import numpy as np
from scipy.stats import gaussian_kde
from scipy import interpolate

def check_number_or_numpy_array(var, name: str):
	if not (isinstance(var,int) or isinstance(var,float) or isinstance(var, np.ndarray)):
		raise TypeError(f'<{name}> must be a number (int or float) or a numpy array, received instead <{name}> of type {type(var)}.')

class ExperimentalParametricDensity:
	def __init__(self, samples: dict):
		# <samples> is a dict that contains the data in each xy sampled point, for example samples[x][y] = samples_array_at_x_y.
		# Assuming that the x,y points are distributed in a rectangular mesh, i.e. it is just a matrix.
		self._samples = samples
		self._x_vals = sorted([x for x in samples])
		self._y_vals = sorted([y for y in samples[self._x_vals[0]]]) # Here I am using that the x,y points are distributed in a rectangular mesh.
		
	@property
	def means(self):
		if not hasattr(self, '_means'):
			self._means = np.zeros((len(self._y_vals), len(self._x_vals)))
			self._means[:] = float('NaN')
			for nx,x in enumerate(self._x_vals):
				for ny,y in enumerate(self._y_vals):
					self._means[ny,nx] = np.nanmean(self._samples[x][y])
		return self._means
	
	@property
	def xx(self):
		if not hasattr(self, '_xx'):
			xx,yy = np.meshgrid(self._x_vals, self._y_vals)
			self._xx = xx
			self._yy = yy
		return self._xx
	
	@property
	def yy(self):
		if not hasattr(self, '_yy'):
			self.xx
		return self._yy
	
	@property
	def kde(self):
		# Returns a dictionary with the KDE at each sampled x,y. This means that returns the same as self._samples but instead of having samples it has the KDE functions. Each KDE function has the signature KDE(q).
		if not hasattr(self, '_kde'):
			self._kde = {}
			for x in self._x_vals:
				self._kde[x] = {}
				for y in self._y_vals:
					self._kde[x][y] = gaussian_kde(self._samples[x][y])
		return self._kde
	
	def __call__(self, q, x, y):
		if not (isinstance(x,int) or isinstance(x,float)):
			raise TypeError(f'<x> must be a number (int or float), received instead <x> of type {type(x)}.')
		if not (isinstance(y,int) or isinstance(y,float)):
			raise TypeError(f'<y> must be a number (int or float), received instead <y> of type {type(y)}.')
		if not min(self._x_vals) <= x <= max(self._x_vals):
			raise ValueError(f'<x> must be bounded within the sampling region. The minimum value for <x> is {min(self._x_vals)} and the maximum {max(self._x_vals)}, received x = {x}.')
		if not min(self._y_vals) <= y <= max(self._y_vals):
			raise ValueError(f'<y> must be bounded within the sampling region. The minimum value for <y> is {min(self._y_vals)} and the maximum {max(self._y_vals)}, received y = {y}.')
		check_number_or_numpy_array(q, 'q')
		# If we are here is because x,y are both numbers and within the sampling range.
		if x == self._x_vals[0]: idx_prev_x = 0
		else: idx_prev_x = np.where((np.array(self._x_vals)<x))[0][-1]
		if y == self._y_vals[0]: idx_prev_y = 0
		else: idx_prev_y = np.where((np.array(self._y_vals)<y))[0][-1]
		x0 = self._x_vals[idx_prev_x]
		y0 = self._y_vals[idx_prev_y]
		x1 = self._x_vals[idx_prev_x+1]
		y1 = self._y_vals[idx_prev_y+1]
		µ00 = self.means[idx_prev_y,idx_prev_x]
		µ01 = self.means[idx_prev_y+1,idx_prev_x]
		µ10 = self.means[idx_prev_y,idx_prev_x+1]
		µ11 = self.means[idx_prev_y+1,idx_prev_x+1]
		f00 = self.kde[x0][y0]
		f01 = self.kde[x0][y1]
		f10 = self.kde[x1][y0]
		f11 = self.kde[x1][y1]
		# Spline interpolation of the mean value at (x,y) from the four surrounding grid points.
		µ = interpolate.interp2d(x=[x0,x0,x1,x1], y=[y0,y1,y0,y1], z=[µ00,µ01,µ10,µ11])(x,y)
		# Shift each measured profile so that its mean becomes µ, then average them with bilinear weights.
		return (f00(q+µ00-µ)*(x1-x)*(y1-y) + f01(q+µ01-µ)*(x1-x)*(y-y0) + f10(q+µ10-µ)*(x-x0)*(y1-y) + f11(q+µ11-µ)*(x-x0)*(y-y0))/(x1-x0)/(y1-y0)
	
	def likelihood(self, x, y, q):
		if not (isinstance(q,int) or isinstance(q,float)):
			raise TypeError(f'<q> must be a number (int or float), received instead <q> of type {type(q)}.')
		check_number_or_numpy_array(x, 'x')
		check_number_or_numpy_array(y, 'y')
		if isinstance(x, float) or isinstance(x, int):
			x = np.array([x])
		if isinstance(y, float) or isinstance(y, int):
			y = np.array([y])
		if x.shape != y.shape:
			raise ValueError(f'The shape of <x> and <y> must be the same. Received x.shape = {x.shape} and y.shape = {y.shape}.')
		ll = []
		for xx,yy in zip(x.ravel(),y.ravel()):
			ll.append(self(q,xx,yy))
		ll = np.reshape(np.array(ll),x.shape)
		if x.shape == (1,): # A scalar was passed, so return a scalar.
			ll = ll[0]
		return ll
		

if __name__ == '__main__':
	import myplotlib as mpl # https://github.com/SengerM/myplotlib
	import lmfit
	import palettable
	
	def generate_samples(x,y):
		return list(np.random.randn(99999)*(1+x)+y+x) + list((np.random.rand(9999)-.5)*3*(1+x)+y+x)

	X_VALS = [0,1,2,3]
	Y_VALS = [0,1,2,3]

	data = {}
	qmin = 0
	qmax = 0
	for x in X_VALS:
		data[x] = {}
		for y in Y_VALS:
			data[x][y] = generate_samples(x,y)
			qmin = min(qmin, min(data[x][y]))
			qmax = max(qmax, max(data[x][y]))

	epd = ExperimentalParametricDensity(data)
	
	fig = mpl.manager.new(
		title = 'Measured and estimated distributions at each x,y',
		xlabel = 'q',
		ylabel = 'Probability density',
	)
	qaxis = np.linspace(qmin, qmax,99)
	x0 = 1.1
	y0 = 2.6
	n = 0
	for x in X_VALS:
		for y in Y_VALS:
			n += 1
			color = tuple(np.array(palettable.tableau.Tableau_20.colors[n%20])/255)
			fig.hist(
				data[x][y],
				label = f'Measured q at x={x}, y={y}',
				density = True,
				color = color
			)
			fig.plot(
				qaxis,
				epd.kde[x][y](qaxis),
				label = f'KDE at x={x}, y={y}',
				color = color,
				linestyle = '--',
			)
	fig.plot(
		qaxis,
		epd(qaxis, x0, y0),
		label = f'Interpolation example at x={x0}, y={y0}',
	)

	q0 = 6
	xaxis = np.linspace(min(X_VALS), max(X_VALS))
	yaxis = np.linspace(min(Y_VALS), max(Y_VALS))
	xx,yy = np.meshgrid(xaxis,yaxis)
	fig = mpl.manager.new(
		title = f'Likelihood plot',
		xlabel = 'x',
		ylabel = 'y',
		aspect = 'equal',
	)
	fig.contour(
		x = xx,
		y = yy,
		z = epd.likelihood(xx, yy, q0),
		colorscalelabel = f'Likelihood(q={q0}|x,y)',
	)
	
	# Now let's find the maximum of the likelihood:
	params = lmfit.Parameters()
	params.add('x', value = 0, min = min(X_VALS), max = max(X_VALS))
	params.add('y', value = 0, min = min(Y_VALS), max = max(Y_VALS))
	def func2minimize(pars):
		parvals = pars.valuesdict()
		return 1/epd.likelihood(parvals['x'],parvals['y'],q0)
	minimizer = lmfit.Minimizer(
		func2minimize,
		params,
	)
	minimizer_result = minimizer.minimize(method = 'nelder')
	print(minimizer_result.params)
	
	fig.plot(
		[minimizer_result.params['x'].value],
		[minimizer_result.params['y'].value],
		marker = '.',
	)

	mpl.manager.show()

As a result of running this script we obtain the following plots:

First, this plot showing all the “measured” (simulated) distributions as histograms along with the corresponding KDE estimation for each distribution (you can show/hide traces by clicking on the legend):

and secondly this color map with the likelihood function for the particular realization \( q = q_0 = 6 \) :

The black points are the \( x,y \) points where the distribution of \( q \) was measured (simulated). The red point is the maximum of the likelihood function; this means that \( x_\text{red},y_\text{red} \) is the estimated point when \( q = q_0 = 6 \), according to the maximum likelihood principle.

Of course it is possible to use this likelihood function for whatever we want, for example to construct other tools such as a likelihood ratio test.
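For instance, here is a quick sketch of how the likelihood could be turned into an approximate confidence region by thresholding the likelihood ratio. It reuses the epd, q0, X_VALS and Y_VALS objects from the script above, and assumes that Wilks’ theorem is a reasonable approximation here, which would have to be checked:

import numpy as np

x_grid = np.linspace(min(X_VALS), max(X_VALS), 99)
y_grid = np.linspace(min(Y_VALS), max(Y_VALS), 99)
xx, yy = np.meshgrid(x_grid, y_grid)
L = epd.likelihood(xx, yy, q0)
lambda_ratio = -2*np.log(L/L.max()) # Likelihood ratio with respect to the maximum.
inside_95 = lambda_ratio < 5.99 # ~95 % region for 2 free parameters (chi-squared quantile, 2 d.o.f.).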

Conclusion

I presented a way to obtain an estimate of the likelihood function, i.e. an experimental likelihood function, for continuous parameters based only on measurements. This method samples the distribution function at discrete points in parameter space and then performs a specific interpolation to go from discrete to continuous values.

My next post will show the application of this method to real data coming out of the detectors.