Maddie Smith /stor-i-student-sites/maddie-smith – STOR-i Student at 51²č³Ő

I Bought a House! Now how do I transport all my things…? /stor-i-student-sites/maddie-smith/2021/04/20/i-bought-a-house-now-how-do-i-transport-all-my-things/ Tue, 20 Apr 2021 12:30:00 +0000

You may have seen on this blog post that my fiancé and I found out in February that we could qualify for a mortgage. Well, fast forward two months, and we have purchased our first home! We are very very excited to move in, and we have already started *borrowing* (stealing) furniture from our relatives (hello grandma’s dining table).

But just the other day, while I was working on a report at my desk, it suddenly dawned on me – how on earth am I going to transport all my belongings to a new home? It should be said, I am a bit of a hoarder, and there is an entire wardrobe of art supplies that my parents are keen for me to take with me when I leave. Add to this my excess of fiction books, cuddly toy collection, and piano, and well… you’ve got a lot of things to transport 15 minutes down the road (you heard that right, I am moving a 15 minute drive away from my childhood home).

It turns out that packing problems actually constitute a well-studied area of optimisation. Often, in places like warehouses, the ability to pack items of various sizes effectively into a finite number of containers is vital to the smooth running of operations. Warehouse packing algorithms attempt to solve these kinds of problems by optimising how items are packed in day-to-day operations.

So what kind of packing problem is this?

The ‘bin packing problem’ is perhaps the most relevant description of my ‘dear lord how will I transport my cuddly toy collection’ issue. Essentially, it refers to the problem of how best to pack multiple items of various sizes into a finite number of bins (or containers). Like most optimisation problems, it’s all about trying to find the best possible solution out of all the feasible solutions.

What do you think would give the best solution?

I think most people, when asked for the best way to pack all my cuddly toys up, would suggest trying to fit as many cuddly toys as I can into one bin. And this would be a very good suggestion! This is what I love about operational research: you don’t need to know lots of fancy mathematical theory in order to have a good idea about how to solve a problem.

Generally speaking, the best solution for a bin packing problem can be one of two things:

  1. Packing one container as densely as possible
  2. Packing all objects into the fewest containers possible

Both of these solutions share the same underlying objective: minimising cost. Some shipping companies charge for the amount of space an item takes up, rather than its weight, in which case packing a container as densely as possible is the best option. Equally, reducing the number of containers used clearly reduces packaging costs.

Computational Complexity

If you’ve come across some form of optimisation problem before, you may have heard the term ‘computational complexity’ being thrown about. The computational complexity of a problem may sound scary, but in actual fact it can just be thought of as the amount of resources required to solve the problem (the resources might be the time taken, or the data storage required).

When you measure how long a program takes to run as it is given larger and larger problems (for example, packing 10 cuddly toys, then 20 cuddly toys, then 30 cuddly toys, and so on), you can plot the times and fit a function describing how the runtime grows with the problem size.

In the case that the time taken increases exponentially or factorially (or anything that exceeds what a polynomial can do!) as the difficulty of the problem increases, we say that the problem is not solvable in polynomial time.
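To see this growth concretely, here is a small illustrative sketch (my own, not from the original post; the helper name is hypothetical). It counts how many candidate packings a naive brute-force search examines when looking for the most densely packed single bin, and the count doubles with every extra item:

```python
from itertools import combinations

def count_subsets_checked(items, capacity):
    """Brute force for 'pack one bin as densely as possible':
    try every subset of items, counting how many subsets we examine."""
    checked, best = 0, 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            checked += 1
            total = sum(subset)
            if total <= capacity:
                best = max(best, total)
    return best, checked

# the number of subsets examined is 2^n, so it doubles with every extra item
for n in (5, 10, 15):
    _, checked = count_subsets_checked([1] * n, capacity=8)
    print(n, checked)  # 5 32, 10 1024, 15 32768
```

Going from 10 toys to 20 toys doesn’t double the work here, it multiplies it by about a thousand — exactly the ‘wayyyy longer’ behaviour described below.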

The bin packing problem is known as an NP-complete problem. The ‘NP’ stands for ‘non-deterministic polynomial time’, and NP-complete problems are, informally, the hardest problems in NP: no polynomial-time algorithm is known for any of them, and it is widely believed that none exists. In simplest terms, this means that if we increase the number of items that need to be packed, even by a little amount, then solving the problem exactly can take wayyyy longer.

A cool fact about NP-complete problems is that if anyone could ever solve one NP-complete problem in polynomial time, then we could solve every NP-complete problem in polynomial time using essentially the same method. This would prove the famous open conjecture that P = NP!

Standard Bin Packing Algorithms

So what are the different types of bin packing algorithms that exist?

Next fit: For this bin packing algorithm, I check if an item fits into the box that I am currently filling. If it does fit – great! I will add it to the box. If it doesn’t, I seal up the current box, ready to be transported to the new house, and I begin filling the next one. A benefit of this algorithm is that I can send the sealed boxes over to the new house one after another, so I am never required to pack more than one box at the same time – this helps save bedroom floor space while I pack!

First fit: For this algorithm, let’s imagine I have four boxes lined up on the floor, ready to be packed. Here, I take an item and I see if it will fit in the first box. If it does, I place it in, whereas if it doesn’t, I move on to the next box. I continue this down the row of boxes until I find one that has room, opening a new box if none of them does.

Worst fit: Imagine now that I have partially filled several boxes with items. For this algorithm, I place my current item into the box with the most remaining space (as though I am trying to even out how full all the boxes are). If I have two equally empty boxes, I’ll put the item in the box that I started to fill first, and if the item fits in no open box, I start a new one.

The above algorithms are known as online algorithms; they are used when items arrive one at a time, in an unknown order. In these algorithms, we must place the current item into a container before we consider the next item. Each algorithm has an associated runtime complexity, describing how the time taken for the algorithm to run scales with the number of items to be packed, n. Ideally, we prefer algorithms with a lower runtime complexity, because this means that the algorithm runs faster when it has to pack lots of items! But remember, we also want algorithms that give us good approximations to optimal solutions, so which one you use depends on the situation.
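To make the three rules concrete, here is a minimal Python sketch (my own illustration, not from the original post); it assumes every item has a single numeric size no larger than the box capacity, and the function names are mine:

```python
def next_fit(items, capacity):
    """Pack items in arrival order; seal the current box and start a new
    one whenever the next item doesn't fit."""
    bins = [[]]
    space = capacity
    for item in items:
        if item <= space:
            bins[-1].append(item)
            space -= item
        else:
            bins.append([item])
            space = capacity - item
    return bins

def first_fit(items, capacity):
    """Place each item in the first open box with enough space,
    opening a new box only if none fits."""
    bins, space = [], []
    for item in items:
        for i, s in enumerate(space):
            if item <= s:
                bins[i].append(item)
                space[i] -= item
                break
        else:
            bins.append([item])
            space.append(capacity - item)
    return bins

def worst_fit(items, capacity):
    """Place each item in the open box with the most remaining space,
    opening a new box only if none fits."""
    bins, space = [], []
    for item in items:
        if space and max(space) >= item:
            i = space.index(max(space))
            bins[i].append(item)
            space[i] -= item
        else:
            bins.append([item])
            space.append(capacity - item)
    return bins

items = [4, 8, 1, 4, 2, 1]
print(next_fit(items, 10))   # [[4], [8, 1], [4, 2, 1]] — three boxes
print(first_fit(items, 10))  # two boxes
print(worst_fit(items, 10))  # two boxes
```

On the same list of item sizes, first fit and worst fit often use fewer boxes than next fit, precisely because they are allowed to revisit earlier, partially filled boxes.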

Other algorithms exist for offline situations, where we have all the items available upfront (like my moving house case!). I won’t cover these here, but check out the further reading if you’d like to know more.

So there we have it!

I hope you enjoyed this small introduction to optimisation and bin packing algorithms. In reality, I may not use a bin packing algorithm to help pack my belongings in advance of moving day, but in the world of warehouse management, bin packing algorithms are really important. Many companies use 3D bin packing software to help optimise their operations.

There are several variations on general bin packing problems; for example, the classic optimisation problem known as the Knapsack problem. In the Knapsack problem, we have one bin, and our items are characterised by a value and a weight. Our goal is to maximise the value of the items that can fit into the bin.
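As an illustration of the Knapsack problem (a sketch of the standard dynamic-programming solution, not anything from the original post):

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack by dynamic programming: best[c] holds the highest total
    value achievable with capacity c using the items processed so far."""
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # iterate capacities downwards so each item is used at most once
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

# three items; the best choice is the second and third (total weight 50)
print(knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50))  # 220
```

Unlike bin packing, the knapsack version runs in time proportional to the number of items times the capacity, which is why this dynamic-programming trick works so neatly here.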

If you did enjoy this post, and would like to read more, I’ve listed some great resources for further reading below!

  • on the bin packing problem gives a great introduction, and goes into some more detail on the algorithms.
  • If you’re interested in actually implementing some of these algorithms, I love , which describes how to code up all the algorithms discussed in this report in several different languages.
  • is a great, easy to read description of how bin packing algorithms help warehouse operations.

The Social Network – A Super Quick Introduction to Network Modelling /stor-i-student-sites/maddie-smith/2021/04/13/the-social-network-a-short-introduction-to-network-modelling/ Tue, 13 Apr 2021 12:30:00 +0000

I’m sure all of you have heard about networks in one way or another; perhaps you cast your mind instantly to the idea of Facebook friends upon reading this post, and not only thanks to my rip-off of a certain film title. But what are networks actually used for, and what inference can be made from them?

“We assume the data to be independent and identically distributed” – if I received £1 for every time I heard that phrase during my first week in STOR-i, I would have very little need for a stipend. This is because it is common in statistics to be working with independent and identically distributed data – and this means that making inference from the data is nice and easy.

Network data poses more challenges than traditional independent and identically distributed data. One reason for this is the dependent nature of the data. This makes sense; consider a Facebook page called 51²č³Ő Ducks, which posts the best pictures of our campus ducks.

You choose to like this page, as you like receiving all the best duck updates. Then, it is more likely that one of your university friends also likes this page, compared to a random person who is not a member of your network. 

Let’s consider another example of a network – a recommendation system. If, like me, you have been binging all the latest Netflix titles over lockdown, then you have probably come into contact with this form of network. 

When you finish watching a series on Netflix, you may have noticed that other TV shows are recommended to you. This is similar to when you shop online; perhaps you are familiar with the ‘similar shoppers also bought…’ suggestions. This idea can be modelled as a network.

[Figure: a simple recommendation network – users shown as blue circles on the left, movies as orange circles on the right, with lines connecting each user to the movies they have watched.]

This figure demonstrates a basic recommendation system network. The coloured circles are called nodes, or vertices. In this case, the blue nodes on the left represent users, and the orange nodes on the right represent movies. Nodes can be given any name in a network; here I could have given names of STOR-i lecturers or recent Netflix releases. 

The lines linking particular users to particular movies are called edges. It is possible for edges to be directed (usually shown by having an arrow pointing along the edge), or weighted. Weighted edges have a number (or weight) associated with them. In our recommendation system case, a weighted edge could perhaps indicate the number of times a user has watched a particular movie. 

Using our network, we can see that User 1 watched movies 1, 2 and 4, while User 2 watched movies 2 and 3, and so on. Now, imagine that a fourth user joins our network. User 4, our new user, is the same age as User 2, and both users live in the UK. A recommendation system might therefore suggest movies 2 and 3 to User 4.

The degree of a node is the number of edges connecting to that node. Looking back at our network, we can see that the degree of the User 1 node is 3, since there are three edges connected to it. The degree of the User 2 node is 2, and so on.

Using the degree of the nodes in a network, it is then possible to calculate a degree distribution. The degree distribution for a network denotes the proportion of nodes with a specific degree, and can be used to compare network models to real networks. 
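Here is a sketch of how degrees and the degree distribution might be computed from an edge list (my own toy version of the figure; User 3’s single edge is an assumption added for illustration):

```python
from collections import Counter

# toy bipartite network: users connected to the movies they watched
# (User 3's edge is assumed for illustration; it is not in the figure text)
edges = [("User 1", "Movie 1"), ("User 1", "Movie 2"), ("User 1", "Movie 4"),
         ("User 2", "Movie 2"), ("User 2", "Movie 3"),
         ("User 3", "Movie 1")]

def degrees(edges):
    """Degree of each node: the number of edges touching it."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_distribution(deg):
    """Proportion of nodes having each degree value."""
    counts = Counter(deg.values())
    return {k: c / len(deg) for k, c in counts.items()}

deg = degrees(edges)
print(deg["User 1"])             # 3
print(degree_distribution(deg))  # proportions of nodes with degree 1, 2, 3
```

With these six edges there are seven nodes, one of degree 3, three of degree 2 and three of degree 1, so the degree distribution puts mass 1/7, 3/7 and 3/7 on those values.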

There you have it – a super quick introduction to networks! Can you think of any other situations which could be modelled as networks? What about directed networks? Let me know in the comments!

If you are interested in reading more about networks, and finding out some network models that exist, then make sure you check out the further reading for this blog post!

Further reading

about network models for recommender systems is really interesting, and great for beginners!

Another great post about network models and recommender systems can be found . This is one of a series, and is written from a data science perspective.

The Erdős–Rényi model is a well known network model, first introduced in 1959 by mathematicians Paul Erdős and Alfréd Rényi. Check out which introduces the model and also provides some code for generating a graph using this model.

Another well known network model is the . explain how this model works in mathematical detail, as well as a few other network models.

Optimising Visual Art /stor-i-student-sites/maddie-smith/2021/03/30/operational-research-in-art/ Tue, 30 Mar 2021 12:30:00 +0000

I am one of those people who enjoys everything. And I mean everything. You probably know one of those people, or maybe you are one. The people who try a hobby once, become instantly hooked, and then want to buy all the equipment and become a master of this new skill – well, that’s me. Bread making, calligraphy, skiing, candle making, language learning, tennis, writing, figure skating, piano playing… you get the idea.

This also extends into my academic career; you may have read on my home page that my undergraduate degree was in Theoretical Physics, but now I study Statistics and Operational Research (check out this post about why I made the switch). Basically, I love to learn, and I love to discover as much as I can about the things I find interesting; hence my love of research.

One thing that has remained a constant love throughout my life though is art. I love to paint (in water colours, acrylics, dyes and oils – I told you, I enjoy everything), and also draw. I recently read a blog post by my friend Lídia, on the role of Operational Research in music – a truly interesting post that you can find here – and it got me thinking, I wonder if there was a link between Operational Research and art too?

Optimisation

Optimisation is a branch of Operational Research which considers the task of achieving optimal performance subject to certain constraints. Mathematically, optimisation problems deal with maximising or minimising some function with respect to some set; for example, given a set of decision variables x = (x1, x2, x3, …, xn), we want to find the values of the decision variables that maximise or minimise the objective function.

The objective function could be the waiting time for customers in a system, or the profit from the sale of a product.

The subset which represents the allowable choices of decision variables is called the feasible set. Constraints in an optimisation problem work to specify what the feasible set is. For example, if we consider the waiting time for customers in a system, a constraint would be given by the fact that the number of customers in a system can never be negative. This is a simple example of a constraint that one would encounter in an optimisation problem, but in reality there are often many constraints, which can be complex and high-dimensional.
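A tiny illustrative sketch of these ideas (entirely my own, with made-up numbers): a profit objective is maximised over a discretised feasible set defined by two constraints.

```python
from itertools import product

def objective(x, y):
    # hypothetical profit from making x units of product A and y of product B
    return 3 * x + 2 * y

# feasible set: integer production plans satisfying both constraints
feasible = [(x, y) for x, y in product(range(5), repeat=2)
            if x + y <= 4 and x <= 2]

# the optimal decision variables are the feasible point with highest profit
best = max(feasible, key=lambda p: objective(*p))
print(best, objective(*best))  # (2, 2) 10
```

Brute-force enumeration like this only works because the feasible set here is tiny; real problems with many, high-dimensional constraints need proper optimisation algorithms.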

So how does this relate to art?

Well, clearly when an artist works, they are subject to some real-world constraints. They are required to work within budgets, meet deadlines, and follow the instructions of the customer in the case of commissions.

But, perhaps more interestingly, some artists subject themselves to constraints voluntarily. A Sunday on La Grande Jatte (1884) is a work by the French artist Georges-Pierre Seurat. At a glance, the painting depicts Parisians having what looks to be a lovely day at the park. When viewed up close, however, it is possible to see that the painting is made up of many tiny dots of multicoloured paint.

If we think about the creation of this painting as an optimisation problem, we could say that Seurat’s objective was to create the best possible depiction of what he saw, subject to two key constraints: only applying the paint in tiny dots, and keeping his colours separate. 

I came across a paper (see the further reading) which discusses applying optimisation algorithms in order to create computer generated artwork. This general idea is considered for three different branches of art: line art, tile-based methods, and freeform arrangement of elements. For this post, we’re going to be considering tile-based methods, or mosaics.

When photographs or other images are represented digitally, we can think of this as a function which maps each pixel location (x, y) to some colour space. This is easy to consider in a mathematical context, for colours can be denoted using an RGB colour model or similar. Once photographs are represented in this form, it is easy for artists to construct a new artistic version of this photograph by replacing each pixel or block of pixels with some object, known as tiles.

The need for optimisation becomes apparent when an artist is choosing from a finite selection of tiles (where tiles could be anything ranging from dominoes, lego pieces, or other photographs). If the artist only has access to a fixed selection of tiles, then the goal is to find the ‘best’ overall arrangement of this set of tiles, in order to produce the most aesthetically pleasing artwork. Mathematically, the most ‘aesthetically pleasing’ artwork may be considered as the artwork which is most similar to the original image.

Let I be a W × H image, and suppose that we have an inventory of tiles T = {T1, T2, …, Tn}. So how do we quantify how similar an approximation is to the original reference image? In the paper, a distance function d(I(x, y), Tj) is introduced, where I(x, y) is the colour of the pixel at location (x, y) in the original image, and Tj is the colour of a particular tile. This distance function provides a quantitative method for determining how effectively a given tile approximates a pixel in the source image.

We assume that d(I(x, y), Tj) is never negative, and that smaller values denote better approximations (a distance value of zero would indicate that the tile and the corresponding source pixel are the same colour). If we consider all possible ways in which our inventory of tiles may be arranged on our image (where each tile is used no more than once), that is, all possible mappings, we then seek to find the mapping which minimises the total of the distance functions over every pixel.

In the case that our tiles match up with the dimensions of the pixels exactly, this proves to be a relatively simple task. It is possible to solve this by constructing a complete, weighted bipartite graph, in which each pixel location (x, y) is connected to every tile Tj by an edge of weight d(I(x, y),Tj). The minimum weight matching which uses all pixels can then be computed to give the optimal solution.

This task becomes far more complex in the case that the tiles do not match up one to one with the image pixels, for example, if the artist were to use something like dominoes. However, it is presented in the paper that the construction of domino art can be naturally reduced to an integer programming problem.

Isn’t it fascinating how optimisation techniques can be used to create impressive artwork? I certainly think so. Some people might think that applying mathematical techniques such as integer programming to create art takes away from the spontaneity and skill that goes into creating a masterpiece, but I actually think it only adds to the awe! What do you think? Let me know in the comments.

Make sure to check out the further reading if you are interested in finding out more!

Further Reading

– Craig S. Kaplan, Robert Bosch

Why I Chose a PhD in Statistics and Operational Research /stor-i-student-sites/maddie-smith/2021/03/16/why-i-chose-statistics-and-operational-research/ Tue, 16 Mar 2021 12:30:00 +0000

Today’s post is something a bit different, discussing why I decided to pursue a PhD in Statistics and Operational Research, particularly after completing a four year integrated Masters degree in Theoretical Physics with Mathematics. This is a personal post (no mathematical content!), and I hope it will be helpful for those of you who are perhaps currently completing your A levels in the mathematical sciences, or maybe you are an undergraduate student considering postgraduate study!

When it came to applying for my undergraduate degree, I didn’t really know what I wanted to study. In fact, I had the most bizarre personal statement that I have ever seen. The first half detailed my love of Maths, touching on my interest in Physics, meanwhile the second half discussed my love of languages and Spanish. I applied for vastly different courses at four different universities, ranging from Spanish and Italian to Astrophysics to Mathematics.

At around Easter time during my final year of college, I had what you might call an ‘A-ha!’ moment, when I stumbled upon the Theoretical Physics with Mathematics course at Lancaster. This was exactly what I was looking for at the time, a combination of applying difficult mathematical theory to tackle real problems in physics. The admissions team at Lancaster were kind enough to set me up with an interview with the Physics department, and I ended up accepting an unconditional offer to study a three year BSc course.

Fast forward two years, and I was preparing for my second year university exams. By this point I had developed a group of friends who were enrolled on the four year integrated masters course, and they already had plans to pursue PhDs once our undergraduate degrees were completed (if you’re interested in what they’re doing now, check out their research groups: , , and ). The thought of studying for a masters, integrated or not, had never really occurred to me at the point of applying to university – the thought of studying for a PhD even less so. But as my second year progressed, I must admit that I became enthralled by the world of academia and research. When you come from a background with no exposure to further education, coming into contact with Drs and Professors and research papers carries some sense of excitement with it – at least, it did for me. These people aren’t just relaying facts on a subject, they are actively discovering new things about the subject. They are experts.

During the second term of my second year, I switched onto the four year integrated masters course. It was occurring to me just how vast the subjects of mathematics and physics were, and I was acutely aware that the lecture courses I had taken barely scratched the surface on some seriously complex subjects. In short, I wanted to know more.

As time went on, I began thinking about what I wanted to do once my undergraduate degree was over. At the end of my second year of university, I received a job offer following a summer internship.

However, by this point I was considering postgraduate study. I wanted to explore other areas of mathematics, and perhaps move into a new and exciting area. I had a friend who was studying a BSc in Statistics at the time, planning to go on to complete a Masters at Imperial College London and a subsequent PhD in the subject. I also had another friend who was completing an internship with an exciting sounding group called STOR-i that summer. I began to take an interest in statistics and operational research alongside my courses in physics and pure mathematics.

I began looking into STOR-i myself shortly after, upon beginning my third year of university. Despite being a student at 51žŁÀû already, I actually hadn’t heard of STOR-i until my friend completed his internship there. I think sometimes in academia, there is this assumption amongst students that you must know exactly what area you want to pursue research in; there is this unhealthy expectation that you must already be an expert in a subject before you decide to undertake a Masters degree or a PhD in it. This was not my situation at all.

Upon looking on the STOR-i website, I guess you could say that I had an ‘A-ha!’ moment similar to when I found my undergraduate degree three years earlier. Industry, maths, problem solving – hello. I sometimes felt that the academic work carried out in physics and pure mathematics was a bit too far removed from real life, so the idea of working with industry partners in order to solve actual, current problems was something that really excited me. STOR-i provided a programme that would allow me to apply my mathematical skillset to something new, and perhaps cause me to think about the maths that I had learned previously in a new way.

At the end of my third year of university, I was offered another graduate job; the accompanying salary was higher than that offered to me the previous year. I feel it should’ve been a difficult decision, deciding whether to take the job or pursue further studies when I finished my undergraduate degree – but it really wasn’t. I had my mind set on STOR-i. The potential to study two entirely new fields was exciting to me. The opportunity to work with industry partners offered the perfect preparation for the sort of career I wanted to have. The chance to carry out my own research and become one of those experts that I admired was even more appealing.

I declined the job offer – a rather bold move seeing as I was yet to even apply for (let alone receive!) a position at STOR-i. Now, almost two years later, I’m happy to say that I made the right decision. Since starting with STOR-i in October, I feel as though every day I have either learnt something incredibly interesting, or incredibly useful; on the lucky days, I feel as though it is both. The PhD projects have recently been released, and I’m pleased to say that they are exactly the reason I joined STOR-i in the first place – relevant, current applications, with a chance for me to get stuck in with some difficult problems. I guess the main thing that I am excited for though, is all the new skills that I am learning and am yet to learn. In six months, I have gone from being a complete novice in two fields to being able to tackle problems and understand whole new areas of mathematics – and I think that is the main reason that I chose Statistics and Operational Research.

Ch Ch Ch Ch Changepoints /stor-i-student-sites/maddie-smith/2021/03/02/ch-ch-ch-ch-changepoints/ Tue, 02 Mar 2021 12:30:00 +0000

No, I didn’t just forget the words to David Bowie’s Changes – in today’s post we’re going to be talking about changepoints! In this brief introduction to changepoint analysis we’ll be covering what it actually is, how it is useful, and when we can apply it. At the end of this post, I’ll also be sharing some code resources, which you can use to carry out your own changepoint analysis!

Changepoint analysis is a really well-established area of Statistics. It dates back as early as the 1950s, and since then has been the focus for LOTS of interesting and important research.

Changepoint detection looks at time series data. A time series is a series of data points which are indexed in time order. Usually, a time series is a sequence of discrete measurements, taken at equally spaced points in time. This could be the number of viewers for a particular TV show taken at one minute intervals over the course of an hour, or maybe the heights of ocean tides taken every hour throughout the day.

As the name suggests, the aim of changepoint detection is to identify the points in time at which the probability distribution of a time series changes. We can think of this as follows:

Let’s say we have some time series data given by y1, y2, …, yn, where yi is the measurement taken at time i. Then, if a changepoint exists at time τ, this means that the measurements y1, y2, …, yτ differ from the measurements yτ+1, …, yn in some way.

If we are performing a changepoint analysis, there are some key questions that we’d like to consider:

  • Has a change occurred?
  • If yes, where is the change?
  • What is the probability that a change has occurred?
  • How certain are we of the location of the changepoint?
  • What is the statistical nature of this change?
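As a minimal sketch of the ‘if yes, where is the change?’ question (my own illustration, not a production method): for a single change in mean, every possible split point can be scored by the squared error around each segment’s mean, and the best split chosen.

```python
def best_changepoint(y):
    """Most likely single changepoint in the mean: the split point that
    minimises the within-segment sum of squared errors."""
    def sse(seg):
        mean = sum(seg) / len(seg)
        return sum((v - mean) ** 2 for v in seg)
    return min(range(1, len(y)), key=lambda tau: sse(y[:tau]) + sse(y[tau:]))

# mean jumps from roughly 0 to roughly 5 after the fifth observation
data = [0.1, -0.2, 0.0, 0.2, -0.1, 5.1, 4.9, 5.2, 5.0]
print(best_changepoint(data))  # 5
```

This checks every split, so it only finds one changepoint and scales poorly; methods like PELT (in the further reading below) handle multiple changepoints efficiently.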

Online v Offline Detection

Changepoint detection can either be online or offline. Imagine that we have access to some data, which describes the temperature taken at 51²č³Ő at 12pm every day over the course of a month. We then want to look for changepoints in this data, to see whether there were any freak increases or dips in the mean temperature, or maybe periods with very high variance. This type of analysis would require offline changepoint detection methods, because we have access to the complete time series data. That is, we are looking at the data after all the data has been collected.

On the other hand, imagine that The Great British Bake Off is on TV right now. The number of viewers tuned in for the programme is being streamed to us live every second, and we want to look for changepoints in the number of viewers now, as the programme is being aired. This type of analysis would require us to use online changepoint detection methods, which run concurrently with the process that they are monitoring.

Let’s recap that. In offline changepoint detection …

  • Live streaming data is not used.
  • The complete time series is required for statistical analysis.
  • All data is received and processed at the same time.
  • We are interested in detecting all changes in the data, and not just the most recent.
  • We usually end up with more accurate results, as the entire time series has been analysed.

Whereas in online changepoint detection …

  • The algorithm runs concurrently with the process that it is monitoring.
  • Each data point is processed as it becomes available.
  • Speed is of the essence! The goal is to detect a changepoint as soon as possible after it occurs, ideally before the arrival of the next data point!
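A classic online scheme with exactly these properties is CUSUM; here is a minimal one-sided sketch (my own illustration, with made-up numbers for the target and threshold):

```python
def cusum_alarm(stream, target, threshold, drift=0.0):
    """One-sided CUSUM: accumulate how far observations sit above `target`,
    resetting at zero, and raise an alarm once the sum exceeds `threshold`."""
    s = 0.0
    for t, x in enumerate(stream):
        s = max(0.0, s + (x - target - drift))
        if s > threshold:
            return t  # index of the observation that triggered the alarm
    return None  # no change detected

# small fluctuations around 0, then a sustained jump to around 3
stream = [0.0, 0.1, -0.1, 0.0, 3.0, 3.2, 3.1]
print(cusum_alarm(stream, target=0.0, threshold=4.0))  # 5
```

The threshold trades off speed against false alarms: a low threshold detects changes sooner, but is triggered more easily by noise and outliers.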

Examples

Let’s consider a fitness tracker that can tell when you are walking, running, climbing stairs … you get the idea. Maybe your mobile phone does this. One way in which devices can tell what activity you were performing at a particular point during the day is by using offline changepoint detection!

Online changepoint detection is often used in areas like quality control, or for monitoring systems. For example, a broadband provider might receive live data that details the performance of their broadband network at some site. Detection of a changepoint in this scenario might indicate that there is an issue with the network! This brings us to another required feature for a good online changepoint detection method: alongside the need for speed, it is also important that we have a method that is robust to noise and outliers, so that it doesn’t raise too many false positives. This makes sense, as the broadband provider doesn’t want to send out an engineer if there isn’t actually anything wrong with the network!

Now that we have covered what changepoint detection is, and the differences between offline and online detection methods, can you think of any other scenarios where we would want to use offline changepoint detection methods? What about online detection methods?

Further Reading

Sadly there is only so much I can write in one blog, so I have included plenty of further reading resources for you if you enjoyed today’s post!

  • Offline changepoint detection and implementation: provides a great place to start if you want to know more about the types of changepoint detection methods available, and if you want to have a go at using applying some of the methods to data in R.
  • Online changepoint detection and implementation: describes one possible method of online changepoint detection. I feel it gives a great intuitive understanding, and also explains how to code up the method if this is something that you would like to try!
  • PELT method: provides a more mathematical explanation of how one of the most popular offline changepoint detection methods works. I’d recommend reading this if you are looking for a deeper understanding of the method.
  • Binary Segmentation method: provides an introduction to another popular offline changepoint detection method, and also gives some code that you can use to implement the algorithm yourself. I find that implementing a method yourself gives a better understanding than simply calling an R package.

Interior Design and Hypothesis Testing /stor-i-student-sites/maddie-smith/2021/02/16/interior-design-and-hypothesis-testing/ Tue, 16 Feb 2021 12:30:00 +0000 http://www.lancaster.ac.uk/stor-i-student-sites/maddie-smith/?p=246 Just the other week, my fiancĂ© and I were told we were able to qualify for a mortgage. As you can expect, I’ve spent the following days excitedly searching for properties online and dreaming up interior design schemes. I’m sure most people would agree that the thought of decorating your first home is a thrilling but mildly terrifying task, as up until now all the bad interior design decisions in your home could be blamed on your parents’ poor taste.

While perusing the internet for living room paint colours, I came across the statement that ‘blue is a calming colour’.

This got me thinking, who comes up with this information? Is this just a clever marketing technique designed to encourage me to paint my entire house blue (because let’s face it, who doesn’t need a bit of calming in the midst of a global pandemic)? Or is there actually some truth to this statement? A way of testing whether this statement is likely to be true would be to use hypothesis testing.

Hypothesis testing is a statistical method that is used to determine how likely or unlikely a hypothesis is for a given sample of data.

In this post, I give a very simple introduction to hypothesis testing for those of you who may not have come across it before. I try to keep things simple, so if you want a bit more information (particularly on test statistics), I’ve left some great further reading resources at the bottom!

Let’s say that we have access to some data that was gathered to determine whether or not people find the colour blue calming.

The data we have correspond to the following experiment: 100 people were asked to fill in a survey about how they were feeling. 50 of these people carried out the survey in a blue room, and the other 50 carried out the survey in a white room. The possible survey responses were ‘calm’ and ‘normal’.

Let’s assume that people in the blue room have some probability p1 of choosing the calm answer, while the probability of people in the white room choosing this answer is given by some probability p2.

We can now begin our hypothesis test!

In hypothesis testing, the null hypothesis H0 describes the case that the sample observations result purely from chance. In our case, it would mean that we’d expect to see the same proportion of people feel calm in the blue room as in the white room. Looking at our probabilities, we could say the null hypothesis is given by: H0 : p1 = p2.

On the other hand, the alternative hypothesis HA describes the case that the sample observations are influenced by some non-random cause. In our example, this corresponds to the people in the blue room having a different probability of feeling calm than those in the white room: HA : p1 ≠ p2.

The general idea with hypothesis testing is that we look to see if our data provide evidence to reject H0. This is done by calculating something called a test statistic, and then looking at the probability of observing this test statistic in the case that our null hypothesis is true.

To decide whether or not the data support the value indicated by the null hypothesis, we need to set a significance level α for our hypothesis test. This is the probability that we incorrectly reject the null hypothesis when it is actually true! Of course, we want this to be small, so it’s usually set at 5%.

Some more definitions…

A test statistic T is a function of the data whose value we use to test a null hypothesis. It shows us how closely the data observed in our sample match the distribution that we’d expect to see if the null hypothesis were true.

The p-value of a test is the probability of observing a test statistic at least as extreme as the one we observed, if the null hypothesis is true. This means that small p-values offer evidence against H0: a small p-value says that, if the null hypothesis were true, it would be very unlikely for us to have seen this result. Make sense?

Don’t worry if it doesn’t! If you’re new to hypothesis testing, it can be quite difficult to wrap your head around.

Let’s pause for a moment and think about what we would do in order to test our question of “Is blue a calming colour?”.

  1. Define our null hypothesis – “The colour blue has no effect on how calm a person feels. Or, in other words, the probability of a person choosing calm is the same, whether they are in the blue room or the white room.”
  2. Set our significance level – This is the probability of rejecting our null hypothesis when it is actually true. We obviously want this to be small, so α=0.05 is a good choice.
  3. Construct a test statistic – It’s up to you to choose what you would like to use as a test statistic. Basically, it is a function of the data that we can calculate to give a single number. This could be, for example, the difference between the proportions of ‘calm’ responses observed in the two rooms.
  4. Calculate the p-value – This is the probability that we would’ve obtained a test statistic at least as extreme as ours, if the null hypothesis is true.
  5. If the p-value is less than our significance level α, reject our null hypothesis – We can now say that “Blue is a calming colour!”
  6. If the p-value is greater than our significance level α, do not reject our null hypothesis – “We still don’t know if blue is a calming colour.”

Note that Step 6 says “do not reject our null hypothesis” and not “accept the null hypothesis”. This is important: failing to reject the null hypothesis just means that we did not provide sufficient evidence to conclude that blue is a calming colour; in other words, it still might be! But we don’t have enough evidence to say this.
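To make the six steps concrete, here is a sketch of the whole recipe in Python. The survey counts (32 of 50 ‘calm’ responses in the blue room versus 22 of 50 in the white room) are invented for illustration, and the test statistic chosen is the standard pooled two-proportion z-statistic – one common choice for comparing p1 and p2, since the post deliberately leaves this choice open.

```python
from math import sqrt, erfc

# Hypothetical survey counts -- invented for illustration, NOT real data.
calm_blue, n_blue = 32, 50    # 'calm' responses in the blue room
calm_white, n_white = 22, 50  # 'calm' responses in the white room

p1, p2 = calm_blue / n_blue, calm_white / n_white
pooled = (calm_blue + calm_white) / (n_blue + n_white)

# Step 3: the pooled two-proportion z-statistic under H0: p1 = p2.
z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n_blue + 1 / n_white))

# Step 4: two-sided p-value from the standard normal distribution
# (erfc(|z| / sqrt(2)) equals 2 * P(Z > |z|) for standard normal Z).
p_value = erfc(abs(z) / sqrt(2))

# Steps 5/6: compare against the significance level.
alpha = 0.05
reject_h0 = p_value < alpha
```

With these made-up counts the p-value comes out just under 0.05, so we would (narrowly) reject H0 – and with slightly different counts we would not, which is exactly why the significance level has to be fixed before looking at the data.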

So there you have it, a brief introduction to hypothesis testing! I hope you enjoyed this post and found it useful. If you want to know more about hypothesis testing, be sure to check out the further reading on this post!

Further reading …

This great blog post was written by one of my fellow STOR-i students, and it explains hypothesis testing in a bit more detail, for those of you looking to carry out your own hypothesis tests.

gives some great examples of simple hypothesis tests, to help get you started.

Learn From Your Mistakes – Multi-armed Bandits /stor-i-student-sites/maddie-smith/2021/02/02/learn-from-your-mistakes-multi-armed-bandits/ Tue, 02 Feb 2021 12:30:00 +0000 http://www.lancaster.ac.uk/stor-i-student-sites/maddie-smith/?p=204 In a recent talk given to the MRes students, I was asked for my opinion on a multi-armed bandit problem. In these working from home times, I’m sure most of us know of the combined dread and panic that comes with taking your microphone off mute to speak on a call. I contemplated the question, and then gave my answer. As you might have guessed from the title of this post, I was wrong. But I certainly wouldn’t get this problem wrong again, because I had learned from my mistake.

Ironically enough, learning from your mistakes, or past behaviour, is an idea that is strongly rooted in the multi-armed bandit problem. And thus, a blog post was inspired! 

Multi-armed bandits

So, let’s get started. What exactly is a multi-armed bandit?

When most people hear ‘multi-armed bandit’, they may think of gambling. That is because a one-armed bandit is a machine where you can pull a lever, with the hopes of winning a prize. But of course, you may not win anything at all. It is this idea which constitutes a multi-armed bandit problem.

Multi-armed bandit problems are a class of problems where we can associate a particular score with each of our decisions at each point in time. This score includes the immediate benefit or cost of making that decision, plus some future benefit.

Imagine that we have K independent one-armed bandits, and we get to play one of these bandits at each time t = 0, 1, 2, 3, …. These are very simple bandits, where we either win or lose upon pulling the arm. We’ll define losing as simply winning nothing.

Now, if your win occurs at some time t, then you gain some discounted reward a^t, where 0 < a < 1. Clearly, rewards are discounted over time; this means that a reward in the future is worth less to you than a reward now. The mathematically minded among you may realise that this means that the total reward we could possibly earn is bounded.
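A quick sanity check of that boundedness claim: even if we won at every single time step, the rewards would form a geometric series, which sums to a finite value. A tiny numerical check, with an illustrative discount of a = 0.9:

```python
# Sanity check of the boundedness claim: even winning at every single time t,
# the total reward is the geometric series a**0 + a**1 + a**2 + ..., which
# sums to 1 / (1 - a) whenever 0 < a < 1. The discount a = 0.9 is illustrative.
a = 0.9
total = sum(a**t for t in range(1000))  # 1000 terms is effectively infinity here
bound = 1 / (1 - a)                     # the geometric series limit, here 10
```

So with a = 0.9, no strategy can ever earn more than 10 in total, however cleverly it plays.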

The probability of a success upon pulling bandit i is unknown, and denoted by πi. Since these success probabilities are unknown, we have to learn about what the πis are and profit from them as we go. This means that, in the early stages we have to pull some of the arms just to see what πi might be like.

At each time t, our choice of which bandit to play is informed by each bandit’s record of successes and failures to date. For example, if I know that bandit 2 has given me a success every time I pulled it, I might be inclined to pull bandit 2 again. On the other hand, if bandit 4 has given me a failure most of the times, I might want to avoid this bandit. Thus, we are using previous data which we have obtained about each of the bandits in order to update our beliefs about the bandits’ probability distributions. 

The Maths

Updating our beliefs about the probability distributions of the bandits in this way uses the Bayesian interpretation of statistics. Let’s imagine that we have a parameter that we want to determine (in our case, the probability of success for each of the K bandits). Maybe we have some prior ideas about what the probability of success for each bandit will be. This could be due to previous experiments we know about, or maybe just personal beliefs. These prior beliefs are described by the prior distributions for the parameters.

Then, let’s imagine we begin our bandit experiment. After time t, we have made t pulls, and we now have information detailing the number of successes and failures for each of the bandits played. At this time, we take this observed data into account to update what we think the probability distributions for the parameters look like. These updated distributions are called the posterior distributions.
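For the simple win/lose bandits here, this updating has a particularly clean form. If the prior for a bandit’s success probability is a Beta distribution, the posterior after observing some pulls is again a Beta distribution. The uniform prior Beta(1, 1) below is an assumption – the post doesn’t commit to a particular prior.

```python
# Beta-Bernoulli conjugate updating: combining a Beta(a, b) prior for a
# success probability with s observed successes and f observed failures
# gives a Beta(a + s, b + f) posterior. The uniform prior Beta(1, 1) used
# below is an assumption -- the post doesn't fix a particular prior.
def update(a, b, successes, failures):
    return a + successes, b + failures

def posterior_mean(a, b):
    return a / (a + b)

# Start uninformed, then observe 3 successes and 1 failure from a bandit:
a, b = update(1, 1, 3, 1)      # posterior is Beta(4, 2)
belief = posterior_mean(a, b)  # posterior mean of the success probability
```

Notice that updating is just counting: the posterior parameters are the prior parameters plus the observed successes and failures, which is why this kind of belief updating is cheap enough to do after every single pull.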

The Question


How do we play the one-armed bandits in order to maximise the total gain? 

Well, this is where the importance of learning from your mistakes comes in. Imagine the case where K = 5, and at time t = 7 our observed data are as follows:

The results from the past seven arm pulls are as follows:

Bandit | Times played | Successes | Failures
1      | 4            | 3         | 1
2      | 0            | 0         | 0
3      | 1            | 1         | 0
4      | 2            | 1         | 1
5      | 0            | 0         | 0

Looking at our observed data, bandit 1 has the highest proportion of successes out of the times it was played. So maybe we would want to pull Bandit 1 again at the next time step?

Well, maybe not.

Notice that Bandits 2 and 5 haven’t been played at all yet, so we can’t really infer anything about them. Pulling Bandit 1 might give us a success on our next attempt, but it could also give us a failure. We know nothing about Bandits 2 and 5 – perhaps these bandits have a probability of success of 1?

The idea of pulling the best bandits to maximise your expected number of successes relies on a balance between exploration and exploitation. Exploration refers to playing bandits that we don’t know much about, in this case, pulling arms 2 and 5. Exploitation, on the other hand, means pulling the arms of bandits that we already know might give us a good result; in our case, pulling bandit 1 again. 

Every time we pull an arm, we are actually receiving both a present benefit and a future benefit. The present benefit is what we win, and the future benefit is what we learn. Recall that if we win at time t, we receive a discounted reward of a^t. Therefore, how close a is to 1 determines how important the future is: if a is close to 1, rewards from many future pulls still matter, so what we learn is valuable. On the other hand, if a is close to 0, the present benefit dominates. Again, this relates to the importance of balancing exploration and exploitation.

So how do we quantify both the immediate benefit and the future benefit from these bandits, in such a way so that we can play the bandits to maximise our total gain? It is possible to take the posterior distribution for each bandit at each point in time, and associate a score with it that represents both future and present benefit of pulling that bandit. 
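One simple, popular way of turning posteriors into decisions – not the Gittins index discussed below, but a widely used alternative called Thompson sampling – is to draw one random sample from each bandit’s posterior and play the bandit with the largest sample. A minimal sketch, using the made-up counts from the worked example above (bandit 1 with 3 successes and 1 failure, bandits 2 and 5 unplayed) and an assumed uniform Beta(1, 1) prior:

```python
import random

random.seed(42)  # fixed seed so the simulation is reproducible

# (successes, failures) for K = 5 bandits, matching the worked example:
# bandit 1 has 3 wins and 1 loss, bandits 2 and 5 are unplayed.
records = {1: (3, 1), 2: (0, 0), 3: (1, 0), 4: (1, 1), 5: (0, 0)}

def thompson_choice(records):
    # Draw one sample from each Beta(1 + successes, 1 + failures) posterior
    # and play the bandit whose sample is largest.
    samples = {i: random.betavariate(1 + s, 1 + f) for i, (s, f) in records.items()}
    return max(samples, key=samples.get)

# Simulate many choices: well-performing bandits are picked often
# (exploitation), but uncertain, unplayed ones still get tried (exploration).
counts = {i: 0 for i in records}
for _ in range(10_000):
    counts[thompson_choice(records)] += 1
```

Because bandits 2 and 5 have wide, uncertain posteriors, their samples are occasionally the largest, so they still get explored – the randomness itself is what balances exploration against exploitation.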

It turns out that there is something called the Gittins index: a value that can be assigned to each bandit at each point in time, capturing both the present and future benefit at once. Playing whichever of the K bandits has the largest Gittins index at time t is Bayes optimal, which is pretty cool!

So there we have it: learning from past experiences is critical in the world of multi-armed bandits, and also in many other areas of mathematics, including reinforcement learning and Bayesian optimisation. The trade-off between exploration and exploitation is central, and perhaps even serves as a reminder of the importance of not being afraid to make mistakes in our everyday lives… just a thought.

If you enjoyed this post, be sure to check out the following further reading resources!

Further Reading

This book gives a good introduction to multi-armed bandits for those interested, and goes into detail on some of their many uses.

are really great for those of you looking for a gentle introduction to the Gittins index and multi-armed bandits! Those new to the subject may also want to look at as a good starting point.

If you’re interested in exploration vs exploitation in reinforcement learning, and .

If you’re interested in exploration vs exploitation in Bayesian optimisation, .

Most Likely Paths in the Ocean – Current Research from STOR-i /stor-i-student-sites/maddie-smith/2021/01/19/so-what-do-you-actually-do-all-day/ Tue, 19 Jan 2021 12:35:00 +0000 http://www.lancaster.ac.uk/stor-i-student-sites/maddie-smith/?p=165 “So, what do you actually do?” – The question that many postgraduate students are fed up of hearing. If, like me, you hail from a maths-averse family, you will be familiar with trying to explain to your nearest and dearest what it is you actually do all day. So, what is the answer? Well, in truth, a lot of tea drinking and biscuit eating, marvelling at lecturers’ cats over Teams calls and frequent discussions about Bake Off. But that’s not all we do.

In fact, entering the fields of statistics and operational research from my background in physics, I would be lying if I said that I had known exactly what to expect myself. My undergraduate degree is in Theoretical Physics, the key part here being theoretical – to be honest, I had very little idea what research I would be a part of when I joined STOR-i.

I plan to write a series of blog posts that cover some cool and unexpected areas of research and uses for statistics and operational research that you may not know about. Today’s blog post is the first of said posts, and we are going to be considering the question: what is the most likely path that a particle would take between two points on the surface of the ocean?

This research question was considered in a recent talk given to the MRes students by Dr Adam M. Sykulski, and covered the work of his PhD student Michael O’Malley. I’ll be using some of the figures from the presentation in this post, so be sure to check out the paper in the further reading for more.

Let’s get started


So, what do you suppose the most likely path is from point A to point B in the ocean? Perhaps you suppose that the most likely path is along a geodesic line (geodesics describe the shortest route between two points on the Earth’s surface); a perfectly reasonable guess. Or maybe you think that the most likely path would follow some ocean currents.

The Data …

The data set consists of approximately 1300 satellite-tracked drifting buoys, where the buoys have been designed to mimic the motion of a water particle in the ocean. This provides an ideal data set for carrying out research on the most likely path!

But finding the most likely path between two specific points is not as simple as monitoring a buoy’s movement from one point to the other – in fact, the chance that a buoy will have travelled through both of those exact points is slim. Instead, the ocean is discretised into spatial bins (this is just a fancy way of saying chopped up into equally sized sections).

Most standard approaches to this use latitude and longitude binning to tessellate the globe (i.e. the map is divided into one-degree-by-one-degree cells). While this is an intuitive idea, there are problems. Because the Earth is a sphere, one-degree-by-one-degree bins are not all of equal area – the bins near the poles are much smaller. In addition, tessellating the globe into squares means that diagonal neighbours share only one vertex and no edge, creating an asymmetry in the tessellation.

O’Malley and Sykulski proposed a new method of tessellating the globe using Uber’s H3 index, where the globe is tessellated into hexagons (plus a small number of pentagons). This helps to eliminate some of the problems experienced when using standard degree binning, as each hexagon (or pentagon) is now approximately equal in area. In addition, each polygon now shares an edge and two vertices with each neighbour. It is then possible to use this tessellation to form Markov transition matrices, based on drifter locations from one time step to the next.

The Maths …

Imagine that we have a system, where at each positive integer time point, the system transitions into a new state.  Such a system is Markov if the next state only depends on the previous state; this is commonly referred to as the lack of memory property.

This system is what we call a time-homogeneous Markov chain. We can describe the stochastic dynamics of such a process using transition probabilities pij, which describe the probability of transitioning from state i to state j. It is then possible to write a square matrix of one-step transition probabilities, where p01 denotes the probability of transitioning from state 0 to state 1 and so on. This is what we call the Markov transition matrix.

Back to our ocean path example. Perhaps you have noticed that in order to use the tessellation of the globe to form Markov transition matrices based on drifter locations from one time step to the next, we must choose an appropriate time step. This time step should be chosen so that the drifter motions obey the lack of memory property that we talked about before.  In their research, O’Malley and Sykulski set this to 5 days. 

If we now consider the points A and B as belonging to two different states, it is possible to then apply a path finding algorithm in order to find the most likely path between points. O’Malley and Sykulski’s research showed some interesting results! The most likely path from point 1 to point 2 seems to be travelling along the South Equatorial Current, which might have been what you would’ve expected after seeing the current map at the beginning of this post. Interestingly though, the path between 2 and 1 hugs the coastline quite tightly, which is due to the Equatorial Counter Current. This behaviour might not have been expected from the current map!
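The path-finding step has a neat trick behind it: the probability of a whole path is the product of its transition probabilities, and maximising a product of probabilities is the same as minimising the sum of their negative logarithms, so a standard shortest-path algorithm such as Dijkstra’s can be used. Here is a sketch of that idea on a made-up 4-state transition matrix (toy numbers, not the drifter data, and not necessarily the exact algorithm the researchers used):

```python
import heapq
from math import log

# Toy one-step transition matrix for 4 spatial bins -- made-up numbers,
# NOT the drifter data. P[i][j] = probability of moving from bin i to bin j.
P = [
    [0.5, 0.4, 0.1, 0.0],
    [0.1, 0.5, 0.3, 0.1],
    [0.0, 0.2, 0.5, 0.3],
    [0.1, 0.0, 0.2, 0.7],
]

def most_likely_path(P, start, goal):
    # A path's probability is the product of its transition probabilities, so
    # maximising the product = minimising the sum of -log(p): run Dijkstra
    # with edge weight -log(p) on every transition with p > 0.
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, i = heapq.heappop(heap)
        if i == goal:
            break
        if d > dist.get(i, float("inf")):
            continue  # stale heap entry
        for j, p in enumerate(P[i]):
            if p > 0:
                nd = d - log(p)
                if nd < dist.get(j, float("inf")):
                    dist[j], prev[j] = nd, i
                    heapq.heappush(heap, (nd, j))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

For this toy matrix, the most likely path from bin 0 to bin 3 goes through bin 1 (probability 0.4 × 0.1 = 0.04), beating both the longer route through bins 1 and 2 (0.036) and the route through bin 2 alone (0.03) – longer paths are automatically penalised because every extra step multiplies in another probability below 1.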

As you can see, the researchers at STOR-i are actually getting up to some pretty interesting things, alongside the regular cat marvelling and biscuit eating. In fact, this is not the only example of the tools of statistics and operational research being applied to oceanography at STOR-i, alongside many many other things!

I hope you enjoyed this first instalment on my blog; be sure to check out the paper and further reading below if you found this topic interesting, and don’t forget to check back on the blog over the coming weeks!

Further reading

– Michael O’Malley, Adam M. Sykulski, Romuald Laso-Jadart, Mohammed-Amin Madoui

Introduction to – Towards Data Science
