WTF is an OSARO?

As you may or may not know, and may or may not be interested in, I work at an artificial intelligence startup called OSARO. We use machine learning to develop intelligent software for applications in industrial automation. Our software, when installed on a physical system like a factory robot, learns to perform tasks that have thus far been too difficult to program using traditional software engineering methods. Essentially, we build brains for factory robots that until recently have had very little of them.

What you are less likely to know, and even less likely to be interested in, is what our company name means. This post is for those of you that are intrigued—both of you. OSARO is actually an acronym, each letter of which corresponds to a component of a continuous process of learning and decision-making that nearly every living organism with a nervous system, including you, engages in throughout the entirety of its life. This process is called reinforcement learning, or, more colloquially, trial-and-error learning.

In a nutshell, reinforcement learning consists of trying something out and seeing what happens. If the outcome is good, do it again. If the outcome is bad, try something else. If you encounter a situation similar to one you’ve been in before, do the thing that felt good, not the thing that felt bad. This is a gross oversimplification, but at its heart this principle describes how most living things that move in the world make their decisions. OSARO’s mission is to design software that mimics this process of learning and decision-making to produce robotic systems that exhibit intelligent and robust behavior.

So what does this have to do with OSARO’s name? The learning and decision-making process for systems that employ the reinforcement learning framework is often represented abstractly by a sequence of steps that repeat ad infinitum. OSARO’s name is an abbreviation of the components that comprise each step of that sequence: Observation, State, Action, Reward, Observation. OSARO. What does each of these components signify, and how does each relate to producing intelligent behavior?
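To make that sequence a little more concrete, here is a minimal sketch, in Python, of what one cycle of that loop might look like. The environment, encoder, policy, and learner objects (and their methods) are hypothetical stand-ins for illustration, not a description of OSARO's actual software.

```python
# A minimal sketch of the observation-state-action-reward loop described above.
# All of the objects here are hypothetical stand-ins for illustration.

def run_agent(env, encoder, policy, learner, num_steps=1000):
    observation = env.reset()          # O: raw sensory data
    state = encoder.initial_state()    # S: compressed summary of history
    for _ in range(num_steps):
        state = encoder.update(state, observation)   # O -> S
        action = policy.select(state)                # S -> A
        next_observation, reward = env.step(action)  # A -> R, next O
        learner.update(state, action, reward, next_observation)
        observation = next_observation               # ...and the loop repeats
```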

Observation

You experience the world around you as a rich array of sensations that begin as signals generated by your sensory organs—eyes, ears, skin, etc. Everything you are aware of (and much you are not aware of) is a summary of the data entering your body from the neurons in these organs. These sensory neurons pass information to your brain about the stimuli acting on them—light on your retina, changes in air pressure on your eardrum, temperature and pressure on your skin, tension in your muscles, etc. We collectively call these raw data observations. Without these data, your brain would have no information to process; there would be nothing to summarize, no world to make sense of. You’d be a brain in a jar. It would be very boring.

Your brain is continuously processing signals from billions of these sensory neurons hundreds of times per second and trying to make sense of it all. This is big data—very high-dimensional and very noisy—far too much to make decisions based on the raw data themselves. The world is too complex and unpredictable at this level of understanding, akin to drinking from a firehose. To make decisions quickly, you need some way of throwing away the irrelevant details, keeping the relevant ones and summarizing them concisely. Your brain must turn these fast-changing, highly-detailed data (e.g., light intensities signaled by the cells in your retina that change every few milliseconds) into slow-changing, high-level data that are easier to reason about (e.g., seeing a fly landing on your sandwich). In short, you need state.

State

All intelligent entities need some way to concisely represent the complex world they live in. In the same way that we can represent numbers in many different ways (counting beads, hash marks, Roman numerals, decimals), there are innumerable ways to summarize and represent the torrent of data (observations) coming from your sensory organs. Some representations are more useful than others, depending on your objectives. Evolution has selected for representations that can summarize sensory data very quickly in ways that improve an organism’s chances of survival and reproduction. Brains are thus, in part, data compression machines—naturally evolved summarizers.

The complex representations that humans and animals form of the world around them are determined in part by genetics, which determines the physical structure of the brain and its early neuronal connections, and in part by experience, which modifies those connections in response to sensory data. We call the latter modification learning. An infant has a primitive and not especially useful representation of the world because it hasn’t yet experienced enough data to have learned how to summarize them usefully. It must rely only on innate mechanisms of compression that evolved to function right out of the womb, and these are limited in scope. During the first few months of life, more sophisticated representations slowly take form out of the “blooming, buzzing confusion” of data that initially seem to have no coherent structure. While it’s difficult to imagine as an adult what it feels like to be an infant that can’t make sense of what it’s looking at, an example may help illustrate the point.

Take a quick look at the picture below. If you’ve seen it before, bear with me. If you haven’t, don’t look too hard at it just yet. Just glance at it. Do you see anything other than a bunch of black splotches on a white background? If not, that’s great; this example will be even more compelling to you. Before you look more closely, try to hold in your mind what it feels like not to recognize what’s in the picture—how it feels to just see splotches.

[Image: the hidden-Dalmatian illusion]

Ok. Now look closer. Do you see any common objects you can identify? Some patterns that look roughly like something you’ve seen before? Blurring your vision a little may help. How about the lower part of a big tree in the upper left, casting a large shadow on the ground below it? And what about in the middle of the scene? Do you see a dog, facing away from you? A Dalmatian perhaps? With its head lowered and sniffing the ground? Keep looking and you’ll eventually see it. If not, search “hidden dalmatian” in Google images and you’ll find some versions with the dog highlighted.

Now think about the moment when you recognized the dog. Did it feel like it sort of “snapped” into place? An “aha” moment? What happened there? The image didn’t change. Nothing got uncovered in the image or came into focus on your retina that wasn’t there before you recognized the dog.  Your brain was processing the same pattern of information coming in through your retina, the same observations, trying desperately to make sense of those black splotches. Initially, it put forth the hypothesis that it really was just a picture of black splotches with no inherent meaning. But after some time it came up with another hypothesis that made more sense given your prior experiences with dogs and trees, and was still consistent with the data (your observations). You understood what you were looking at. Your state changed.

What’s even more compelling about the hypothesis your brain just came up with is just how strongly it will hold onto it once confirmed. Remember when I asked you to try to remember what it felt like not to see the dog? Try to go back to that mental state. Try to look at the image now and not see the dog. I’m willing to bet you can’t. You can remember that you couldn’t see it before, but you can’t not see it now.

This is a contrived example to make you consciously aware of perceptual changes of state, but the reality is that this is what your brain is doing all of the time. This process is just what it normally feels like to observe the world around you and understand it. You just don’t notice it because your brain is so good at it, and things in the real world are usually not so ambiguous—though indeed this is because your brain has learned to become very good at disambiguating things that you see often.

For the first few weeks of life, newborns likely interpret their surroundings in a manner similar to the way you first interpreted the image above—lots of splotchy patterns moving around their visual field, with no meaning or structure to any of it. Over time, as they are presented with more data from their environment, abstract representations of these data that are consistent with the world’s inherent structure slowly take form and become “obvious”.

The problem the brain is solving when forming these abstract representations is known as representation learning. There are many theories about exactly how the brain does this, but we are far from a comprehensive answer. Because of this lack of understanding, it’s also very difficult to reverse engineer an artificial system that solves this problem, despite massive research efforts in artificial intelligence over the last 70 years. We’re making slow but steady progress, however. Recent advances in a class of approaches based on neural network models, collectively termed “deep learning”, have produced several impressive results in just the past few years. There is still a long road ahead before we’ll be able to design systems that exhibit general intelligence in the way that humans do. In the meantime we can leverage these intermediate solutions to incrementally improve the capabilities and robustness of robotic systems—precisely the approach we take at OSARO.

As interesting as the problem of representation learning is, it’s even more interesting to ask why our brains should even bother to solve it. What’s the point of state? Why bother compressing the data we observe? Why is it important to summarize the complicated light patterns on your retina as “a fly landing on my sandwich”?  The answer is provided by our next component of interest.

Action

We wouldn’t bother to try to predict and understand the world around us if we couldn’t do anything with that understanding. We predict, we understand, so that we can act. We act in order to change our environment to improve our situation—that nasty fly on your sandwich must be swatted away. Action is the raison d’être of intelligence.

Consider living things that don’t act—such as plants. We generally don’t consider them to display much intelligence. Although plants have some physiological mechanisms for adapting minimally to their environment over long time scales, they generally can’t move in the sense that we commonly think of movement, and consequently don’t have nervous systems. There’s no reason for them to summarize their surroundings because there is nothing they can do with that information. Consider an even more curious organism, the sea squirt.

[Images: sea squirt larva (top) and adult sea squirt (bottom)]

Its early life is spent as a free-swimming, tadpole-like larva, as shown in the upper image above. In this form, it has a primitive nervous system connected to a single eye and a long tail. During this phase of its life, it needs to navigate its watery environment searching for food and a place to settle for the second phase of its life. It thus needs to take in sensory information, process it, and do something appropriate (e.g., move toward food). Once it reaches its second phase of life, it attaches itself to a hard surface and becomes a filter feeder, destined never to move again. In this form, as shown in the lower image above, it simply consumes whatever food particles happen to float into its mouth. At this point, it promptly digests its brain and most of its nervous system, since they are now nothing but a drain on precious resources. Waste not, want not.

These examples highlight the fact that all of the work our brain does to learn a representation of our environment is in service of predicting what we will likely observe next—including the consequences of our own actions—so that we can act accordingly to influence our surroundings. The better and further into the future we can predict, the greater the level of control we have over our environment, and the easier it is to improve our situation. What differs between species of animals is the sophistication of those prediction mechanisms and how they are tailored to the body of the organism—the ways that it can sense its environment and affect its surroundings.

Now that we know why our brains try to summarize the world in ways that let us do useful things, it’s natural to ask what we should do at any given moment.  Why choose any one action over another?  What does it mean to improve our situation? This is where our next component of interest comes in.

Reward

There are a functionally infinite number of things you could be doing at any given moment of your life. You have hundreds of muscles you can use to affect the world around you, each of which you can contract and release in an astronomically large number of combinations every fraction of a second. How does your brain decide which of them to choose? We don’t just take random actions at every moment and hope for the best. We take deliberate actions to achieve goals. This is the hallmark of intelligence.

[Image: rat]

Where do these goals come from?  The answer comes in part from specialized regions of the brain that produce reward signals. These signals tell the rest of the brain how “good” or “bad” your current state is. We choose actions that take us to states yielding positive rewards, and away from states yielding negative rewards. Our goals—acquiring food when we’re hungry, running from predators when threatened, etc.—derive from this principle.

But this raises a further question: where do rewards come from? What determines what is good and what is bad? The process of evolution provides the bedrock here. Darwinian natural selection has spent hundreds of millions of years tuning the innate reward-generating circuits in brains to direct organisms’ behavior toward states that increase their likelihood of reproducing. Reproducing of course requires staying alive, and so being well-fed, not thirsty, and safe from predators are highly-rewarding states. Being in the good graces of attractive mates also increases an organism’s likelihood of passing on its genes, hence all of the positive rewards associated with sex.

Given these innate “ground truth” signals provided by evolution about goodness and badness, biological reinforcement learning systems learn, through trial and error, which actions to take so as to maximize the sum of the rewards they expect to receive in the future. This is the primary objective of a reinforcement learning system—accumulate the most reward possible over time by taking appropriate actions.

This is actually quite a difficult problem because rewards don’t often come immediately or reliably after taking a single action in a given state. Often very long sequences of actions must be taken, passing through many states, before any appreciable reward is received. A reinforcement learning system must figure out which of the actions it took along a path through a set of states were responsible for the rewards it received along that path—a problem known as credit assignment.  There are well-established mathematical models of how this problem might be solved in biological systems, with evidence from experimental neuroscience to support them. The field of computational reinforcement learning, an area of active research for the past several decades, explores the implementation of these models in artificial systems such as robots.
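One of the best-studied of these models is temporal-difference (TD) learning, in which a prediction of future reward is nudged toward the reward actually received plus the prediction made one step later, so that reward gradually propagates backward along the path that led to it. Here is a toy sketch of the idea in Python; the states, transitions, and parameter values are purely illustrative and are not meant to represent OSARO's algorithms.

```python
# Toy TD(0) value learning: a simple, well-studied answer to the
# credit-assignment problem. Each visited state's value estimate is nudged
# toward the reward received plus the discounted value of the next state.

def td_zero(transitions, alpha=0.1, gamma=0.95):
    """transitions: list of (state, reward, next_state), in the order visited."""
    values = {}
    for state, reward, next_state in transitions:
        v = values.get(state, 0.0)
        v_next = values.get(next_state, 0.0)
        td_error = reward + gamma * v_next - v   # better or worse than expected?
        values[state] = v + alpha * td_error     # credit flows back along the path
    return values
```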

Since evolution is not the guiding process when designing artificial systems in this paradigm, a critical question arises: where should the rewards in such AI systems come from? That’s the 64-trillion dollar question we must answer in the coming decades. Extracting the maximum economic value from truly intelligent robotic systems will center around solving what is known as the value alignment problem. In short, we must design AI systems such that their reward functions encourage behavior that is beneficial to humans. What “beneficial” means precisely and how to account for complex tradeoffs when these systems must interact with humans are open questions in the field of AI. These issues are too complex to address here, but there are many ideas about how we can begin to tackle this problem. The only thing most researchers agree on right now is that we had better get this right.

At OSARO, our robots perform their tasks in industrial environments with strict safety protocols and little direct human interaction. As such, the reward functions we employ can be fairly simple. In a moment I’ll explain in more detail how OSARO’s software incorporates the framework I’ve been outlining. But first, let’s finish our journey through OSARO’s name, as we have now come to the final component of interest.

Observation

I know, I know. We covered this already. But this section is actually about something just as important as observations themselves. Something so important to our reality that Einstein spent much of his life contemplating it: time. Recall that the components I’ve been describing comprise each step in a sequence that spans your entire life; indeed, that sequence is your life. The steps in this sequence occur in your brain hundreds of times per second. Observation, state, action, reward, observation, state, action, reward, observation, state, action, reward… you get the picture. Listing observation again here emphasizes the central role that time plays in reinforcement learning.

To understand the world, we must learn representations that both summarize the history of our observations and allow us to make predictions about future observations, given some sequence of actions we intend to take. We must also correlate these sequences of actions with the states and rewards they yield so that we will know how to behave in the future, and these correlations are learned through trial and error. All of these requirements for intelligent behavior are inextricably linked to the passage of time. Reinforcement learning is inherently a process.

Even state representations themselves often require a notion of time. Single observations are generally not sufficient to disambiguate what state you’re in at any given moment. In general we must experience several observations in sequence in order to understand what’s happening around us. Take the concept of object permanence, for example. This is simply the idea that an object in your visual field that moves out of view (e.g., behind another object) doesn’t cease to exist. Rather, your brain integrates the history of observations received when the object was in view, and uses that history to maintain the position of the object in your model of the world even after your observations no longer contain any data relating to it (i.e., when it’s no longer visible). This seems like quite an obvious concept, but it’s something that humans don’t learn until they are around 6 months old. When you think about it, that’s a pretty long time to think that your mom ceases to exist every time she plays peek-a-boo with you.

Environments in which a single observation is sufficient to tell you what state you’re in are said to have what is called the Markov property. This means that you can define your state representation to be your current observation without fear of misclassifying your true state. Reinforcement learning has been successfully applied to environments with the Markov property for decades now, in particular to games like Chess and Go. A fairly recent success was DeepMind’s AlphaZero, which used reinforcement learning to achieve super-human performance in Chess, Go, and Shogi, all in a matter of hours. While this was indeed an impressive result in the reinforcement learning community, the fact that these environments are Markovian certainly helped a lot.

The real world is decidedly non-Markovian. Take a look in front of you. Are your house keys in sight? If not, consider that they might be in your pocket, or in your desk drawer, or under the couch cushions. All of these possibilities might be valid states of your world right now, but they are all consistent with your current observations because your keys are not currently in view. You may remember where your keys are, but that’s because of history—previous observations you experienced that showed your keys to be in your pocket, or desk, or elsewhere. Time is an integral part of how we perceive the world and act in it, and the letters bookending OSARO’s name pay homage to this fact.
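To make the contrast concrete, here is a toy Python sketch of the difference between treating the current observation as the state (fine in a Markovian game like chess) and summarizing a history of observations (necessary in the non-Markovian real world). The history length and data structures are arbitrary choices for illustration.

```python
from collections import deque

# In a Markovian environment the current observation is a perfectly good state.
# In the non-Markovian real world, the state must summarize history -- here,
# crudely, by remembering the last few observations.

def markov_state(observation):
    return observation  # e.g., a chess position alone tells you everything

class HistoryState:
    def __init__(self, horizon=8):
        self.history = deque(maxlen=horizon)

    def update(self, observation):
        self.history.append(observation)   # remember where the keys were last seen
        return tuple(self.history)         # state = a summary of recent history
```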

OSARO

So now you know what OSARO means, and that each of its letters stands for a critical component of a framework for learning and decision-making in biological and artificial intelligent systems. A big challenge for artificial reinforcement learning systems over the past couple of decades has been to apply them successfully to real-world problems—in particular robotic manipulation tasks. This is a challenge we deal with daily at OSARO. One of OSARO’s key solutions, our piece-picking system, learns to pick objects from a source bin full of consumer products and place them into destination bins that then get packed into boxes and shipped. Check out the clip below for an example of our piece-picking system doing its thing.

[Video clip: OSARO’s piece-picking system in action]

How does OSARO apply the reinforcement learning framework to the industrial automation tasks we solve for our customers? We can use our piece-picking solution as an example and take things one component at a time again to illustrate.

Observation

Our software runs on various robotic platforms, and each provides our system with streams of information from which to learn. In the case of piece-picking, every robot cell has one or more cameras that provide the system with visual observations—a view of the source and destination bins from various angles. The robot arm also provides information about the position and orientation of the arm at each moment in time, as well as the amount of force being exerted on the end-effector (the tool attached to the end of the arm that actually does the picking). We often use suction-based end-effectors (as in the clip above), which additionally provide an observation signal indicating the degree of air flow. This can be used to infer whether the end-effector is currently holding an object or not. All of these observation signals generate quite a lot of data for our algorithms to process. Deciding where and how to pick or place an object requires aggregating them into a compact, useful state representation.
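For illustration only, the kinds of signals described above might be bundled into a structure roughly like the one below. The field names, types, and shapes are my guesses for the sake of the example, not OSARO's actual interfaces.

```python
from dataclasses import dataclass
from typing import List

import numpy as np

# A rough sketch of the observation signals described above, bundled into one
# structure. Field names and shapes are illustrative, not OSARO's interfaces.

@dataclass
class Observation:
    camera_images: List[np.ndarray]  # views of the source and destination bins
    arm_pose: np.ndarray             # position and orientation of the arm
    end_effector_force: float        # force currently exerted on the tool
    suction_air_flow: float          # high flow usually means nothing is held
```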

State

Our software leverages state-of-the-art machine learning techniques to go from streams of observation data to meaningful, stable descriptors of the system’s state. For instance, images of the pick and place bins are passed through neural networks that learn to identify where the source and destination bins are in 3-dimensional space so that the system knows where it can move the arm to pick and place objects without colliding with the bins.  Other neural networks are trained on images of the contents of the bins to learn things like how full the bins are, where the relevant objects that may need to be picked are located, and exactly where on those objects would be the most effective place for the robot to attempt a pick. These components of the state representation are aggregated together and used by the system to make decisions about where and how to pick and place objects safely and efficiently.
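Continuing the illustration, distilling those signals into a state might look roughly like the sketch below, where the detector, estimator, and predictor arguments are placeholders for learned models. None of this reflects OSARO's real architecture; it just shows how raw observations become a handful of stable, decision-ready quantities.

```python
# An illustrative sketch of distilling observations into a state. The detector,
# estimator, and predictor are placeholders for learned models; the threshold
# and dictionary layout are invented for the example.

FLOW_THRESHOLD = 0.5  # hypothetical cutoff for "suction is sealed on an object"

def build_state(obs, bin_detector, fill_estimator, grasp_predictor):
    return {
        "bin_poses": bin_detector(obs.camera_images),         # where the bins are in 3D space
        "fill_levels": fill_estimator(obs.camera_images),     # how full each bin is
        "grasp_points": grasp_predictor(obs.camera_images),   # scored candidate pick points
        "arm_pose": obs.arm_pose,
        "holding_object": obs.suction_air_flow < FLOW_THRESHOLD,
    }
```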

Action

The system has many actions to choose from for any given pick request. These include selecting which object in the bin to grasp, which end-effector tool will be the most effective, the angle at which to pick up the object, and how close to get to the object during approach before slowing down to avoid damaging it. In other tasks, where precise placement position and orientation are also required, the system has even more actions to choose from during the placement motion, such as where to place the object and at what orientation, what height to release it from, and how fast to move when carrying it. The number of possible actions is too large to program a simple set of rules for selecting among them—hence the need to learn appropriate behavior via trial and error.
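As a rough sketch, a single pick decision might be represented and chosen along the lines below. The fields, the candidate set, and the value function are hypothetical illustrations of the decision space described above, not OSARO's real action interface.

```python
from dataclasses import dataclass

# A sketch of the kind of action the system must choose for each pick request.
# All names and fields are hypothetical.

@dataclass
class PickAction:
    target_object_id: int         # which object in the bin to go for
    end_effector: str             # which tool (e.g., suction cup size) to use
    approach_angle_deg: float     # angle at which to approach the object
    slowdown_distance_mm: float   # how close to get before decelerating

def choose_action(candidate_actions, value_estimate, state):
    # Pick the candidate the learned value function expects to work best.
    return max(candidate_actions, key=lambda a: value_estimate(state, a))
```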

Reward

We guide the system to learn to behave as we would like it to by defining its reward function. The system receives positive rewards for successful picks, which occur when the target object is picked up and placed successfully in the destination bin without being damaged. Negative rewards are given for failures like unsuccessful pick attempts, dropping an object outside of a bin, picking up more than one object, or damaging an object. The system also receives more reward for doing its job faster, and is thus incentivized to pick as fast as possible without damaging or dropping objects.
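A toy version of such a reward function might look like the following. The outcome fields, terms, and weights are invented for illustration; the point is simply that desirable outcomes earn positive reward, failures earn negative reward, and taking longer costs a little.

```python
# A toy reward function in the spirit of the description above. The specific
# outcome fields and weights are made up for illustration.

def reward(outcome):
    r = 0.0
    if outcome.placed_successfully:
        r += 1.0                      # successful pick-and-place
    if outcome.pick_failed:
        r -= 0.5                      # missed the pick or dropped it back in the bin
    if outcome.dropped_outside_bin:
        r -= 1.0
    if outcome.multiple_objects_picked:
        r -= 0.5
    if outcome.object_damaged:
        r -= 2.0
    r -= 0.01 * outcome.cycle_time_seconds   # faster picks accumulate more reward over time
    return r
```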

Observation

Recall that learning and decision-making in our system is an ongoing process, unfolding over time. Each attempt at a pick provides a new opportunity for the system to learn about the effects of its actions. When it succeeds, it adjusts its behavior to perform similarly in similar situations. When it fails, it tries something different the next time it’s in a similar situation. The system initially fails often, but improves its speed and accuracy as it continues to attempt picks and stumbles across good solutions that it remembers for the future.  This is the primary advantage of a learning-based system—as it performs its duties it continues to adapt and improve.


Hopefully this outline gives you a clear sense of how OSARO applies the reinforcement learning paradigm to real-world robotic systems. OSARO has deployed multiple instances of our piece-picking solution around the world, all of which are continuously collecting new data and improving the system as a whole. We are also actively working on adapting our software to solve more complex automation problems, including several challenging industrial assembly tasks, such as automotive assembly.

It’s an exciting time to be working on reinforcement learning systems that solve real problems facing many manufacturers and distributors around the world. These systems can be trained to perform repetitive yet difficult manipulation tasks efficiently and inexpensively. Increasingly many countries are encountering shortages of reliable human labor for such tasks, threatening critical supply chains and economic stability. The recent devastating disruption to the global supply chain caused by the COVID-19 pandemic highlights the immense potential for intelligent robotic systems to bolster the stability of logistics infrastructure around the world, and consequently the global economy. OSARO is proud to be at the forefront of advancing the capabilities of these systems to help realize this potential to improve the standard of living for billions of people.

The Strange Future of Digital Media

Two months ago, I started work on a side project experimenting with artificial speech synthesis using recently published machine learning methods. The pace of advancement in speech quality produced by neural network models following the deep learning revolution in 2012 was impressive, and with the release of the WaveNet paper by Google’s DeepMind in 2016 it was clear that the uncanny valley had been crossed. It was now possible to generate artificial speech indistinguishable from human speech, as judged by native speakers.

I was amazed by the quality demonstrated in the results of the WaveNet paper when it was published, and began casually following the literature, though I wasn’t actively contributing to it. I was interested in the application of speech synthesis to the creative industry — in particular, voice generation for games and films. I grew up a media junkie, and though my academic and professional career wound up taking an engineering path, I have remained passionate about artistic storytelling my whole life.

I decided to explore the space from an entrepreneurial perspective to see what markets might exist around speech synthesis models as content creation tools. I delved a bit deeper into the literature involving the state-of-the-art models and decided that a good way to learn about how these models worked would be to glue together a handful of existing open-source components to produce a realistic text-to-speech demonstration of a highly recognizable voice. I experimented with a few voices, eventually settling on psychologist, author, and podcaster Jordan Peterson for a few reasons, not the least of which being that I am a fan of his work and style of intellectual discourse, even though I don’t agree with all of his viewpoints.

I started working on the project part-time at the end of June. After about four weeks of learning how the components worked, experimenting with data processing techniques, and training a few different models, I had a pretty convincing speech generator. The next step was to build a tech demo that would let users generate audio from the model using a simple front end — type and listen.

Building the site was actually harder for me than developing the speech model. I needed to learn a few technologies, having never built a website myself before. Getting the site running smoothly took another month. I mention this to emphasize that creating this model was not difficult. All of the technology was readily available. If I can build it in a few weeks, there are surely many engineers more talented than I who could do it better and faster. We must decide as a society how we want to deal with this fact. More on this later.

An Unexpected Reception

On August 14th, I posted a link to the site (notjordanpeterson.com) on Reddit. By the next day, there were over three thousand active users; by the second day, over ten thousand. I hadn’t expected such demand so quickly, and had to spin up a couple more servers to handle the traffic, though even this wasn’t enough. Wait times to synthesize audio continued to climb each day, reaching as high as 20 minutes to generate 20 seconds of audio, though people didn’t seem to mind the wait. I received almost no complaints — only positive feedback about the site’s entertainment value.

On August 16th, TNW wrote an article about the site, touting its “eerily realistic” quality, which drove more users to the site. On August 17th, a YouTube video on The Thinkery’s channel was posted, describing the site and the uncanny resemblance of the generated audio to Dr. Peterson’s real voice. The video also raised potential concerns over technologies like this.

I found most of the praise for the quality of the model to be a bit overblown. The fidelity of the generated speech is certainly convincing (a testament to the caliber of the models’ inventors, not any skill on my part), but not indistinguishable from the real thing. In particular, the frequent mispronunciations and general lack of emotional affect make it obvious that the speech is synthetic. I liken the model’s output to Dr. Peterson speaking into a cheap microphone while on a low dose of propofol. It is undoubtedly possible, however, to generate short clips that would fool someone who wasn’t listening for a distinction. Models like this will only improve in the coming months, so indistinguishability from an arbitrary person’s voice is around the corner.

In addition to the articles and videos posted about the site during the week it was up, I received numerous messages from users of the site, almost all of which were positive, some in interesting and unexpected ways. Most people talked about using the site to send friendly jibes to their friends, something I did with my own friends while testing it out, to the glee of the recipients. Others — long-time fans of Dr. Peterson — talked about sending encouraging messages to family in the voice of someone they greatly admired, and how well-received those messages were. Still others discussed generating inspirational quotes from Dr. Peterson addressed directly to them. One person even described how much this had improved his mental health in just a few days and suggested I consider pursuing a partnership with Dr. Peterson to develop a personalized mindfulness app.

To my surprise, Dr. Peterson himself tweeted a link to the Thinkery video not long after its release, with only the pithy commentary “not good”. I interpreted this to mean “not good for society”, though it was difficult to tell whether the topic was only of passing interest to him, or represented something of deeper concern. This became clear on August 22nd, when Dr. Peterson published a blog post about “deep fake” technologies, citing my website and a few of its YouTube forebears. He posed several relevant and important questions we collectively need to address in the coming years, all of which I and many others have been thinking about as these technologies evolve. What I did not expect was his conclusion, in which he called for “the Deep Fake artists … to be stopped, using whatever legal means are necessary, as soon as possible,” also suggesting that this type of content should be a felony offense.

This seemed to me an overreaction. However, out of respect for Dr. Peterson, and in the spirit of having a rational discussion about the implications of this technology, I suspended the functionality of the site for the time being. The remainder of this post is my attempt to articulate some of the complicated issues the existence of this technology raises, and some ways we might go about mitigating its misuse while maximizing its potential for good.

To Regulate or Not to Regulate

Despite the numerous positive examples, there were certainly negative uses of the site, with some people entering offensive text and posting the resulting audio. I need not repeat the examples here, as I’m sure you can imagine them. I have no doubt that Dr. Peterson encountered several of these negative examples from media coverage of the site, which surely colored his initial reaction, though I would be remiss if I didn’t point out that the mainstream media hasn’t needed any machine learning technology thus far to put words in Dr. Peterson’s mouth.

Assigning blame for inappropriate uses of technology to the technology itself is counterproductive, however. A natural reaction to novel technology is to immediately use it toward lewd or immature ends. Recall that two of the top-selling apps in Apple’s App Store immediately after its launch in 2008 were iFart and iBeer — and that was on a curated platform. This inclination comes from facility. It’s easy to generate banal content like this, and it receives attention in the short term. Despite this tendency, however, the novelty wears off quickly and more creative and productive uses begin to emerge.

Nevertheless, since the media invariably focuses on the negative aspects of new technologies, the subject of regulation looms large. While there is often need for regulation of some kind when addressing global issues (free markets don’t solve all problems), we also need to act judiciously when considering how to regulate emerging technologies. We need to solve the core problems, and not throw the baby out with the bathwater.

It should be obvious that machine learning isn’t the first technology to have both benign and nefarious implications. It’s difficult to think of any technology that doesn’t exhibit this dichotomy. Fire, guns, the printing press, computers, the internet. Even something as seemingly innocuous as an online social network, where you can share photos and stories with friends, has landed us in a heap of trouble, spurring congressional oversight. How many people in 2004 predicted that?

Whatever the potential perils, the best answer to the regulation of new technology is rarely to strangle it in its crib. This in general breeds black markets and an overabundance of misuse, followed by costly enforcement policies. In cases where the technology is trivially easy to distribute, as it is with software, these effects are amplified. Digital piracy comes immediately to mind. The answer to Napster and its file-sharing descendants wasn’t litigation from the RIAA. It was iTunes. And later Spotify. New business models. New technologies. Positive sum solutions. Swift and untempered regulation is often a losing battle, if not an incredibly protracted and expensive one. We need to find smarter ways to adapt.

In the specific case of machine learning algorithms that can mimic the likeness of real people (especially public figures), it’s instructive to analyze existing modes of such mimicry to see what, if anything, is different about these new technologies. It is fairly straightforward to produce a convincing photo of Dr. Peterson riding bare-chested on a horse, à la Putin. This has been possible for years. Yet no one is currently flustered by this because people have become accustomed to the existence of Photoshop, and they have learned to be more skeptical of digital images. It will be the same with machine-learning generated audio, after an adjustment period, and eventually video as well. Regulating the technology itself would be akin to holding Adobe accountable for an internet troll’s Peterson/Putin mashup photo.

Consider also impersonations and parody, sans machine learning. If we collectively believe that impersonation in digital media without intent to commit fraud, but rather for the sake of parody or artistic expression, should be illegal, why have we not shut down Saturday Night Live (SNL), or at least forbidden it to portray any public figures in its sketches? Such content is currently protected under freedom of speech and parody laws, and for good reason. Are we to shut down parody when we don’t like its content?

What is different about using machine learning technologies to create such content? There does indeed seem to be an intuitive distinction at first glance, but examining it more closely may reveal no real difference. I believe the reasons for this intuition are twofold — accessibility and fidelity.

One characteristic of software is the ease with which it can be transmitted and used — its accessibility. Once a solution is found for a problem through software, it is almost immediately usable by anyone with an internet connection. Obviously this has both pros and cons. If the software is of net benefit to society, it means we get better faster. If it’s malicious, it is very difficult to correct for.

So what about content creation software? If nearly everyone suddenly has the ability to make a convincing video parodying a public figure in a matter of minutes, is that good or bad? Not an easy question to answer. One thing is for sure though — the sheer volume of content that will emerge will result in a redistribution of viewers’ attention from a select few sources of content to a much larger pool, albeit with much higher variability in quality. We have already seen this with the rise of YouTube and streaming content services siphoning viewers’ attention from traditional media networks over the past decade.

As more content of higher caliber becomes available from a wider array of content creators, the public’s attention will necessarily be reapportioned. However, the more clever and thought-provoking creators will inevitably rise to the top, in one of those hierarchies Dr. Peterson regularly lectures about. While accessibility certainly makes any potential misuses more conspicuous, does it change the nature of what is being produced? Parody is parody. If the nature and purpose of the content isn’t changing, should accessibility change the nature of legislation surrounding it? Asking for a friend.

Fidelity is perhaps the more salient reason for people’s unease with this technology. When Alec Baldwin parodies Donald Trump on SNL, it is hilarious, and it is also immediately obvious that he is not actually Donald Trump. What if it weren’t obvious? Aside from the fact that it would probably make the sketch less funny, does it change the nature of the content? Is it no longer parody? We are reaching the point at which fidelity of generated content is high enough that this question needs to be answered, at least for audio. If the answer is that it is no longer parody, but fraud, this seems tantamount to claiming that parody constitutes fair use of someone’s likeness unless you do it really well. This should raise some eyebrows.

Bad Actors

As for criminal misuse of this technology — that is, genuine fraud with intent to deceive — these cases should be handled as they always have been under our judicial system. The responsibility should lie with the generator of the content based on their use of it and their intent. It makes sense to police inappropriate use of the content generated by this technology, but not the technology itself. With the invention of the phone came the prank call, and later the fraudulent call. The solution wasn’t to eliminate telephony. Pushing this technology underground can’t be the solution to ensuring that the media we view is authentic.

So what is the solution? Given that the advancement of this technology is inevitable, what are some ways we can protect ourselves against bad actors leveraging its accessibility and fidelity? The root problem that must be solved is source verification. We trust information from sources that have built trust with us. Reputation is paramount. What has changed as a result of technological progress is the ability to counterfeit information — to make it seem as if it were issued by a trusted party, when in fact it was not.

Surely this problem sounds familiar. And surely the solutions we have already developed are equally familiar. Currency exchange has been particularly susceptible to shifts in technology, yet with each shift we quickly find ways to handle bad actors. For centuries, nations have spent tremendous effort to combat counterfeiting of physical currencies. With the arrival of credit cards came credit card fraud, and entire divisions and agencies for fighting it. With the rise of the internet, and the prospect of exchanging currency digitally, came the entire ecosystem of secure digital transactions, backed by cryptographic methods. Never did we consider regressing to a cash-only society.

The solutions to protecting our money have also been readily used to protect our personal information. When you visit a website, your browser tells you whether the site is secure — that the information you are sending will be encrypted, and that the recipient of the data is actually who it claims to be, as verified by a trusted third party. Future developments in blockchain technology will render the trusted third party unnecessary, but the principle is the same. Cryptographic methods will continue to be the solution to source verification when transmitting information electronically.

Over the next few years, I see no way around moving to a communication model in which we cryptographically sign digital media meant to be sources of truth. If a video claims to be from the White House, it will be cryptographically signed by the White House, and there will be software to verify that. If an audio clip claims to be from Jordan Peterson, it will be signed using Dr. Peterson’s private key. If a media clip claims to be from a trusted source but doesn’t have a valid signature for that source, your media player will tell you that, and you can choose to ignore it accordingly.
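As a concrete (if greatly simplified) illustration of the principle, here is a sketch using the Python cryptography package and Ed25519 signatures. A real deployment would also need key distribution, certificates or a public-key registry, and a way to attach signatures to media files; this only shows the core sign-and-verify step.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# A minimal sketch of signing and verifying a media clip. This shows only the
# core idea, not a complete source-verification system.

def sign_media(private_key: Ed25519PrivateKey, media_bytes: bytes) -> bytes:
    return private_key.sign(media_bytes)   # the signature travels alongside the clip

def verify_media(public_key, media_bytes: bytes, signature: bytes) -> bool:
    try:
        public_key.verify(signature, media_bytes)   # raises if clip or signature was altered
        return True
    except InvalidSignature:
        return False

# Example: the source signs a clip; anyone with the matching public key can check it.
key = Ed25519PrivateKey.generate()
clip = b"...audio bytes..."
sig = sign_media(key, clip)
assert verify_media(key.public_key(), clip, sig)
assert not verify_media(key.public_key(), clip + b"tampered", sig)
```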

Whatever the details of how we implement these verification systems, the fundamental principle is straightforward and essential: we should treat any media we intend to consider a source of truth the same way we treat our money and personal information. Everything else should be considered entertainment, or require further verification from multiple sources.

In short, the solution to source verification is, at least in part, more technology, not litigation and suppression. We must also learn to be more skeptical of digital audio and video from untrusted sources. This opinion piece from the New York Times (coincidentally posted on the same day I released my site) shares the same sentiment. As Dr. Peterson has emphasized in many of his 12 Rules for Life lectures, the general populace is not dumb. There simply exist adjustment periods during the emergence of new technologies. We are entering one of those now. We will adapt intelligently, doomsayers be damned.

An Emerging Landscape

Assuming we do manage to adapt and avoid a post-truth dystopia, what will the landscape of content creation look like over the next few years? Within the domain of speech synthesis, it will be possible and inexpensive in the next three to five years to generate a perfect clone of someone’s voice from a few minutes or less of their speech. It will also be possible to create new synthetic voices by interpolating between existing voice models, allowing content creators to produce the full gamut of variability in human speech, including accents and intonations, in any language.

The voice acting industry will change dramatically as a result of this. A CGI movie can be made today without the use of human actors, with the exception of dialogue. With the rise of synthetic voices, films and video games will increasingly use software tools to generate the dialogue they need in much the same way that they now use graphics software tools for modeling, texturing, animation, and lighting.

Applications outside the creative industry will make extensive use of this technology as well. Call centers, digital assistants, content readers, and advertisements will all deliver highly personalized content using flawless human speech in a listener’s native language. This will also open large markets in localization — narration and dialogue will be instantaneously translatable into any language using the speaker’s native voice with an appropriate accent. Imagine Scarlett Johansson speaking perfect French, Spanish, German, and Italian in the European releases of her future movies.

To the extent that existing celebrities continue to maintain their personae in digital media, the content they produce will be increasingly machine-generated. Personalized content addressed directly to individual fans will be a staple of those stars who wish to keep up with this changing landscape. Machine learning technology will make this possible at greater scale without requiring any of a celebrity’s time. Those who resist this technology rather than embrace it will likely miss out on opportunities for new revenue streams.

More intriguing applications of voice synthesis technology include preserving cherished voices in today’s media, as well as resurrecting the voices of beloved celebrities who have passed on. Consider the joy of having David Attenborough and Morgan Freeman narrate our documentaries for another 100 years, and of listening to future newscasts delivered by Walter Cronkite. Whether these scenarios come to pass remains to be seen, but it is certain that the technology will be available to achieve them in the near future.

The potential applications of this technology to consumers’ personal lives are numerous and thought-provoking as well. As an example, one user of my site who messaged me to compliment me on my work added that he couldn’t imagine how much his mom would be willing to pay to hear personalized messages spoken to her by her deceased husband. The “living portraits” of remembered loved ones as described in the Harry Potter novels will not be relegated to the world of fantasy for much longer. Does this concept tug at your heart strings, or sound like an episode of Black Mirror? Perhaps both? Progress in machine-generated media will raise increasingly many unusual questions like this in the coming years.

Moving Forward

I am currently in the process of starting a company to build next-generation content creation tools for storytellers. Our mission is to empower everyone to tell the best versions of their stories possible by leveraging machine learning to reduce the barrier to entry for creating professional-quality media. The experience I’ve gained building this prototype and witnessing people’s reactions to it has been invaluable. While the business models I am exploring don’t revolve around the use of well-known personalities, I still believe the issues discussed above are critical for us to address intelligently as a society. We must find a way to maintain the protections of free speech and parody while minimizing the potential harm from bad actors.

I wrote this post with the hope of stimulating further discussion about the implications of machine-generated content. I look forward to hearing from others who are thinking deeply about the issues I’ve addressed here, and learning from their perspectives.

Self-Reflected

I’m a huge fan of the intersection of science, technology, and art—where the distinguishing traits of humanity come together to produce some of the most awe-inspiring creations in our known universe. A couple of years ago I discovered an inspiring piece of engineering and art which aims to visualize the complexity and elegance of the human brain and the beautifully choreographed ballet of information that continuously travels through its billions of neurons as you experience each moment of your life.

Created by neuroscientist and artist Greg Dunn, the piece, titled Self-Reflected, struck all the right chords for my tastes and interests. I hemmed and hawed about buying it for over a year before finally deciding that I would splurge for Christmas and use it as an excuse to undertake a bit of a hobby project for myself. You can find all the details you might care to know about Self-Reflected and how it was made here. The rest of this post is about my efforts to get the most out of it. If you’re not interested in the details, you can just watch the video of the final result above.

Etching

 

Self-Reflected is an artistic rendering of an oblique mid-sagittal slice of the human brain; here is an image showing the location and orientation of such a slice. The piece is physically realized as a micro-etched print, which means that a fixed light source pointed at it is reflected differentially at neighboring points very close together on the etching. This technique produces visually interesting effects even with a static light source, but is most evidently impressive when the light source is moved relative to the etching.

The movement of a light source from side to side produces an animated effect in the etching that brings the rendering to life in a surprising and visceral way, giving the appearance of electrical impulses traveling along the axons and dendrites of the neurons depicted in the etching. Varying the intensity, speed, and color of the light source produces an endless array of animations, some of which you can see in the video I recorded above.

Since the purchase of Self-Reflected includes only the etching itself, I needed to build a lighting rig to mount over it in order to realize its full potential. I’ve documented the steps I took and design choices I made when building the lighting rig and control unit here for anyone potentially interested in doing something similar.

Lighting and Power

In order to be able to enjoy the piece from a reasonable vantage point, I needed to animate a light source programmatically, rather than stand over the etching and wave a light back and forth manually (this would get tiring). I did some brief searching, asked a friend, and found that NeoPixels were a popular choice for artistic lighting projects. NeoPixels are individually addressable LED lights that can be controlled via digital micro-controllers like an Arduino or Raspberry Pi. Technically NeoPixels are AdaFruit’s brand of addressable RGB LEDs using the WS2812 drivers/protocol. They are shipped in various configurations, but most commonly as a linear strip, which is exactly what I needed.

I purchased a one meter strip of NeoPixel equivalents and started reading up on what I needed to program them. AdaFruit’s site was super helpful in figuring out what I needed and how to put everything together. They recommended powering the strip separately from the micro-controller used to control them, since the LEDs need a lot more power than the chip. I purchased a 5V 2A switching power supply to power the strip, a female DC power adapter to connect to the leads on the strip, and a 4700uF 10V capacitor to put across the terminals; the last component was recommended by AdaFruit to prevent any initial rush of current from damaging the pixels.
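For anyone sizing their own supply, a quick back-of-the-envelope check is worthwhile. Adafruit's rule of thumb is up to roughly 60 mA per NeoPixel at full-white, full brightness; the LED density below (30 per meter) is an assumption on my part, since strips also come in 60- and 144-per-meter versions.

```python
# A rough power-budget check, assuming ~60 mA per NeoPixel at full white
# (Adafruit's rule of thumb) and a 30-LED-per-meter strip (an assumption).

leds_per_meter = 30
strip_length_m = 1.0
max_current_per_led_a = 0.060

worst_case_a = leds_per_meter * strip_length_m * max_current_per_led_a
print(f"Worst case draw: {worst_case_a:.1f} A")   # 1.8 A -- close to a 2 A supply's limit

# Running at reduced brightness, or lighting only a few pixels at a time as in
# a chase animation, keeps the actual draw comfortably below 2 A.
```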

There are options for powering the NeoPixels via batteries, but since the rig was going to be mounted stationary over the etching I didn’t bother exploring them much. I could just leave the whole thing plugged in all the time and not worry about charging batteries, though the cables are admittedly a bit ugly.

With these parts assembled, I connected the power supply, adapter, and capacitor to the strip and plugged it in, lighting up the strip. So far so good. Now I needed to figure out how to control them.

Control

I wanted to be able to control the lighting rig from my phone, both to avoid needing to get up on a chair to push buttons on the controller and to be able to customize properties of the lights easily. I looked up some popular micro-controllers and settled on AdaFruit’s Feather Huzzah ESP8266 which is sufficiently small and has a built-in WiFi module. Once I had the Feather, I connected it to my laptop over USB and followed AdaFruit’s guide to interacting with it using the Arduino IDE. Now I needed to connect the NeoPixel strip to the Feather.

I soldered connector wires from the ground and data leads on the strip to the appropriate pins on the Feather. At this point I was able to turn both the strip and the Feather on without anything catching on fire, and they seemed to work properly. The strip still only turned on with all pixels at full white though. To make any changes to their color and brightness I needed to actually send some data to them.

The NeoPixels Arduino library is open source and lets you program a set of NeoPixels from an Arduino through a simple interface. I loaded one of the samples in the library onto the Feather through the Arduino IDE to test the full setup and things seemed to work fine. Two things left: write a program to move the lights in a pattern that best fits the purpose of Self-Reflected, and find a way to customize a few properties of this program over WiFi so I don’t need to make code changes to adjust them.

For the latter step I settled on the Blynk IoT platform, which provides a user-friendly way to create widgets in a phone app that you can tie to “virtual pins” on your Arduino by writing functions that reference Blynk’s libraries to send/receive data to/from those “pins”. Blynk is a paid service, but free for a single-user, single-device project, which is all I needed. Here’s a shot of the set of widgets I chose for the lighting controls.

The widgets let me turn the whole strip on and off, turn the light chase animation on and off, set the color and brightness of the lights, the speed of the chase animation, and the width of the little Gaussian bumps that produce the animation effect when they move across the strip.
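The actual controller runs as an Arduino sketch (linked below), but the animation math behind those settings is simple enough to show on its own. The Python snippet here is just an illustration of the Gaussian-bump brightness calculation, not the code running on the Feather.

```python
import math

# Each frame, the brightness of pixel i is a Gaussian bump centered on a
# position that slides along the strip, wrapping around at the end.

def frame_brightness(num_pixels, center, width):
    """Per-pixel brightness (0-1) for one frame of the chase animation."""
    levels = []
    for i in range(num_pixels):
        # distance to the bump center, accounting for wrap-around
        d = min(abs(i - center), num_pixels - abs(i - center))
        levels.append(math.exp(-(d * d) / (2 * width * width)))
    return levels

# Advancing `center` a little each frame (scaled by the speed setting from the
# Blynk app) produces the traveling-pulse effect seen in the video.
```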

The Arduino program that animates the lights and communicates with the Blynk app is fairly simple. Here’s a gist of the code, with my network details redacted.

With the control unit working and the code written, the last step was to mount the whole rig and fine tune the settings.

Mount

I needed to mount the light strip above the etching, facing down toward the ground to get the proper effect. This required a custom mount, which I built from scrap wood and a small hinge I got from Home Depot.

I wanted the whole mount to be easily detachable from the wall to make servicing and experimenting with the light strip easy. The base of the mount is a horizontal wooden bar, which I just hook onto a couple of screws in the wall using picture mounting brackets I screwed into the back of the bar. A cross bar comes out from the base bar to put distance between the wall and the light strip. The mounting bar for the light strip is a long (4 feet) thin piece of wood slightly wider than the light strip itself, and this bar is attached to the cross bar with a small metal hinge so that I could modify the angle between the strip and vertical somewhat after construction without needing to recut anything. I stained the whole mount structure with a dark wood stain to better match with the etching frame and my furniture.

I mounted the NeoPixel strip to its mounting bar using a metal casing strip designed to hold the light strip flat in place. The casing comes with a translucent cover that slides over it to smooth out the light coming from the strip and make it seem more continuous, rather than a sequence of individual LEDs.

I wanted some kind of case to put the Feather and connecting wires in so that I didn’t have to attach them directly to the wood and have loose wires hanging off of it. I found this page on Adafruit’s site providing modular CAD models of different types of cases for the Feather, which could be 3D printed. I downloaded the parts I wanted (the Feather case with mounting tabs and the topper with header holes) and had them 3D printed by 3D Hubs for a reasonable price.

In the end, because I’m a terrible electrical engineer and not much of a handyman, the case didn’t wind up providing much in the way of cleaning up the design of the mount, but it’s better than nothing. There are still wires sticking out un-aesthetically, but they’re not really visible from below when it’s mounted above the etching. Things aren’t perfectly straight either, but I’m calling diminishing returns on spending more time on it. Here are a couple of photos of the final (janky) version of the mount (yes there was duct tape involved).

End Result

After sneaking an hour or two here and there every couple of weeks since Christmas working all of these steps out, I finally finished the damned thing. Or at least I’ve put as much time into it as I care to. The video at the top of the post gives you a sense of the piece as it was meant to be viewed (I hope). Below is a photo of the final result mounted over the etching (yes I know it’s a little crooked; diminishing returns). I learned a few things working on this, but mostly I’m happy that I now have an animated brain in my bedroom.

SelfReflectedAndMount

Think Slower

tl;dr: Read Thinking, Fast and Slow, even if it’s the only book you ever read.

Earlier this year I read The Undoing Project by Michael Lewis (author of Moneyball and The Big Short), an account of the unique relationship between Daniel Kahneman, the author of Thinking, Fast and Slow, and his colleague Amos Tversky. It details their collaboration in redefining theories of decision making and behavioral economics in the 70s and 80s. Thinking, Fast and Slow, which I read last year, is an excellent compendium of Kahneman and Tversky’s research, and I think it should be required reading in high school.

The short of Thinking, Fast and Slow is that most of the decisions you make, big or small, you don’t make for the reasons you think you do, and this property of human behavior is a consequence of the way our brains are wired. We have “many brains” in our heads, or rather many subsystems in our brains, each vying for control over our behavior. At the most abstract level (the level at which the book makes its primary distinction) there are two main subsystems that operate in parallel. When the decisions these subsystems make are in conflict, one decision must win out over the other, since we only have one body to control. More often than not the “right” decision is made based on the context—our brains wouldn’t be much good if they were wrong most of the time. But often, especially for modern humans, our brains make decisions that seem like the best option to our conscious minds but are actually suboptimal or detrimental, either immediately or down the road.

Our brains work this way because of how they came to be. Evolution is a necessarily greedy algorithm. It can’t go back to the drawing board when it realizes that a major restructuring would produce a much better outcome, precisely because it can’t have such a realization. It can only make small changes to existing solutions, either modifying a piece of what’s already there slightly, or adding something new on top of it. Of course these small changes accumulate over time to produce an incredibly diverse array of creations, which is what makes it such a powerful algorithm. When it comes to brains, this greedy process necessitates designing new modes of behavior on top of all the existing modes. The result is a cacophony of voices constantly shouting their orders, with the loudest voices at any given time winning control over the muscles. Marvin Minsky called this The Society of Mind, though there are countless theories and interpretations of this principle in psychology, neuroscience, cognitive science, and artificial intelligence.

What this means for the way we behave, unfortunately, is a whole lot of inner conflict, both conscious and subconscious. The reflexes and impulses that are excellent at catching flies to eat and running away from murderous predators aren’t sensible solutions to complex logical problems that require weighing alternatives from multiple, very deep branches of a possibility space. Yet the parts of our brains that evolved the ability to solve the more complex problems had to be bootstrapped from the older ones that solved the simpler problems. Since the older parts don’t always get kicked to the curb as the new ones come online, all of the parts cast votes for moving our arms and legs and tongues every second of our lives. What makes humans special is that our brains evolved enough new technology to recognize this fact and have it significantly influence the voting process. We can stop, reflect, and invalidate the votes of the older parts of the brain in some cases. This doesn’t come naturally though. It has to be learned and practiced.

Acknowledging this fact and adjusting our behavior accordingly is one of the most important things humans can learn to do, which is why the concepts in this book matter so much. No one will ever be able to completely overcome the biases built into our brains or the way we learn and perceive our world; that’s a biological impossibility. In the coming decades we will likely design machines that are better at this than we are, or perhaps augment our brains with machinery that makes this feasible. But for now, just recognizing that these biases exist and taking the extra few seconds or minutes to think more objectively through important decisions (even small ones) can have a profound impact on our lives for the better.

Unfortunately, the very neural structures that allow us to think slowly and deliberately about complex problems in this way have provided us the means to invent technology that reinforces exactly the opposite behavior. Our current ability to communicate instantly with anyone and everyone, anywhere, at any time has produced a culture of sound bites, instant gratification, and 140-character summaries of topics that should take pages to explain properly. The deluge of information we receive daily precludes taking the time to understand it properly. We form opinions instantly based on very little information and tout them as fact, and many are proud of their “talent” for making these quick decisions, never doubting their (often low) accuracy.

This type of thinking is epitomized, personified, and glorified by our current president, who reasons almost exclusively using what Kahneman calls System 1—the subconscious, subjective, reactive, quick-acting, emotion-driven decision-making system governed primarily by the evolutionarily older parts of the brain; the fly-catching, predator-escaping, sex-obsessed parts. This is not meant to be a political post. I only use Trump to make the following point. As soon as you read the words “our current president”, you immediately formed a subconscious (and subsequently conscious) opinion about this post. If you lean left, it was likely to some extent a “fuck yes” feeling that resulted in some shade of agreement. If you lean right, or for some other reason are a Trump supporter, it was likely a subconscious eye-roll or middle finger which blossomed into a “this is pretentious bullshit” conclusion that you feel is entirely justified by the fact that I wear gauges and live in San Francisco. The point is, you likely determined your interest in reading the above book based on this reaction, when in fact it should have little to no bearing on that decision.

The initial subconscious reactions that led to this conclusion were unavoidable. System 1 is always running. You can’t turn it off. You can only override it. My choice of the word “likely” instead of “definitely” in the previous paragraph was made by my System 2—the slow-moving, deliberative, cautious, uncertain, logical, and statistically-aware parts of my brain. If I were generating this post off-the-cuff (or under the influence), my System 1 would have produced something like “All Trump supporters are ignorant System 1 zombies that have no fucking idea what they’re talking about”. This is the immediate, visceral reaction that happens in my brain when I see his name because of the associations with him I have built up over time, and the kind of thing you see on most internet comments. That immediate reaction is unavoidable (barring deliberate, long-term reconditioning). But it would be horrifically irrational for me to let those parts of my brain control my fingers while typing this, just as it would be horrifically irrational of me to grab the crotch of someone I find attractive, but who hasn’t given me permission to do so.

All Trump supporters are not Trump. It is irrational to equate the two and their ideologies without knowing more about each person individually. Of course it is usually impractical to acquire that much information, which is exactly why System 1 exists, and why it evolved before System 2. System 1 operates on heuristics—general rules of thumb that hold more often than not. Heuristics (e.g., stereotypes) are extremely useful when high-stakes decisions must be made in seconds or less. These rules mean the difference between life and death for nearly every animal on the planet, but not for most modern humans. Yet most modern humans still use System 1 to make their high-stakes decisions, even though there is plenty of time to let System 2 do its thing.

Part of this has to do with culture. In America, at least, there seems to be a bizarre marriage of two diametrically opposed attitudes toward decision-making: anti-intellectualism and fear of appearing ignorant. Mainstream media often paints rational thinkers, scientists, and scholars as bookish, intellectual elites who sit in ivory towers in lab coats and disseminate indisputable facts: a separate segment of society from which we obtain some information needed to set policy, but which doesn’t know anything about living in the “real world”. It would require a much longer post to list all of the reasons why this is completely ridiculous. At the same time, it also paints anyone who hesitates in their explanation of complex topics, or who gives probabilistic answers conditional on further information or study, as incompetent, unconvincing, and wrong. The direct outcome is that many people speak with extreme confidence on matters they have spent very little, if any, time contemplating, because they are afraid to say “I don’t know”. Yet they also can’t be bothered to spend the time to understand the issues better because they’re “not a scientist”.

Getting past this barrier is a matter of education. People need to understand not only basic probability and statistics, but also all the ways in which their brains conspire against them to subvert those laws. This is precisely what Thinking, Fast and Slow attempts to do. Only through understanding how their brains function can people recognize when System 1 is making their decisions for them and instead take the time to think slower and engage System 2. Hint: it’s pretty often.

Do I think that introducing the principles of this book into high school curricula will produce a significant difference in the behavior of subsequent generations? I don’t know. But I’ve got a good feeling about it. So maybe we should think (slowly) on it.

Ch-ch-changes

A lot can happen in 8 months. Shortly after my last post, written the day I left HitPoint, I was presented with an opportunity to join a new startup in San Francisco working on deep reinforcement learning. Seeing as how I had just left HitPoint to finish up my PhD in reinforcement learning, and was a bit anxious to get out of western MA after so many years there, it wasn’t something I could easily pass up, despite the fact that it would mean having a full-time job again while trying to find time to finish my dissertation. And so, after receiving a job offer, I packed up and moved out to SF in June.

I’ve liked living in San Francisco a lot so far. The weather is probably the biggest plus for me, with the metropolitan culture a close second. It’s great to be able to walk to most of the places I need to go in under 20 minutes year round without needing a winter coat or profusely sweating. The cost of living is definitely the worst aspect. My rent more than quadrupled moving out here, but I did manage to find a nice one-bedroom in a high rise just a five minute walk from my new office.

My new position is Research Engineer at Osaro, Inc. We’re developing deep reinforcement learning tech that we plan to apply to difficult real-world problems (e.g., robotics), so that our clients can reap the benefits of recent breakthroughs in machine learning. Our solutions are in the same spirit as some of the work being done at Google’s DeepMind, with notable differences that I’m not currently at liberty to divulge. 🙂 I’m very excited to be a part of this team, and I think we’ll be making big waves in the machine learning and robotics community in the next couple of years. It’s great to be back in the machine learning game and making use of all the knowledge I gained during my doctoral research.

Speaking of which, as expected, having a full-time job immediately after leaving my previous one didn’t do much to help with finishing my dissertation. Although I didn’t finish this summer as I had hoped, I’m happy to say that just last week I successfully defended my thesis and can now legitimately ask to be called Dr. Vigorito. It’s a great feeling to have that accomplishment under my belt, and even greater to be able to move on to new and exciting things. It was a long time coming, especially given my five-year hiatus, but it’s finally done.

So yea. 2015. New job. New city. New degree. Lots of changes. I’m looking forward to all of the exciting changes in 2016.

Onwards and Upwards

It is with numerous mixed emotions that I end today, my last day of work at HitPoint Studios. It’s been a pretty wild ride for the last five years, and I truly appreciate everything I’ve learned and accomplished during my tenure there. Being a member and leader of such a great team has given me so many skills and experiences that I’ll carry with me for the rest of my life. I wholeheartedly appreciate all of the insanely hard work and assistance every member of the HitPoint team has put in over the years, and I hope to stay in touch with all of them.

I am leaving HitPoint to spend the next few months finishing up my PhD at UMass Amherst, which has long been languishing in the background of my psyche, and then moving on to new and exciting things yet to be determined. It’s a bit of an uncertain time for me, but I look forward to the challenges that uncertainty presents.

To all the HitPointers on the kick-ass team I’m leaving behind, I have no doubt you will continue to kick ass and turn out great games for tons of eager fans. Best of luck to you all!

Stay healthy. Stay hungry. Stay in touch. Ciao!

New Year, New HitPoint

The wheels are in motion! 2014 was a bit of a roller-coaster for HitPoint. We hit some pretty awesome goals, but faced our fair share of challenges and hardships as well. The upheaval at Microsoft earlier this year, particularly in their gaming division, hit us hard and was a major impetus for our decision to seek new VC funding. Thanks to members of the Western MA investor network River Valley Investors and the Springfield Venture Fund created by MassMutual, as well as other generous investors, we were able to secure enough funding to finish off the majority of our work-for-hire projects and focus the team exclusively on HitPoint-owned games through early 2015. This is both a liberating and challenging position for us to be in, but we’re embracing it with full enthusiasm and determination.

Our Facebook games are doing well and part of the team is heads down on making sure those experiences continue to be enjoyable to our players. If you haven’t checked out Seaside Hideaway, Jane Austen Unbound, Relic Quest, or Fablewood, the latter of which we recently released on iPad, give them a go and let us know what you think.

We have also licensed a mobile game that we developed for DeNA called Hell Marys, which we will be re-releasing this January. The game will be available on Android and iOS devices. The content is definitely a departure from most of the other game genres we’ve worked on, but it has been a great experience for the team to work on something different. While it’s definitely a game for mature audiences only, we’re excited to be tackling new territory and giving it the HitPoint level of polish it deserves. Look out for another post in January when we launch.

The most exciting part of our new focus, which the majority of the team has been working on the past few months, is our “next big thing”.  More details will be forthcoming in February when we announce the new product at Casual Connect Europe in Amsterdam, but for now suffice it to say it’s a very ambitious project we think is going to make a big splash in the mobile space next year. We’re all thrilled to be working on it and are doing our best to make it a fantastic user experience.

Finally, with the new year and our new direction come two other big changes for HitPoint. The first is our new logo, which is changing for the first time since HitPoint was founded in 2008. Check out the new hotness in this post’s featured image above, courtesy of one of HitPoint’s amazing artists Steve Forde. Our new branding really captures the playfulness and accessibility of our games while still paying homage to our old-school gaming roots, and we’re really happy for it to be a new way to get the HitPoint name out to the world.

The second major change is our relocation (again!). After just over a year and a half in our current space in Amherst, which we’re all quite fond of, we’ll be moving to a new office space located in the heart of Springfield’s financial district at One Financial Plaza. It’s a fantastic space with an amazing view that you can see somewhat in the photos below, which show our new suite as it is right now, under construction. What’s even more amazing is that all of this construction is happening over the course of just a couple of weeks, and we’re all set to move in this weekend! Thanks to our investors at RVI and MassMutual for making this a reality. Everything is happening very fast, but being able to start the new year in this awesome new space is a great way for us to kick off 2015, which we think will be a huge year for HitPoint. Stay tuned for more announcements as we bring all of these threads together next year to make some amazing products.

Under Construction

Fablewood is now on iPad!

HitPoint’s most recent independent game Fablewood is now available on iPad! Previously free to play only on Facebook, Fablewood is a hidden object game in which you banish evil from famous fairy tale stories you know and love, and build up your own enchanted forest. Now you can play for free on your iPad.

Our team has done an amazing job getting the game running smoothly on Apple’s tablet, and it looks fantastic on devices with retina displays. Fablewood is available on iPad 2 and later, and requires iOS 7 or higher. You can check out some screenshots below.

If you’re already playing on Facebook, or want to play on both your iPad and your computer, Fablewood will keep all of your game data synced between your devices. Just sign into the app with Facebook on your iPad!

So if you’ve got an iPad, grab the game from the App Store and try it out. If you like it, we’d be super appreciative if you gave it a good review! If you don’t have an iPad, you can still play on Facebook, and visit our community page to leave your comments or like Fablewood. We hope you enjoy it!

Fablewood iOS Screenshots

Fun with GarageBand

I had U2’s “Running to Stand Still” stuck in my head for a couple of days. Last night I did this to try to get it out, courtesy of an instrumental version from karaoke-version.com and some messing around in GarageBand. Apparently adding a little reverb to your voice can make you sound halfway decent, even using a laptop mic. Now I just need to learn how to mix tracks properly.

Anyway the experiment was a failure. It’s still stuck in my head.

DreamWorks Dragons Adventure Takes Flight

HitPoint has just finished up development on a new mobile game based on the DreamWorks movie How to Train Your Dragon 2. The game, Dragons Adventure: World Explorer, is currently available as a free download on select Windows 8 tablets and Windows Phone 8 devices. Dragons Adventure is the result of a collaboration between HitPoint, DreamWorks, and Microsoft, and has been in the works for several months. The game has several pretty unique and innovative features, mostly centering around incorporating location-based, live data into an augmented reality experience.

In Dragons Adventure: World Explorer you play as Hiccup or Astrid and fly dragons from the DreamWorks franchise over a 3D-rendered map of the real world styled to look like Hiccup’s home island of Berk. Local points of interest show up in the world as medieval buildings. It’s not uncommon to fly over a Starbucks transformed into a medieval mead hall, or a Hilton styled to look like an ancient viking inn. You’ll even see different numbers of vikings milling outside those locations based on how popular the venue is. The game also factors in the time of day and weather conditions of the place in the world you’re traveling through, so while it might be a bright sunny afternoon outside your house, you can navigate your dragon through a rainy evening in Paris.

The most interesting part of Dragons Adventure though can only be experienced on a road trip. Hop in the car, select a destination, and the game will use your device’s GPS to plot a route out for you, complete with a viking ox-drawn cart that travels along with your car as you drive (just make sure you’re playing as a passenger!). Along your route you’ll receive quests to rescue stranded vikings, pick up lost sheep and bring them back to your cart, attack evil viking towers to free friendly imprisoned dragons, and explore the area around you to find new local hotspots you may not have known about.  Device sensor controls let you fly your dragon by tilting the device, just as if you were holding Toothless’ reins yourself! Check out some screenshots from the game below, and download it here if you’ve got a Microsoft Surface or Nokia tablet, or here if you’ve got a compatible Windows Phone 8.

Developing Dragons Adventure was both a huge challenge and a great experience for the HitPoint team. It’s the first real-time 3D game of this scope that we’ve developed, and there was no shortage of technical hurdles we had to overcome. The biggest challenges came from the integration of the various real-world data streams into the game. The world map uses data pulled from HERE Maps to paint Berk-themed textures onto the game world’s terrain. The viking buildings that pop up throughout the game are populated based on local point-of-interest data from Foursquare. The local weather is obtained from a Weather.com live feed. Getting all of these features integrated, playing nicely with each other, and incorporated into the game’s quest design was no small task, but I think HitPoint did an amazing job pulling it off, and so far those who have played the game have had similar opinions.

We’re definitely proud of the product we were able to build with DreamWorks and Microsoft, and hopefully there will be some more interesting updates to come down the road. In the meantime if you’ve got a compatible device go check the game out and let us know what you think!

DAWE Screenshots