Reinforcement Learning For Business: Real Life Examples (2021 update)

October 9th, 2020

Evgenia Kuzmenko, KITRUM Brand Manager

Among machine learning techniques, Reinforcement Learning (RL) has been steadily growing in popularity. Much of the buzz around reinforcement learning started with AlphaGo by DeepMind, a system developed to play Go, a board game of enormous complexity. The essence of Reinforcement Learning is learning through interaction with an environment: the system adapts to feedback, learns from its mistakes, and calibrates future decisions accordingly. Reinforcement learning is based on a delayed and cumulative reward system. In this system, an agent takes an action that changes the state of the environment and receives a reward in return. When similar circumstances occur in the future, the agent chooses the best decision based on the outcomes of its previous actions.
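To make the idea of a delayed, cumulative reward more concrete, here is a minimal Python sketch. The function name and the discount factor gamma are illustrative assumptions, not something taken from a specific library; the point is only that a reward arriving late still counts toward the total, just with less weight.

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative reward where each step of delay shrinks a reward by gamma."""
    total = 0.0
    # Walk backwards so every reward is discounted once per step of delay.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: the only reward arrives at the final step.
delayed = discounted_return([0.0, 0.0, 1.0], gamma=0.5)
print(delayed)
```

An agent that maximizes this quantity is pushed to value actions that pay off later, which is the behavior the paragraph above describes.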

The intended application of Reinforcement Learning is to evolve and improve systems without human or programmatic intervention. This creates an interesting dynamic in real-world applications such as autonomous vehicles. Autonomous driving is a tough puzzle to solve, at least when relying solely on conventional AI methods. It relies on computer vision, which is typically implemented with Convolutional Neural Networks (CNNs). Because of the strong interaction with an environment that includes pedestrians, other vehicles, road infrastructure, road conditions, and driver behavior, autonomous driving cannot be modeled as a supervised learning problem alone. Viewed at an abstract level, an autonomous driving agent performs a sequence of three tasks: sensing, planning, and control.

FYI: In our previous article we explained the overall principle of Machine Learning and touched on the RL subject. You can learn more here.


Applying Reinforcement Learning In Your Business

Reinforcement Learning In Healthcare

Many medical decision problems are sequential by nature. When a patient sees a doctor, a treatment plan is decided upon. Once the steps of the plan have been administered, the result of the treatment dictates the next logical action for future treatment. Modeled as a Markov decision process (MDP), this type of decision problem can be addressed with RL algorithms.
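As a rough illustration of what "modeled as an MDP" can mean, the sketch below solves a tiny treatment problem with value iteration. Every state, action, probability, and reward in it is invented for the example and has no clinical meaning; it only shows how a sequential decision problem turns into a policy.

```python
# Hypothetical toy MDP: TRANSITIONS[state][action] = [(probability, next_state, reward)]
TRANSITIONS = {
    "sick": {
        "mild_drug": [(0.2, "improving", 0.0), (0.8, "sick", -1.0)],
        "strong_drug": [(0.8, "improving", -2.0), (0.2, "sick", -3.0)],
    },
    "improving": {
        "mild_drug": [(0.7, "recovered", 10.0), (0.3, "sick", -1.0)],
        "strong_drug": [(0.9, "recovered", 8.0), (0.1, "sick", -3.0)],
    },
    "recovered": {},  # terminal state: no further actions
}


def action_value(outcomes, values, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(transitions, gamma=0.95, tolerance=1e-9):
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:
                continue
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tolerance:
            return values


def greedy_policy(transitions, values, gamma=0.95):
    """Pick the action with the highest expected value in each state."""
    return {
        state: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for state, actions in transitions.items()
        if actions
    }


values = value_iteration(TRANSITIONS)
policy = greedy_policy(TRANSITIONS, values)
print(policy)
```

With these made-up numbers the resulting policy uses the stronger drug while the patient is sick and the milder one once the patient is improving. Value iteration assumes the transition probabilities are known; RL algorithms are used when they have to be learned from experience instead.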

The problem with many conventional AI systems is that they act exclusively on the patient’s current state rather than considering the sequential nature of past decisions. Reinforcement Learning takes into account not only the treatment’s immediate effect but also the long-term benefit to the patient.

While using Reinforcement Learning in medicine is appealing, there are challenges to overcome before RL algorithms can be applied in hospitals. Reinforcement Learning typically learns its decisions through trial and error, an exploratory practice that is not a viable option when patients are involved. RL would instead need to learn from existing data collected under fixed treatment strategies. This ‘off-policy’ style of learning therefore plays an important role in a setting such as the practice of medicine.
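One way to picture learning from already collected data, rather than from live trial and error, is to replay a fixed log of transitions through the Q-learning update. The sketch below does that in Python; the logged records, state names, and hyperparameters are all hypothetical, and real offline RL on medical data requires far more care than this.

```python
from collections import defaultdict

ACTIONS = ("mild_drug", "strong_drug")

# Hypothetical log recorded under earlier treatment strategies:
# (state, action, reward, next_state, episode_finished)
LOGGED = [
    ("sick", "mild_drug", -1.0, "sick", False),
    ("sick", "strong_drug", -2.0, "improving", False),
    ("improving", "mild_drug", 10.0, "recovered", True),
    ("improving", "strong_drug", 8.0, "recovered", True),
]


def batch_q_learning(log, actions, alpha=0.1, gamma=0.95, sweeps=500):
    """Replay a fixed log repeatedly; no new actions are tried on the environment."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for state, action, reward, next_state, done in log:
            # Off-policy target: best known next action, whatever was logged next.
            future = 0.0 if done else max(q[(next_state, a)] for a in actions)
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


q = batch_q_learning(LOGGED, ACTIONS)
best_when_sick = max(ACTIONS, key=lambda a: q[("sick", a)])
print(best_when_sick)
```

The update is off-policy because the target uses the best next action according to the current estimates, not the action the original clinicians happened to take.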

Another important factor in determining the optimal policy is deciding what the reward should be. In the medical field, this involves weighing factors such as a patient’s life expectancy against the cost of a particular treatment. This dilemma is already under heavy discussion in multiple countries.

The literature already describes several Reinforcement Learning applications, among them treatments for lung cancer and epilepsy. In the case of sepsis, deep RL treatment strategies have been developed based on medical registry data.

One problem that is sequential in nature, and therefore well suited to RL, comes from nephrology: the use of erythropoiesis-stimulating agents (ESAs) in patients with chronic kidney disease. Since the effects of ESAs are unpredictable, the patient’s condition must always be closely monitored. Depending on the patient’s current condition, the medical team has to decide which action to take next. This decision will then affect the patient’s future condition. For this reason, multiple authors have advocated using RL to control the administration of ESAs.

Reinforcement Learning In Security

In practical applications, Reinforcement Learning can help secure and speed up your internet connection. To really understand this, it helps to go through the admin panel of your network, which is reached at an IP address specified by the router company. Logging in at this address gives you access to a dashboard provided by the router company. From here, you will be able to optimize your network’s integrity and speed.

The availability of high-level libraries such as Keras is democratizing deep learning adoption. These libraries encapsulate mathematically complex concepts, letting you focus on developing models for optimal operations, highly customized and parameterized tuning, and model deployment.

Reinforcement Learning Driven Robots In A Factory

Automation propelled by reinforcement learning also takes place in production factories. Robots perform many repetitive duties, but some also use deep reinforcement learning to learn how to perform their designated tasks with the greatest efficiency, speed, and precision. Take, for instance, the industrial robot at the Japanese company Fanuc. It is clever enough to train itself to perform a particular job, making it the pride of the company’s manufacturing arm.

As the robot performs a particular task with an object, it captures the action on video. This is a type of ‘memory’ the robot will then use to influence future actions with this object. Whether the performance of the task captured in video footage is successful or not, the robot ‘learns’ from it. This is all part of a deep learning model that controls and influences the robot’s future actions.

Source: Robot Reports

Reinforcement Learning For Customer Delivery

The goal of any manufacturer that sells products to customers is to meet demand by delivering those products quickly and safely while minimizing the cost of doing so. These savings help the manufacturer’s business thrive by increasing profit margins. To distribute products on time, the manufacturer has to solve a Split Delivery Vehicle Routing problem.

Such a manufacturer benefits greatly from an approach rooted in reinforcement learning, for example by introducing multi-agent systems. In such systems, agents communicate and cooperate with each other using reinforcement learning techniques. Using Q-learning, a system is developed to serve multiple customers with just one vehicle. By reducing the number of trucks used to deliver products and optimizing execution time, the manufacturer cuts costs, improves delivery efficiency, and increases profit margins.
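The sketch below shows tabular Q-learning on a deliberately simplified version of this problem: a single vehicle, one depot, and three customers, with no split deliveries and no multi-agent cooperation. The distance matrix, function names, and hyperparameters are made up for illustration; it is a toy, not the system described above.

```python
import itertools
import random

# Hypothetical symmetric distances; index 0 is the depot, 1-3 are customers.
DIST = [
    [0, 4, 9, 5],
    [4, 0, 3, 8],
    [9, 3, 0, 2],
    [5, 8, 2, 0],
]
DEPOT = 0
CUSTOMERS = (1, 2, 3)


def train(episodes=3000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Learn Q[(location, unvisited customers, next customer)] by trial and error."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        location, remaining = DEPOT, frozenset(CUSTOMERS)
        while remaining:
            options = sorted(remaining)
            if rng.random() < epsilon:
                action = rng.choice(options)  # explore
            else:
                action = max(options, key=lambda a: q.get((location, remaining, a), 0.0))
            left = remaining - {action}
            # Reward is the negative distance driven; the last leg returns to the depot.
            reward = -DIST[location][action] - (0 if left else DIST[action][DEPOT])
            future = max((q.get((action, left, a), 0.0) for a in left), default=0.0)
            key = (location, remaining, action)
            old = q.get(key, 0.0)
            q[key] = old + alpha * (reward + gamma * future - old)
            location, remaining = action, left
    return q


def greedy_route(q):
    """Follow the highest-valued action from the depot until all customers are served."""
    route, location, remaining = [DEPOT], DEPOT, frozenset(CUSTOMERS)
    while remaining:
        action = max(sorted(remaining), key=lambda a: q.get((location, remaining, a), 0.0))
        route.append(action)
        remaining = remaining - {action}
        location = action
    return route + [DEPOT]


def route_cost(route):
    return sum(DIST[a][b] for a, b in zip(route, route[1:]))


q_table = train()
learned_route = greedy_route(q_table)
best_cost = min(route_cost([DEPOT, *p, DEPOT]) for p in itertools.permutations(CUSTOMERS))
print(learned_route, route_cost(learned_route), best_cost)
```

On a problem this small a brute-force search is enough, which is why the example uses it as a check on the learned route. Q-learning becomes interesting when the number of customers, vehicles, and constraints makes enumeration impractical.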

Reinforcement Learning For E-Commerce

E-commerce is a business that relies heavily on personalized product promotion. To make sales, merchants must communicate with and promote to the right target audience. Getting products in front of relevant prospective consumers relies largely on Reinforcement Learning algorithms, which let e-commerce businesses study and adapt to customers’ shopping trends and behaviors and tailor their services or products to each customer’s specific interests.
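A very simple form of this adaptive promotion is a multi-armed bandit, which can be seen as reinforcement learning with a single state. The Python sketch below uses an epsilon-greedy strategy to learn which of three products attracts the most clicks. The click-through rates are invented, and production recommender systems are considerably more elaborate.

```python
import random

# Hypothetical click-through rates for three promoted products (invented numbers).
TRUE_CLICK_RATES = [0.02, 0.05, 0.30]


def run_bandit(click_rates, rounds=10000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    shows = [0] * len(click_rates)        # how often each product was shown
    estimates = [0.0] * len(click_rates)  # running estimate of each click rate
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(click_rates))  # explore a random product
        else:
            arm = max(range(len(click_rates)), key=lambda i: estimates[i])  # exploit
        clicked = 1.0 if rng.random() < click_rates[arm] else 0.0
        shows[arm] += 1
        # Incremental mean: nudge the estimate toward the observed outcome.
        estimates[arm] += (clicked - estimates[arm]) / shows[arm]
    return estimates, shows


estimates, shows = run_bandit(TRUE_CLICK_RATES)
best_product = max(range(len(estimates)), key=lambda i: estimates[i])
print(best_product, shows)
```

The small exploration rate keeps the system occasionally trying the other products, so it can notice when customer preferences shift.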

RL For Trading

Reinforcement learning promotes maximizing the business’s benefits, optimizing end to end, and framing the parameters the business operates under in order to achieve the best possible result. When there is a ‘negative reward’, such as sales shrinking by 30%, the agent is forced to reevaluate its policy and potentially consider a different one. Reinforcement learning can prepare for such critical situations by creating simulations. These simulations produce scenarios with a negative reward for the agent, which in turn help the agent come up with workarounds and tailored approaches to these types of situations. Repeating such RL-based strategy adjustments over time allows the agent to keep auto-tuning its operation in response to any downturn or problem that may arise.

At IBM, a sophisticated system built on the DSX platform makes decisions on financial trades by harnessing the power of reinforcement learning. The model uses the historical context of stock price data and takes stochastic actions at every step of the trade. The reward function is then computed from the loss or profit of each trade.
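A reward function driven by profit and loss can be sketched in a few lines of Python. This is not IBM's implementation; the action names, the optional fee parameter, and the price series are assumptions made purely for illustration, and the actions are fixed here rather than sampled from a policy.

```python
def trade_reward(action, price_now, price_next, fee=0.0):
    """Profit or loss of holding one unit for one step, minus an optional fee."""
    position = {"buy": 1, "hold": 0, "sell": -1}[action]
    profit = position * (price_next - price_now)
    paid_fee = fee if position != 0 else 0.0
    return profit - paid_fee


def episode_rewards(prices, actions, fee=0.0):
    """One reward per step, pairing each action with the price move that follows it."""
    steps = zip(actions, prices, prices[1:])
    return [trade_reward(action, now, nxt, fee) for action, now, nxt in steps]


# Hypothetical price history and a fixed action sequence.
prices = [100.0, 102.0, 101.0, 105.0]
actions = ["buy", "sell", "hold"]
rewards = episode_rewards(prices, actions)
print(rewards)
```

An RL agent would choose the actions itself and be trained to maximize the sum of these rewards over many historical episodes.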

Further evolution of modeless programming with RL is an important factor in moving away from rule-based AI programming. This is a difficult transition and is therefore certain to encounter problems along the way. RL neural networks have very high training data requirements: gathering enough relevant data to build out and analyze new scenarios and conditions takes a significant amount of time and resources. For this reason, the process of collecting the data needs to be autonomous.

The goal is to always improve the accuracy of predictions with modern simulation methods and to create virtual miles. GANs (Generative Adversarial Networks) are one of the key technologies that will bring simulated, synthetic data collection into the mainstream. GANs are essentially two competing networks set up to oppose each other, one acting as a generator and the other as a discriminator. The generator creates data, and the discriminator tests it for authenticity. Over time, the generator learns to create data so convincing that the discriminator can no longer tell which data is real and which is fake. In autonomous driving, for example, GANs can take an actual driving scenario and vary it with factors such as lighting, traffic conditions, and weather. This creates a wide array of photorealistic scenarios that can be used for better training.

Being able to verify and explain deep learning algorithms presents another challenge, an area where a lot of research is still ongoing. Ultimately, the entire solution needs to be ASIL (Automotive Safety Integrity Level) compliant, be automotive grade, and each decision made by the AI must be traceable.

Concerningly, feature engineering skills, which use domain knowledge to reshape data and which predictive models rely on to be effective, are in short supply. Certain AutoML platforms are already smart enough to remove noise and discard weaker features. Their use of ensembles of varying models remains pivotal. After all, to predict real-world problems, a set of predictor models must be able to consider a little bit of everything.

RL For Banking

Starting in the front office, virtual agents have become indispensable in customer service. Digital assistants, chatbots, and voice systems slip into the role of customer service or sales representative in order to automate the customer dialog. Chatbots on Facebook allow bank customers to transfer money or navigate through product details. They support customers during the online loan application or in the search for branches and exchange rates. In addition to chatbots, the technology is also used in robo-advisory applications, e.g. in portfolio management, securities advice, or stock and bond trading.

In addition, the automation of business processes can reduce costs and support employees in routine tasks, allowing them to focus on strategic and analytical topics. Unsupervised learning in the area of regulatory reporting can help, for example, to identify and minimize data quality problems in data sets by using the algorithm to carry out root-cause analyses. Data management is optimized through a more holistic picture of customer and transaction activities based on deep learning.

In-depth data analysis enables banks to generate richer insights from their business processes, which in turn improve and accelerate decision-making. Reinforcement learning is used, for example, in risk management for credit checks. Furthermore, machine learning models can identify and reduce risks in the fight against fraud. With the help of supervised and unsupervised learning, alerts in customer behavior are examined and the likelihood of corporate bankruptcies is predicted. And there are no limits to creativity: with social media analytics, social media data can also be used as a source for crisis forecasts.

If you need experienced ML developers – feel free to drop us a line here!