The following is a brief look at our experiment using Thompson Sampling and the Apriori algorithm.
In AI, the "exploration vs. exploitation tradeoff" applies to learning algorithms that need to acquire new information and maximize their reward at the same time, which are referred to as Reinforcement Learning problems. In this setting, regret is defined as you might expect: a decrease in reward caused by running the learning algorithm instead of behaving optimally from the very beginning. Algorithms that optimize for exploration tend to incur more regret.
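The quantity described above, usually called regret, can be made concrete with a small sketch. It assumes a Bernoulli bandit whose true success rates are known, which is never the case in practice; the function name `cumulative_regret` and the three rates are hypothetical and chosen only for illustration.

```python
def cumulative_regret(true_rates, choices):
    """Expected regret of a sequence of arm choices on a Bernoulli bandit.

    true_rates: assumed success probability of each arm (hypothetical).
    choices: index of the arm pulled in each round.
    """
    best = max(true_rates)
    # Each round loses (best rate - chosen arm's rate) in expected reward.
    return sum(best - true_rates[arm] for arm in choices)


# Made-up three-arm example; arm 2 is the best arm.
true_rates = [0.25, 0.5, 0.75]

always_best = [2] * 99                     # pure exploitation of the best arm
round_robin = [n % 3 for n in range(99)]   # pure exploration, 33 pulls per arm

regret_best = cumulative_regret(true_rates, always_best)     # 0
regret_explore = cumulative_regret(true_rates, round_robin)  # 24.75
```

The round-robin policy pays 0.5 for each of its 33 pulls of arm 0 and 0.25 for each of its 33 pulls of arm 1, which is where the 24.75 comes from. A bandit algorithm such as Thompson Sampling aims to land between these two extremes: explore enough to identify the best arm, then exploit it.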
We took the multi-armed bandit concept from machine learning and applied it to our data set to study the purchasing behaviour of a group of people in a mall. With a combination of the Apriori algorithm and Thompson Sampling, we determined which way consumers will go in a mall after purchasing a product. Through data analytics and physical experimentation, we found that the arrangement of certain products resulted in increased sales.
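The text does not show how the Apriori step was carried out, so the following is only a minimal sketch of the general technique: level-wise mining of frequent itemsets with a minimum support threshold. The function name `apriori`, the baskets, and the threshold are all hypothetical and are not taken from the mall data set.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    transactions: list of sets of item names.
    min_support: minimum fraction of transactions containing the itemset.
    """
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical baskets, invented purely for illustration.
baskets = [
    {"shoes", "socks"},
    {"shoes", "socks", "belt"},
    {"shoes", "belt"},
    {"socks"},
]
frequent = apriori(baskets, min_support=0.5)
```

With these made-up baskets, `{"shoes", "socks"}` and `{"shoes", "belt"}` each appear in half of the transactions and are kept, while `{"socks", "belt"}` appears in only one and is dropped. Itemsets that are frequently bought together are natural candidates for products to place near each other.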
The base code that we adapted for our own logic is provided below.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Malldataset.csv')
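# Implementing Thompson Sampling
import random

N = 250000  # number of rounds (rows read from the dataset)
d = 700     # number of arms (columns in the dataset)
ads_selected = []
numbers_of_rewards_1 = [0] * d  # times each arm returned reward 1
numbers_of_rewards_0 = [0] * d  # times each arm returned reward 0
total_reward = 0
for n in range(0, N):
    ad = 0
    max_random = 0
    # Draw one sample per arm from its Beta posterior and keep the largest.
    for i in range(0, d):
        random_beta = random.betavariate(numbers_of_rewards_1[i] + 1, numbers_of_rewards_0[i] + 1)
        if random_beta > max_random:
            max_random = random_beta
            ad = i
    ads_selected.append(ad)
    # Observe the reward of the chosen arm and update its counts.
    reward = dataset.values[n, ad]
    if reward == 1:
        numbers_of_rewards_1[ad] = numbers_of_rewards_1[ad] + 1
    else:
        numbers_of_rewards_0[ad] = numbers_of_rewards_0[ad] + 1
    total_reward = total_reward + reward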
plt.hist(ads_selected)
plt.title('Histogram of Dataset')
plt.xlabel('Ads')
plt.ylabel('Number of users chooses a lane')
plt.show()
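Because Malldataset.csv is not included here, the same loop can be tried on synthetic data. The sketch below is an assumption-laden stand-in, not the code used in the experiment: the five success rates are made up, the function name `thompson_sampling` is hypothetical, and the per-arm Beta draws are vectorized with NumPy's `Generator.beta` instead of calling `random.betavariate` in an inner loop. It also counts selections per arm with `np.bincount`, since `plt.hist` uses 10 bins by default and would group 70 arms into each bar when d = 700.

```python
import numpy as np


def thompson_sampling(rewards, seed=0):
    """Run Bernoulli Thompson Sampling over a 0/1 reward matrix.

    rewards: array of shape (rounds, arms); rewards[n, i] is the reward
             arm i would give in round n.
    Returns the list of selected arms and the total reward collected.
    """
    rng = np.random.default_rng(seed)
    n_rounds, n_arms = rewards.shape
    successes = np.zeros(n_arms, dtype=int)  # plays the role of numbers_of_rewards_1
    failures = np.zeros(n_arms, dtype=int)   # plays the role of numbers_of_rewards_0
    selected = []
    total_reward = 0
    for n in range(n_rounds):
        # One draw per arm from Beta(successes + 1, failures + 1), all at once.
        samples = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(samples))
        selected.append(arm)
        reward = int(rewards[n, arm])
        if reward == 1:
            successes[arm] += 1
        else:
            failures[arm] += 1
        total_reward += reward
    return selected, total_reward


# Synthetic stand-in for the mall data: 5 arms with made-up success rates.
data_rng = np.random.default_rng(42)
true_rates = np.array([0.05, 0.10, 0.15, 0.20, 0.60])
rewards = (data_rng.random((2000, 5)) < true_rates).astype(int)

selected, total_reward = thompson_sampling(rewards)

# Exact number of selections per arm, one entry per arm.
counts = np.bincount(selected, minlength=rewards.shape[1])
most_selected = int(counts.argmax())
```

On this synthetic data the arm with the highest assumed rate (index 4) should end up with by far the most selections. To plot one bar per arm on the real data, `plt.bar(range(d), counts)` or `plt.hist(ads_selected, bins=d)` would be reasonable alternatives to the default histogram.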