link youtube: https://www.youtube.com/watch?v=JgvyzIkgxF0&t=440s
Install Sublime Text
- Quickly open a code folder
Because I use Mac and Linux, I often type commands in the Terminal; Windows users can google how to add the subl command to their PATH:
subl [folder_path]: Open a folder with Sublime Text
subl [file_path]: Open a file with Sublime Text
- Must Have Plugins
2.1 Package Control: install this first so that you can search for and install packages for Sublime Text directly. To open the command palette, press Ctrl + Shift + P.
2.2 Emmet: supports super-fast HTML editing.
2.3 Git Gutter: shows markers in the gutter for lines you have added, changed, or deleted compared to the last Git commit.
2.4 DocBlockr: automatically creates standard comment blocks.
2.5 CodeIntel: easily find out where the functions, classes, etc. in use are defined.
2.6 BracketHighlighter: makes it easy to see where the matching opening/closing bracket or tag is located.
2.7 AutoFileName: displays the files in the current folder so you can embed file paths more simply.
2.8 ColorHighlighter: displays colors inline in CSS code.
3.1 Frequently used keyboard shortcuts
Shift + Alt + (1/2/3/4/5/8/9): Split the window into multiple panes
Shift + F11: Full screen
Ctrl + P: Quickly open a file
Ctrl + Shift + T: Reopen the most recently closed file.
Ctrl + Tab: Go to the most recently opened tab.
Alt + number: Go to tab by numbered order
Ctrl + PgUp/PgDown: Cycle through the tabs
Ctrl + W: Close the current tab (exits Sublime Text if it is the last one)
3.2 Shortcuts in 1 tab
Ctrl + F: Search
Ctrl + H: Search and Replace
Ctrl + Shift + K: Delete current line
Ctrl + Shift + D: duplicate current line
Ctrl + Shift + ↑ (↓): Move the current line or selection up (down); indentation is adjusted automatically when moving into or out of brackets.
Ctrl + /: Toggle line comment
Ctrl + Shift + /: Toggle block comment
Ctrl + R: List of functions.
Ctrl + K, Ctrl + U: Convert to uppercase
Ctrl + K, Ctrl + L: Convert to lowercase
Ctrl + X: Cut the current line (deletes it and copies it to the clipboard).
3.3 Navigation shortcuts
Ctrl + G, then <line number>: Move to that line
Ctrl + P, then :<line number>: Move to that line
Ctrl + D: Highlight current word
Ctrl + M: Move to the nearest closing bracket
Ctrl + Shift + M: Highlight all content inside the current brackets.
Ctrl + Shift + Left Arrow: Extend the selection one word to the left.
Ctrl + Shift + Right Arrow: Extend the selection one word to the right.
Ctrl + L: Highlight the current line and move the cursor to the next line.
To configure options such as font size, line length, and so on, go to Preferences -> Settings and edit the Preferences.sublime-settings – User file.
Reinforcement Learning w/ Python Tutorial
Welcome to a reinforcement learning tutorial. In this part, we’re going to focus on Q-Learning.
Q-Learning is a model-free form of machine learning, in the sense that the AI “agent” does not need to know or have a model of the environment that it will be in. The same algorithm can be used across a variety of environments.
For a given environment, everything is broken down into “states” and “actions.” The states are observations and samplings that we pull from the environment, and the actions are the choices the agent has made based on the observation. For the purposes of the rest of this tutorial, we’ll use the context of our environment to exemplify how this works.
While our agent doesn’t actually need to know anything about our environment, it would be somewhat useful for you to understand how it works in the context of learning how Q-learning works!
We’re going to be working with OpenAI’s gym, specifically with the “MountainCar-v0” environment. To get to the gym, just do a pip install gym.
Okay, now let’s check out this environment. Most of these basic gym environments are very much the same in the way they work. To initialize the environment, you do a gym.make(NAME), then you env.reset the environment, then you enter into a loop where you do an env.step(ACTION) every iteration. Let’s poke around this environment:
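A minimal version of that loop might look like the sketch below. It uses the classic gym API; the comments note where newer gym versions (0.26+) changed the return values, and the "always push right" policy is just a placeholder, not a learned one:

```python
import gym

env = gym.make("MountainCar-v0")
state = env.reset()
# gym >= 0.26 returns (observation, info) from reset()
if isinstance(state, tuple):
    state = state[0]

done = False
while not done:
    action = 2  # placeholder policy: always "push right"
    result = env.step(action)
    new_state, reward = result[0], result[1]
    # gym >= 0.26 splits "done" into "terminated" and "truncated"
    done = result[2] if len(result) == 4 else (result[2] or result[3])

env.close()
print("episode finished at position:", new_state[0])
```

The episode ends either when the car reaches the flag or when the environment's 200-step limit is hit.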
For the various environments, we can query them for how many actions/moves are possible. In this case, there are “3” actions we can pass. This means, when we step the environment, we can pass a 0, 1, or 2 as our “action” for each step. Each time we do this, the environment will return to us the new state, a reward, whether or not the environment is done/complete, and then any extra info that some envs might have.
It doesn’t matter to our model, but, for your understanding, a 0 means push left, 1 is stay still, and 2 means push right. We won’t tell our model any of this, and that’s the power of Q learning. This information is basically irrelevant to it. All the model needs to know is what the options for actions are, and what the reward of performing a chain of those actions would be given a state. Continuing along:
How will Q-learning do that? So we know we can take 3 actions at any given time. That’s our “action space.” Now, we need our “observation space.” In the case of this gym environment, the observations are returned from resets and steps. For example:
Calling env.reset() will give you something like [-0.4826636 0. ], which is the starting observation state. While the environment runs, each step also gives us this information:
At each step, we get the new state, the reward, whether or not the environment is done (either we beat it or exhausted our limit of 200 steps), and then a final “extra info” is returned, but, in this environment, this final return item is not used. Gym throws it in there so we can use the same reinforcement learning programs across a variety of environments without the need to actually change any of the code.
In our case, we can query the environment to find out the possible ranges for each of these state values:
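Those queries look like this; the values shown in the comments are the ones MountainCar-v0 reports:

```python
import gym

env = gym.make("MountainCar-v0")

# Upper and lower bounds for each state value (position, velocity)
print(env.observation_space.high)  # [0.6  0.07]
print(env.observation_space.low)   # [-1.2  -0.07]
# Number of discrete actions we can pass to step()
print(env.action_space.n)          # 3
```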
We’ll use 20 groups/buckets for each range. This is a variable you might decide to tweak later.
So this tells us how large each bucket is, basically how much to increment the range by for each bucket. We can build our q_table now with:
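A sketch of the discretization and table setup. The bounds are hardcoded to the values queried from MountainCar-v0 above, so this runs without gym:

```python
import numpy as np

# Bounds as reported by MountainCar-v0's observation_space
OS_HIGH = np.array([0.6, 0.07])
OS_LOW = np.array([-1.2, -0.07])

DISCRETE_OS_SIZE = [20] * 2  # 20 buckets per state dimension
discrete_os_win_size = (OS_HIGH - OS_LOW) / DISCRETE_OS_SIZE
print(discrete_os_win_size)  # bucket widths: 0.09 and 0.007

# Q-table: one entry per (position bucket, velocity bucket, action),
# initialized with random negative values since every step's reward is -1
q_table = np.random.uniform(low=-2, high=0, size=(DISCRETE_OS_SIZE + [3]))
print(q_table.shape)  # (20, 20, 3)
```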
Which is what we’ll be talking about in the next tutorial!
Step 1: Import the required libraries
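The imports for this walkthrough might look like the following (matplotlib.pyplot is used here in place of pylab, which is just a thin wrapper around it):

```python
import numpy as np               # reward and Q matrices
import networkx as nx            # building and drawing the graph
import matplotlib.pyplot as plt  # rendering the graph plots
```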
Step 2: Define and visualize the graph
Note: The graph above may not look the same when you rerun the code, because the NetworkX library places the nodes randomly when drawing the graph built from the given edges.
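The graph is built from a list of edges. The exact edge list is not shown here, so the one below is a hypothetical reconstruction, chosen so that it contains the path [0, 1, 3, 9, 10] reported in step 5:

```python
import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical edge list; node 10 is the goal
points_list = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2), (1, 3),
               (9, 10), (2, 4), (0, 6), (6, 7), (8, 9), (7, 8),
               (1, 7), (3, 9)]

G = nx.Graph()
G.add_edges_from(points_list)
pos = nx.spring_layout(G)  # random layout, so the plot varies between runs
nx.draw_networkx_nodes(G, pos)
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
plt.savefig("graph.png")  # or plt.show()
```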
Step 3: Define the rewards the system gives the bot
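One common encoding is an 11x11 reward matrix: -1 for node pairs with no direct edge, 0 for ordinary edges, and 100 for any edge that reaches the goal. This sketch reuses the hypothetical edge list from step 2:

```python
import numpy as np

points_list = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2), (1, 3),
               (9, 10), (2, 4), (0, 6), (6, 7), (8, 9), (7, 8),
               (1, 7), (3, 9)]
goal = 10
MATRIX_SIZE = 11

# -1 marks "no direct path between these two nodes"
R = np.ones((MATRIX_SIZE, MATRIX_SIZE)) * -1

for point in points_list:
    # edges are bidirectional; reaching the goal pays 100, any other move 0
    R[point] = 100 if point[1] == goal else 0
    R[point[1], point[0]] = 100 if point[0] == goal else 0

# staying at the goal also pays off, so the bot stops there
R[goal, goal] = 100
print(R)
```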
Step 4: Define some utility functions that will be used during training
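Training needs three helpers: one to list the actions available from a state, one to pick one at random, and one to apply the Q-learning update. A self-contained sketch (the tiny reward matrix here is only for demonstration; in the tutorial R comes from step 3):

```python
import numpy as np

MATRIX_SIZE = 11
gamma = 0.8  # discount factor

# Minimal demo reward matrix: one plain edge 0-1 and one goal edge 9->10
R = np.ones((MATRIX_SIZE, MATRIX_SIZE)) * -1
R[0, 1] = 0
R[1, 0] = 0
R[9, 10] = 100

Q = np.zeros((MATRIX_SIZE, MATRIX_SIZE))

def available_actions(state):
    """Actions (next nodes) reachable from `state` are those with reward >= 0."""
    return np.where(R[state] >= 0)[0]

def sample_next_action(actions):
    """Choose the next node uniformly at random."""
    return int(np.random.choice(actions))

def update(current_state, action):
    """Q-learning update: Q(s,a) = R(s,a) + gamma * max_a' Q(s',a')."""
    max_value = np.max(Q[action])
    Q[current_state, action] = R[current_state, action] + gamma * max_value

actions = available_actions(0)       # only node 1 is reachable from node 0
next_node = sample_next_action(actions)
update(0, next_node)
update(9, 10)                        # stepping into the goal stores the big reward
print(Q[9, 10])                      # 100.0
```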
Step 5: Train and evaluate the bot using the Q-matrix
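Putting steps 2-4 together, here is a self-contained training sketch (the edge list is the hypothetical one from above). After enough random exploration, walking greedily through Q from node 0 recovers the most rewarding path to the goal:

```python
import numpy as np

np.random.seed(0)  # reproducible exploration

points_list = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2), (1, 3),
               (9, 10), (2, 4), (0, 6), (6, 7), (8, 9), (7, 8),
               (1, 7), (3, 9)]
goal = 10
MATRIX_SIZE = 11
gamma = 0.8

R = np.ones((MATRIX_SIZE, MATRIX_SIZE)) * -1
for p in points_list:
    R[p] = 100 if p[1] == goal else 0
    R[p[1], p[0]] = 100 if p[0] == goal else 0
R[goal, goal] = 100

Q = np.zeros((MATRIX_SIZE, MATRIX_SIZE))

def available_actions(state):
    return np.where(R[state] >= 0)[0]

# Training: explore transitions from random states and apply the Q update
for _ in range(1000):
    state = np.random.randint(0, MATRIX_SIZE)
    action = int(np.random.choice(available_actions(state)))
    Q[state, action] = R[state, action] + gamma * np.max(Q[action])

# Evaluation: walk greedily from node 0 until the goal is reached
state = 0
path = [state]
while state != goal and len(path) < MATRIX_SIZE:
    state = int(np.argmax(Q[state]))
    path.append(state)

print("Most efficient path:", path)
```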
Most efficient path: [0,1,3,9,10]
Now, let's take this bot into a more realistic environment. Imagine the bot is a detective trying to find the location of a large drug racket. The detective reasons that the sellers will not sell their products in places the police are known to frequent, and that the selling sites are located near the racket's position. In addition, the sellers leave traces of their products where they sell, and these traces can help the detective find the racket. We want to train the bot to find the location using these environmental clues.
Step 6: Define and visualize the new graph with the environmental clues
Note: The graph above may look a bit different from the previous one, but they are in fact the same graph; the difference is only due to the random node placement by the NetworkX library.
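The graph structure stays the same; only certain nodes are now labeled as clues. The clue placements below (police at node 2, drug traces at node 6) are hypothetical, since the original assignments are not shown here:

```python
import networkx as nx
import matplotlib.pyplot as plt

points_list = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2), (1, 3),
               (9, 10), (2, 4), (0, 6), (6, 7), (8, 9), (7, 8),
               (1, 7), (3, 9)]
police_node = 2       # hypothetical: a place the police frequent
drug_traces_node = 6  # hypothetical: a place with drug traces

G = nx.Graph()
G.add_edges_from(points_list)
mapping = {police_node: "Police", drug_traces_node: "Drug traces",
           10: "Racket (goal)"}
labels = {n: mapping.get(n, str(n)) for n in G.nodes}
pos = nx.spring_layout(G)  # random layout, so the plot varies between runs
nx.draw_networkx_nodes(G, pos)
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos, labels=labels)
plt.savefig("graph_with_clues.png")
```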
Step 7: Define some utility functions for the training process
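The new helpers track how often exploration runs into each clue. One possible sketch: two matrices count, per (state, action) transition, how many times a step ended at the police node or at the drug-trace node (the node choices are the same hypothetical ones as above):

```python
import numpy as np

MATRIX_SIZE = 11
police_node = 2       # hypothetical clue placement
drug_traces_node = 6  # hypothetical clue placement

# Counts of how often each (state, action) step ran into a clue
enviro_police = np.zeros((MATRIX_SIZE, MATRIX_SIZE))
enviro_drugs = np.zeros((MATRIX_SIZE, MATRIX_SIZE))

def collect_environmental_data(state, action):
    """Record whether stepping from `state` to node `action` hits a clue."""
    if action == police_node:
        enviro_police[state, action] += 1
    if action == drug_traces_node:
        enviro_drugs[state, action] += 1

# Example: two exploration steps, one into each clue node
collect_environmental_data(1, 2)
collect_environmental_data(5, 6)
```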
Step 8: Visualize the environmental matrix
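After an exploration phase, the counts can be rendered as a heatmap to see which transitions run into which clue. The counts below are randomly generated stand-ins for an actual exploration run:

```python
import numpy as np
import matplotlib.pyplot as plt

MATRIX_SIZE = 11
# Stand-in counts; in the tutorial these come from collect_environmental_data
rng = np.random.default_rng(0)
enviro_police = rng.integers(0, 5, size=(MATRIX_SIZE, MATRIX_SIZE))

plt.imshow(enviro_police, cmap="hot")
plt.colorbar(label="times a (state, action) step hit the police node")
plt.xlabel("action (next node)")
plt.ylabel("state (current node)")
plt.savefig("enviro_police.png")
```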
Step 9: Train and evaluate the model
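Finally, training proceeds as in step 5, but the reward for each step is adjusted by the clues: transitions into the police node are penalized and transitions into the drug-trace node get a bonus. The penalty/bonus values and clue placements below are hypothetical:

```python
import numpy as np

np.random.seed(1)

points_list = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2), (1, 3),
               (9, 10), (2, 4), (0, 6), (6, 7), (8, 9), (7, 8),
               (1, 7), (3, 9)]
goal = 10
MATRIX_SIZE = 11
gamma = 0.8

police_node = 2       # hypothetical clue placement
drug_traces_node = 6  # hypothetical clue placement
POLICE_PENALTY = -50  # hypothetical value: avoid places the police frequent
TRACE_BONUS = 25      # hypothetical value: drug traces hint at the racket

R = np.ones((MATRIX_SIZE, MATRIX_SIZE)) * -1
for p in points_list:
    R[p] = 100 if p[1] == goal else 0
    R[p[1], p[0]] = 100 if p[0] == goal else 0
R[goal, goal] = 100

Q = np.zeros((MATRIX_SIZE, MATRIX_SIZE))

def available_actions(state):
    return np.where(R[state] >= 0)[0]

def clue_adjusted_reward(state, action):
    """Base reward plus the environmental-clue adjustment for this step."""
    reward = R[state, action]
    if action == police_node:
        reward += POLICE_PENALTY
    if action == drug_traces_node:
        reward += TRACE_BONUS
    return reward

# Training with clue-adjusted rewards
for _ in range(1000):
    state = np.random.randint(0, MATRIX_SIZE)
    action = int(np.random.choice(available_actions(state)))
    Q[state, action] = clue_adjusted_reward(state, action) + gamma * np.max(Q[action])

# Evaluation: the greedy path should steer clear of the police node
state = 0
path = [state]
while state != goal and len(path) < MATRIX_SIZE:
    state = int(np.argmax(Q[state]))
    path.append(state)

print("Path avoiding police, following traces:", path)
```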