{"id":49,"date":"2024-08-02T08:47:04+00:00","date_gmt":"2024-08-02T08:47:04+00:00","guid":{"rendered":"https:\/\/ragecognito.digital\/?p=49"},"modified":"2024-08-02T18:46:14+00:00","modified_gmt":"2024-08-02T18:46:14+00:00","slug":"setting-up-the-environment-the-first-steps","status":"publish","type":"post","link":"https:\/\/ragecognito.digital\/?p=49","title":{"rendered":"Setting Up the Environment: The First Steps"},"content":{"rendered":"\n<p>My Machine Learning Adventure (A Series)<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong><em>Welcome back<\/em><\/strong> (<a href=\"https:\/\/ragecognito.digital\/?p=47\" target=\"_blank\" rel=\"noopener\" title=\"\">previous post<\/a>)! My journey to understand machine learning began a year ago, but as someone who has always been curious about the capabilities of AI, I\u2019ve only recently made any real beginner breakthroughs. My new approach is to interactively work with various GPT models and formats to design an interactive machine learning model that learns directly through my inputs. In this blog series, I document my experiences, the challenges I faced, and the solutions I discovered along the way.<\/p>\n\n\n\n<p>In the first post, I introduced the project and shared my excitement about the concept of <a href=\"https:\/\/www.labellerr.com\/blog\/a-deep-dive-into-active-learning-strategies-applications-and-challenges\/\" target=\"_blank\" rel=\"noopener\" title=\"\">active learning<\/a>. Today, we&#8217;ll get our hands dirty by setting up the environment and taking the first steps towards building our interactive machine learning model.  <\/p>\n\n\n\n<p>Remember, I&#8217;m no machine learning expert.  
I&#8217;m learning just like you!<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Initial Environment + Library Installation<\/h3>\n\n\n\n<p><strong>Quick hits<\/strong>:<em> I used <a href=\"https:\/\/jupyter.org\/\" target=\"_blank\" rel=\"noopener\" title=\"\">Jupyter Notebook and JupyterLab<\/a>, as installed in the <a href=\"https:\/\/www.anaconda.com\/\" target=\"_blank\" rel=\"noopener\" title=\"\">Anaconda Navigator Package<\/a>. I won&#8217;t cover those tools in depth in this series, but I may highlight them in a future post. <\/em><\/p>\n\n\n\n<p>Before building my model, I needed to set up the necessary tools. With the assistance of <a href=\"https:\/\/chatgpt.com\/g\/g-TfiZbdLM0-machine-learning\" target=\"_blank\" rel=\"noopener\" title=\"\">Machine Learning GPT<\/a>, I chose the following set of libraries:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>modAL<\/strong>: A modular active learning framework for Python.<\/li>\n\n\n\n<li><strong>scikit-learn<\/strong>: A popular machine learning library.<\/li>\n\n\n\n<li><strong>numpy<\/strong>: A library for numerical computations.<\/li>\n\n\n\n<li><strong>pandas<\/strong>: A powerful data manipulation tool.<\/li>\n\n\n\n<li><strong>joblib<\/strong>: Used for saving and loading models.<\/li>\n<\/ul>\n\n\n\n<p>I installed them in JupyterLab by opening a terminal in my project folder environment and entering the following command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>pip install modAL scikit-learn numpy pandas joblib<\/code><\/code><\/pre>\n\n\n\n<p>This installs the libraries into the environment once, so the install command does not have to be rerun every time the notebook code runs.<\/p>\n\n\n\n<p>These libraries provide a solid foundation for the project: efficient data handling plus the tools to build machine learning models.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Creating and Saving the Initial Dataset<\/h3>\n\n\n\n<p>With the libraries installed, it was time to create some data. 
I used <code>make_classification<\/code> from scikit-learn to generate a synthetic dataset that simulates a binary classification problem, such as whether a picture is or is not a dog, or whether sentiment in text is positive or negative.<\/p>\n\n\n\n<p>This dataset served as the foundation for training and testing the model.  Imports required:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import numpy as np\nimport joblib\nfrom sklearn.datasets import make_classification\nfrom sklearn.model_selection import train_test_split\n<\/code><\/code><\/pre>\n\n\n\n<p>The next portion of code generates a dataset that simulates a binary classification problem as mentioned earlier.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code># Generate a synthetic dataset\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)\n<\/code><\/code><\/pre>\n\n\n\n<p>By saving the dataset using <code>joblib<\/code>, I ensured that it could be easily loaded and used later in the project.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code># Save the dataset\njoblib.dump((X, y), 'X_y_data.pkl')<\/code><\/code><\/pre>\n\n\n\n<p>The data was then split into training and pool sets, where the training set would initialize the model, and the pool set would be used for querying.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code># Split the dataset\nX_train, X_pool, y_train, y_pool = train_test_split(X, y, test_size=0.75, random_state=42)<\/code><\/code><\/pre>\n\n\n\n<p>Let&#8217;s break down what the previous block of code does in a way that&#8217;s easier to understand, because I struggled with it.  <\/p>\n\n\n\n<p>Imagine you have a big list of data (like numbers, images, or text), and you want to use this data to teach a computer to do something, like recognizing cats in pictures or predicting how much your favorite sports team will score in a game. 
This <strong><em>dataset<\/em><\/strong> has <code>X<\/code>, the &#8220;input&#8221; data (like pictures of animals), and <code>y<\/code>, the labels (such as &#8220;dog&#8221; or &#8220;cat&#8221;). <\/p>\n\n\n\n<p>You shouldn&#8217;t use all your data to teach the computer (train it). You need to keep some data aside to check whether the computer has learned correctly.<\/p>\n\n\n\n<p>This code splits your data into two parts: one part for training and one part kept aside. In this project, the part kept aside is the pool that the active learner will query later.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>The Code Breakdown<\/strong>\n<ul class=\"wp-block-list\">\n<li><code>train_test_split<\/code> is a function that does the splitting for you.<\/li>\n\n\n\n<li><code>X_train<\/code> and <code>y_train<\/code> are the parts of your data that you&#8217;ll use to train the computer.<\/li>\n\n\n\n<li><code>X_pool<\/code> and <code>y_pool<\/code> are the parts of your data that you&#8217;ll keep aside for later (here, the pool the learner queries from).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Parameters in the Code<\/strong>\n<ul class=\"wp-block-list\">\n<li><code>X<\/code> and <code>y<\/code>: Your whole dataset.<\/li>\n\n\n\n<li><code>test_size=0.75<\/code>: This means 75% of your data will be kept aside (in <code>X_pool<\/code> and <code>y_pool<\/code>), and 25% will be used for training (in <code>X_train<\/code> and <code>y_train<\/code>). With 1,000 samples, that is 750 in the pool and 250 for training.<\/li>\n\n\n\n<li><code>random_state=42<\/code>: This is like setting a &#8220;seed&#8221; so that if you run this code again, you get the same split every time. It&#8217;s useful for consistency.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Initializing the Active Learner Model<\/h3>\n\n\n\n<p>Even though I didn&#8217;t realize what I&#8217;d done at the time, I&#8217;d generated a synthetic dataset and split it into a training set and a pool set. The next step was to initialize the active learner model. 
<\/p>\n\n\n\n<p>I chose the <code>RandomForestClassifier<\/code> from scikit-learn as the base estimator for my active learner.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>from modAL.models import ActiveLearner\nfrom sklearn.ensemble import RandomForestClassifier<\/code><\/code><\/pre>\n\n\n\n<p>Keeping with our cats\/dogs categorizing example, a <strong><em>learner<\/em><\/strong> is like a student that learns from these examples. Active learning is a special way of teaching the computer. Instead of showing it all the pictures at once, you start by showing it a few, and then it asks for more examples of things it is unsure about. This way, it learns more efficiently.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code># Initialize the learner\nlearner = ActiveLearner(\n    estimator=RandomForestClassifier(),\n    X_training=X_train, y_training=y_train\n)<\/code><\/code><\/pre>\n\n\n\n<p>I saved the model and data.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code><code># Save the model and the pool data\njoblib.dump(learner, 'active_learner_model.pkl')\njoblib.dump((X_pool, y_pool), 'X_y_pool.pkl')<\/code><\/code><\/code><\/pre>\n\n\n\n<p>The model was initialized with the training data and saved for future use. This step marked the completion of our initial setup, providing a solid base for building the interactive elements of our project.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Challenges and Reflections<\/h3>\n\n\n\n<p>One of the initial challenges I faced was ensuring that all libraries were correctly installed and compatible with each other. That was mostly due to my misunderstanding about how &#8220;pip&#8221; works.  I initially had everything in my notebook, but once I realized the libraries should be installed through the terminal\/command line, I was able to make progress.<\/p>\n\n\n\n<p>Once the environment was set up, the process of generating and saving data, as well as model initialization, went smoothly. 
This part of my adventure was crucial for understanding the basic components needed for the project and building a strong foundation. It took me days to figure out&#8230;<\/p>\n\n\n\n<p>Along the way, I realized the importance of patience and attention to detail. Patience is definitely not a large part of my personality or temperament, but this endeavor showed me that both are vitally important.  Setting up the environment might seem like a straightforward task, but it&#8217;s the backbone of the entire project. Ensuring that everything is correctly installed and configured saves <strong><em>a lot<\/em><\/strong> of time and frustration down the line.<\/p>\n\n\n\n<p>In the next post, we&#8217;ll dive into building the interactive web application using Flask. This is where the project starts to come to life, allowing us to interact with the model in real time. Stay tuned!<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n<div class=\"syndication-links\"><\/div>","protected":false},"excerpt":{"rendered":"<p>My Machine Learning Adventure (A Series) Welcome back (previous post)! My journey to understand machine learning began a year ago, but as someone who has always been curious about the capabilities of AI, I\u2019ve only recently made any real beginner breakthroughs. 
My new approach is to interactively work with various GPT models and formats to&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"mf2_syndication":[],"venue_id":0,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-49","post","type-post","status-publish","format-standard","hentry","category-uncategorized","kind-"],"kind":false,"_links":{"self":[{"href":"https:\/\/ragecognito.digital\/index.php?rest_route=\/wp\/v2\/posts\/49","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ragecognito.digital\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ragecognito.digital\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ragecognito.digital\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ragecognito.digital\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=49"}],"version-history":[{"count":5,"href":"https:\/\/ragecognito.digital\/index.php?rest_route=\/wp\/v2\/posts\/49\/revisions"}],"predecessor-version":[{"id":59,"href":"https:\/\/ragecognito.digital\/index.php?rest_route=\/wp\/v2\/posts\/49\/revisions\/59"}],"wp:attachment":[{"href":"https:\/\/ragecognito.digital\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=49"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ragecognito.digital\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=49"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ragecognito.digital\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=49"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}