IRURAJACKBLOGS

Modelling & Machine Learning (ML) Content

Modeling and Machine Learning


The main difference between modeling and machine learning is that modeling relies on predefined mathematical or statistical formulas to explain relationships in data, while machine learning automatically learns patterns from data without explicit programming. Modeling is often based on assumptions and is more interpretable, whereas machine learning is data-driven, flexible, and excels in handling complex, large-scale, and unstructured data like images and text.

There are two main approaches in machine learning and modeling: supervised and unsupervised.


In supervised learning, the data is typically divided into three subsets:
- Training Data: Used to train the model.
- Validation Data: Used during training to tune hyperparameters and evaluate the model's performance to prevent overfitting.
- Test Data: Used after training to evaluate the final performance of the model on unseen data.
While in unsupervised learning:
- The entire dataset is often used for training because there are no predefined labels.
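
The three-way split described above can be sketched in plain Python. This is a minimal illustration, not the method used in the notebooks: the function name `split_dataset`, the 70/15/15 proportions, and the toy data are all assumptions made for the example.

```python
import random

def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle the rows, then cut them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains is held out for testing
    return train, val, test

# Hypothetical labeled data: (feature, label) pairs.
data = [(x, x % 2) for x in range(100)]
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

Shuffling before cutting matters when the rows are stored in some order (for example by date or by label); otherwise one subset could end up with only one kind of example.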

Supervised and Unsupervised Machine Learning Algorithms


Learning Curve

Examples of Supervised and Unsupervised Models



Some Examples of Modeling Techniques That You Will Come Across in My Notebooks


→ Regression models

1. Linear regression
Linear Regression models predict a continuous output variable based on one or more input features. This model assumes there's a linear relationship between the input and output variables.
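
As a small sketch of the idea, the snippet below fits a straight line to one input feature with ordinary least squares. The helper name `fit_line` and the toy numbers are assumptions for illustration; in practice a library such as scikit-learn would normally do this.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data that follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line(xs, ys)
prediction = slope * 6.0 + intercept   # predict the output for a new input x = 6
print(slope, intercept, prediction)
```
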
2. Logistic regression
Logistic regression predicts the probability of an outcome that can only have two values, like yes/no or 1/0. We typically also put a threshold on the predicted probability to determine the predicted class.
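
The probability-then-threshold step can be sketched as follows. The weights, bias, and feature values are hypothetical, hand-picked numbers; a real model would learn them from training data.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def classify(probability, threshold=0.5):
    """Turn a probability into a 1/0 decision."""
    return 1 if probability >= threshold else 0

# Hypothetical learned parameters and one hypothetical sample.
weights = [0.8, -0.5]
bias = -1.0
sample = [3.0, 1.0]

p = predict_proba(sample, weights, bias)
label = classify(p)
print(round(p, 3), label)
```

Raising the threshold (for example to 0.8) makes the model more conservative about predicting the positive class, which is a common adjustment when false positives are costly.
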
→ Tree models

3. Decision tree
Decision tree models look like flow charts; they help us make decisions based on a series of questions or input variables. Decision trees can be used to classify data or predict continuous outcomes.
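
To make the flow-chart picture concrete, here is a tiny hand-written tree stored as nested dictionaries. The feature names, thresholds, and labels are invented for illustration; a learning algorithm (such as scikit-learn's `DecisionTreeClassifier`) would choose the questions from the data instead.

```python
# Each internal node asks one question; each leaf holds a label.
# This tree is written by hand, not learned from data.
tree = {
    "feature": "monthly_charge",
    "threshold": 70.0,
    "left": {"label": "stays"},            # monthly_charge <= 70
    "right": {                              # monthly_charge > 70
        "feature": "tenure_months",
        "threshold": 12.0,
        "left": {"label": "churns"},       # tenure_months <= 12
        "right": {"label": "stays"},       # tenure_months > 12
    },
}

def predict(node, sample):
    """Walk from the root to a leaf, answering one question per node."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]

print(predict(tree, {"monthly_charge": 90.0, "tenure_months": 3.0}))
```
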
4. Random forest
Random Forest combines multiple decision trees to make predictions. It creates many decision trees using random subsets of the data and features. This approach helps reduce overfitting and improves generalization.
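
The sketch below illustrates the two sources of randomness (random rows and random features) and the final vote. It is a deliberate simplification: each "tree" is only a one-question stump, and all function names and data are assumptions for the example, not a full random forest implementation.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Fit a depth-1 tree (a stump) that splits on a single feature."""
    values = sorted({row[feature] for row in rows})
    majority = Counter(labels).most_common(1)[0][0]
    # Fallback when the sample has one distinct value: always predict the majority.
    best = (len(rows) + 1, values[0], majority, majority)
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2          # try midpoints between observed values
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return (feature,) + best[1:]

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap: sample rows with replacement
        feature = rng.randrange(len(rows[0]))            # random feature for this stump
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Every stump votes; the most common label wins."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Toy data: two well-separated groups.
rows = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 9], [9, 8], [8, 8], [9, 9]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [1.5, 1.5]), predict_forest(forest, [8.5, 8.5]))
```

Because each stump sees a different resampled version of the data, an unlucky sample can produce a poor stump, but the majority vote across many stumps smooths those mistakes out.
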
→ Clustering models

5. Hierarchical clustering
Hierarchical Clustering is an unsupervised model that builds a tree-like structure of clusters. The hierarchy can be built from the bottom up (each data point starts in its own cluster and clusters are merged as you move up the hierarchy) or from the top down (all data starts in one cluster and splits occur as you move down the hierarchy).
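
The bottom-up variant can be sketched in a few lines. This example assumes one-dimensional points and single linkage (the distance between two clusters is the distance between their closest members); both choices, and the function name `agglomerative`, are assumptions made to keep the illustration short.

```python
def agglomerative(points, n_clusters):
    """Bottom-up clustering of 1-D points using single linkage."""
    clusters = [[p] for p in points]      # every point starts in its own cluster
    merges = []                            # record of which clusters were merged, in order
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

points = [1.0, 1.5, 5.0, 5.2, 9.0]
clusters, merges = agglomerative(points, n_clusters=2)
print(clusters)
```

The `merges` list is the tree-like structure in flat form: reading it from first to last entry replays the hierarchy from the leaves upward.
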
→ Instance-based models

6. K-nearest neighbors
KNN predicts outputs by finding the K most similar data points to a new input and using their outputs to make a prediction. KNN is non-parametric, meaning it doesn't make assumptions about the underlying data distribution.
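
A minimal sketch of KNN for classification, assuming Euclidean distance as the similarity measure and a simple majority vote. The function name `knn_predict` and the toy points are illustrative assumptions.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy training data: two groups of points with known labels.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```

Note that there is no training step: the "model" is simply the stored data, which is why KNN is called instance-based.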

Some Examples of Machine Learning Models That You Will Come Across in My Notebooks


1. K-means clustering
K-means Clustering is an unsupervised model that groups similar data points into K clusters based on their features. It aims to minimize the distance between each data point and the cluster center.
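
The assign-then-update loop behind K-means can be sketched as below. The starting centers are passed in by hand here to keep the example deterministic; real implementations usually pick them randomly or with a smarter initialization. The function name and data are assumptions for illustration.

```python
import math

def kmeans(points, centers, n_iters=10):
    """Repeat: assign each point to its nearest center, then move centers to the mean."""
    for _ in range(n_iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # New center = mean of the points assigned to it (unchanged if it has no points).
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
initial_centers = [(1.0, 1.0), (2.0, 1.0)]   # deliberately poor starting guess
centers, clusters = kmeans(points, initial_centers)
print(centers)
```

Even with both starting centers inside the same group, the second center drifts toward the far group after a couple of iterations, because each update pulls a center to the mean of the points it currently owns.
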
2. Principal Component Analysis (PCA)
It is an unsupervised statistical model primarily used for dimensionality reduction and feature extraction. Rather than making predictions, PCA transforms high-dimensional data into a smaller set of uncorrelated components (principal components) that retain most of the original data's variance.
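
As a rough sketch of what PCA computes, the code below finds only the first principal component of a small two-feature dataset. It uses power iteration on the covariance matrix, which is an assumption made to stay within the standard library; libraries typically use a full eigen or singular value decomposition instead.

```python
import math

def first_principal_component(rows, n_iters=100):
    """Return (direction, scores, explained_variance_ratio) for the first component."""
    n, dims = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dims)]
           for a in range(dims)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * dims
    for _ in range(n_iters):
        w = [sum(cov[a][b] * v[b] for b in range(dims)) for a in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centered row onto the component: one number per row.
    scores = [sum(r[j] * v[j] for j in range(dims)) for r in centered]
    total_variance = sum(cov[j][j] for j in range(dims))
    explained = (sum(s * s for s in scores) / (n - 1)) / total_variance
    return v, scores, explained

# Toy data: two features that rise together, so one component captures almost everything.
rows = [[2.0, 1.9], [4.0, 4.1], [6.0, 6.2], [8.0, 7.8]]
direction, scores, explained = first_principal_component(rows)
print(direction, explained)
```

Here two correlated columns are replaced by a single column of `scores` while keeping nearly all of the variance, which is the dimensionality reduction described above.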

1. LINEAR REGRESSION MODEL



2. LOGISTIC REGRESSION MODEL



3. DECISION TREES & RANDOM FOREST MODELS



4. GRADIENT BOOSTING MODEL


References
  • Scikit-learn
  • Deep Learning for Computer Vision Content

    What is deep learning?

    Deep learning is a subset of machine learning that uses multilayered neural networks, called deep neural networks, to simulate the complex decision-making power of the human brain. Some form of deep learning powers most of the artificial intelligence (AI) applications in our lives today. The difference between deep learning and machine learning is the structure of the underlying neural network architecture. "Nondeep," traditional machine learning models use simple neural networks with one or two computational layers. Deep learning models use three or more layers (but typically hundreds or thousands of layers) to train the models. Deep learning is an aspect of data science that drives many applications and services that improve automation, performing analytical and physical tasks without human intervention. This enables many everyday products and services, such as digital assistants, voice-enabled TV remotes, credit card fraud detection, self-driving cars and generative AI.
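
To show what "layers" means in practice, here is a minimal forward pass through a three-layer network. The weights are hypothetical, hand-picked numbers and the ReLU activation is an assumption; a real network learns its weights from data and is usually built with a framework rather than plain lists.

```python
def relu(values):
    """Activation function: negative values become zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hypothetical weights and biases for three stacked layers.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),   # layer 1: 2 inputs -> 2 units
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.5]),   # layer 2: 2 units -> 2 units
    ([[2.0, -1.0]], [0.1]),                    # layer 3: 2 units -> 1 output
]

def forward(inputs):
    """Feed the inputs through every layer in order."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:            # no activation on the output layer
            activations = relu(activations)
    return activations

output = forward([3.0, 1.0])
print(output)
```

Each layer's output becomes the next layer's input, so adding depth means adding more entries to `layers`.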

    Face Detection with MTCNN (Multi-Task Cascaded Convolutional Networks)



    NoteBook_1


    Machine Learning Projects

    1. Customer Churn Dataset


    Dataset Description


    Customer churn refers to the phenomenon where customers discontinue their relationship or subscription with a company or service provider. It represents the rate at which customers stop using a company's products or services within a specific period. Churn is an important metric for businesses as it directly impacts revenue, growth, and customer retention. In the context of the Churn dataset, the churn label indicates whether a customer has churned or not. A churned customer is one who has decided to discontinue their subscription or usage of the company's services. On the other hand, a non-churned customer is one who continues to remain engaged and retains their relationship with the company. Understanding customer churn is crucial for businesses to identify patterns, factors, and indicators that contribute to customer attrition. By analyzing churn behavior and its associated features, companies can develop strategies to retain existing customers, improve customer satisfaction, and reduce customer turnover. Predictive modeling techniques can also be applied to forecast and proactively address potential churn, enabling companies to take proactive measures to retain at-risk customers.
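
As a small worked example, the churn rate for a period can be computed directly from the churn label. The sketch assumes a common convention (1 = churned, 0 = retained) and uses made-up labels rather than values from the actual dataset.

```python
def churn_rate(labels):
    """Share of customers who churned, assuming 1 = churned and 0 = retained."""
    if not labels:
        raise ValueError("need at least one customer")
    return sum(labels) / len(labels)

# Hypothetical churn labels for ten customers over one period.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
rate = churn_rate(labels)
print(f"{rate:.0%} of customers churned")
```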



    My Project in Logistic Regression, Decision Trees & Random Forest Models


    Kaggle Competition

    2. Predict the Introverts from the Extroverts


    Dataset Description


    The dataset for this competition (both train and test) was generated from a deep learning model trained on the Extrovert vs. Introvert Behavior dataset. Feature distributions are close to, but not the same as, the original. Feel free to use the original dataset as part of this competition, both to explore differences and to see whether incorporating the original in training improves model performance.


    Note: This is a relatively small dataset, so it is a good one to use for comparing different modeling approaches, making visualizations, etc.


    Files



    Predict the Introverts from the Extroverts


    Kaggle Competition

    3. Regression of Used Car Prices


    Dataset Description


    About the Tabular Playground Series


    The goal of the Tabular Playground Series is to provide the Kaggle community with a variety of fairly light-weight challenges that can be used to learn and sharpen skills in different aspects of machine learning and data science. The duration of each competition will generally only last a few weeks, and may have longer or shorter durations depending on the challenge. The challenges will generally use fairly light-weight datasets that are synthetically generated from real-world data, and will provide an opportunity to quickly iterate through various model and feature engineering ideas, create visualizations, etc.


    Synthetically-Generated Datasets


    Using synthetic data for Playground competitions allows us to strike a balance between having real-world data (with named features) and ensuring test labels are not publicly available. This allows us to host competitions with more interesting datasets than in the past. While there are still challenges with synthetic data generation, the state of the art is much better now than when we started the Tabular Playground Series two years ago, and the goal is to produce datasets that have far fewer artifacts. Please feel free to give us feedback on the datasets for the different competitions so that we can continue to improve!


    Files



    Regression of Used Car Prices

    Welcome,

  • I am a freelancer in data science, and I offer free data analysis materials in Python. First, I will introduce you to coding in Python. After you gain some skills, I will guide you through data analysis and visualization in Python, step by step.

    We start here with an introduction to coding in Python. This will give you sufficient skills to venture into data analysis with Python.


    1. INTRODUCTION TO PROGRAMMING IN PYTHON

    2. VARIABLES AND DATA TYPES IN PYTHON

    3. BRANCHING AND LOOPING IN PYTHON

    4. REUSABLE CODE USING FUNCTIONS IN PYTHON

    5. COMPUTING WITH NUMPY IN PYTHON

    6. ANALYZING WITH PANDAS AND VISUALIZATION IN PYTHON

    7. USING MATPLOTLIB AND SEABORN

    What is a Database?


    A database is an organized collection of data that enables efficient storage, retrieval, and management of information. Instead of storing data in scattered files, databases allow for structured storage, making it easy to access and manipulate data efficiently.

    Types of Databases


    Databases come in various forms, including:
    1. Relational Databases (RDBMS): Store data in tables with rows and columns. Examples: MySQL, PostgreSQL.
    2. NoSQL Databases: Designed for unstructured or semi-structured data. Examples: MongoDB, Firebase.
    3. Graph Databases: Used for representing relationships between entities. Example: Neo4j.
    4. Key-Value Stores: Simple storage format for fast lookups. Example: Redis.

    1. Introduction to Databases


    Among these, Relational Databases are the most commonly used for structured data storage, and MySQL is one of the most popular RDBMS options.

    Introduction to MySQL


    MySQL is an open-source Relational Database Management System (RDBMS) used for managing structured data. It is widely used in web applications, businesses, and data-driven projects due to its reliability, speed, and ease of use.

    SQL (Structured Query Language)


    SQL is the language used to interact with MySQL databases.
    In these notebooks, I will be working with MySQL databases and SQL.
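
To give a feel for SQL before opening the notebooks, here is a short sketch that creates a table and runs the four basic operations (create, read, update, delete). It uses Python's built-in `sqlite3` module as a stand-in so it runs without a MySQL server; the table, column names, and rows are invented for illustration, and placeholder syntax and some data types differ between SQLite and MySQL drivers.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table: rows and columns, as in any relational database.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create (insert) a few hypothetical rows.
cur.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Amina", "Nairobi"), ("Brian", "Kisumu"), ("Carol", "Nairobi")],
)

# Update one row and delete another.
cur.execute("UPDATE customers SET city = ? WHERE name = ?", ("Mombasa", "Brian"))
cur.execute("DELETE FROM customers WHERE name = ?", ("Carol",))
conn.commit()

# Read the remaining rows back.
rows = cur.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
print(rows)
conn.close()
```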

    2. Introduction to Relational Databases with SQL



    3. CRUD Operations with SQL



    4. Advanced SQL Techniques



    5. Advanced SQL and Joins




    6. Integrating Python with MySQL Databases

    Fixing Power BI MySQL Connection Issues

    Data Analysis with Python


    Afcon Analysis in Python
    Come Try this Project with Me

    Coming very soon!

    Building Web Apps Using Django and Python



    1. Introduction to Django



    2. Setting Up Django


    3. Models and Django Object-Relational Mapping (ORM) system


    4. Advanced Model Relationships


    5. Django Views and URL Configuration


    6. Templates and Static Content Management


    7. User Authentication Basics


    8. Django Admin Interface


    9. Examples (Models, Views, and URLs)


    10. Custom User Models and Authentication


    11. Permissions and Authorization


    12. Security Practices in Django

    A Project on Kenya Online Market

  • Kenya Online Markets

    This project showcases a responsive e-commerce platform built using modern web technologies, focusing on user experience and efficient product display.

  • Irura Jackson Mwongera

    is a Mathematician specialized in Data Science/Data Analytics with knowledge in website/system development.

    Note:
    Irura's Bio

    Irura Jackson Mwongera is a diligent and versatile professional with extensive experience in data science, data analysis, web development, and data annotation. Irura's academic background includes a Bachelor of Science in Mathematics with Information Technology, specializing in statistics, from Masinde Muliro University of Science and Technology. He possesses a diverse skill set, including website front-end development (HTML, CSS, JavaScript), database management (MySQL, SQL), data analysis and visualization (Python, SQL, Excel, and Power BI), technical writing, data annotation, machine learning, and deep learning for computer vision. His practical knowledge extends to basic computer maintenance, IT equipment installation, and data backup, making him a well-rounded and resourceful professional. In his personal life, Irura enjoys coding in Python, web development, watching football, socializing, researching, traveling, and seeking adventure. His hobbies reflect his curiosity and enthusiasm for continuous learning and exploration.

    MY DATA SCIENCE & DATA ANALYTICS
    Badges and Certificates
    ALX AFRICA
    Certificate For Professional Foundations
    ALX AFRICA
    Data Analytics
    WorldQuant University
    Applied Data Science Lab
    WorldQuant University
    Applied AI Lab: Deep Learning for Computer Vision
    JOVIAN
    Data Analysis in Python
    JOVIAN
    Machine Learning with Python

    Services Offered:

    Contact Us

    Google Workspace skills are essential to every professional in today's world

    Browse the link below and see if you have all these skills

    Open Google Sheets

    Now, if you feel you lack the above essential Google skills, you will become a pro in all of these in just one week

    Open Google Sheets
    SUPPORT VIA CREDIT CARD
    SUPPORT VIA MPESA

    BUY GOODS TILL NUMBER

    8586184

    Support my data science and deep Learning research

    1. Introduction to IoT

    1. How Blockchain Works

    2. Types of Blockchain

    3. Introduction to Bitcoin

    1. Cloud Computing Introduction

    1. Introduction to Android