Search Results
Modelling & Machine Learning (ML)
🧠 Dive into Machine Learning!
Explore the fascinating world of modeling and machine learning, from fundamental concepts to advanced algorithms.
Modeling and Machine Learning
The main difference between modeling and machine learning is that modeling relies on predefined mathematical or statistical formulas to explain relationships in data, while machine learning automatically learns patterns from data without explicit programming. Modeling is often based on assumptions and is more interpretable, whereas machine learning is data-driven, flexible, and excels in handling complex, large-scale, and unstructured data like images and text.
There are two approaches in Machine Learning and Modeling: Supervised and Unsupervised
In supervised learning, the data is typically divided into three subsets:
- Training Data: Used to train the model.
- Validation Data: Used during training to tune hyperparameters and evaluate the model’s performance to prevent overfitting.
- Test Data: Used after training to evaluate the final performance of the model on unseen data.
While in unsupervised learning:
- The entire dataset is often used for training because there are no predefined labels.
Supervised and Unsupervised Machine Learning Algorithms
Examples of Supervised and Unsupervised Models
Learning Curve
Some Examples of Modeling Models that you will come across in my notebooks
Regression models
1. Linear regression: Linear Regression models predict a continuous output variable based on one or more input features. This model assumes there's a linear relationship between the input and output variables.
2. Logistic regression: Logistic regression predicts the probability of an outcome that can only have two values, like yes/no, 1/0. We typically also put a threshold on the predicted probability to determine the predicted classification.
Tree models
3. Decision tree: Decision tree models look like flow charts; they help us make decisions based on a series of questions or input variables. Decision trees can be used to classify data or predict continuous outcomes.
4. Random forest: Random Forest combines multiple decision trees to make predictions. It creates many decision trees using random subsets of the data and features. This approach helps reduce overfitting and improves generalization.
Clustering models
5. Hierarchical clustering: Hierarchical Clustering is an unsupervised model that builds a tree-like structure of clusters. They can be built from bottom up (each data point starts in its own cluster and clusters are merged as you move up the hierarchy) or top down (where all data starts in one cluster and splits occur as you move down the hierarchy).
Instance-based models
6. K-nearest neighbors: KNN predicts outputs by finding the K most similar data points to a new input, and using their outputs to make a prediction. KNN is non-parametric, meaning it doesn't make assumptions about the underlying data distribution.
Some Examples of Machine learning Models that you will come across in my notebooks
1. K-means clustering: K-means Clustering is an unsupervised model that groups similar data points into K clusters based on their features. It aims to minimize the distance between each data point and the cluster center.
2. Principal Component Analysis (PCA): It is an unsupervised statistical model primarily used for dimensionality reduction and feature extraction. Rather than making predictions, PCA transforms high-dimensional data into a smaller set of uncorrelated components (principal components) that retain most of the original data's variance.
1. LINEAR REGRESSION MODEL
- Linear regression with one variable using Scikit-learn
- Linear regression with multiple variables
- Using categorical features for machine learning
- Regression coefficients and feature importance
- Other models and techniques for regression using Scikit-learn
- Applying linear regression to other datasets
2. LOGISTIC REGRESSION MODEL
- Exploratory data analysis and visualization
- Splitting a dataset into training, validation & test sets
- Filling/imputing missing values in numeric columns
- Scaling numeric features to a range
- Encoding categorical columns as one-hot vectors
- Training a logistic regression model using Scikit-learn
- Evaluating a model using a validation set and test set
- Saving a model to disk and loading it back
3. DECISION TREES & RANDOM FOREST MODELS
- Preparing a dataset for training
- Training and interpreting decision trees
- Training and interpreting random forests
- Overfitting & hyperparameter tuning
- Making predictions on single inputs
4. GRADIENT BOOSTING MODEL
- Downloading a real-world dataset from a Kaggle competition
- Performing feature engineering and prepare the dataset for training
- Training and interpreting a gradient boosting model using XGBoost
- Training with KFold cross validation and ensembling results
- Configuring the gradient boosting model and tuning hyperparameters
References
Deep Learning for Computer Vision
👁️ See the World with AI!
Explore the cutting-edge of deep learning applied to computer vision tasks, from image recognition to object detection.
What is deep learning?
Deep learning is a subset of machine learning that uses multilayered neural networks, called deep neural networks, to simulate the complex decision-making power of the human brain. Some form of deep learning powers most of the artificial intelligence (AI) applications in our lives today.
The difference between deep learning and machine learning is the structure of the underlying neural network architecture. “Nondeep,” traditional machine learning models use simple neural networks with one or two computational layers. Deep learning models use three or more layers—but typically hundreds or thousands of layers—to train the models.
Deep learning is an aspect of data science that drives many applications and services that improve automation, performing analytical and physical tasks without human intervention. This enables many everyday products and services—such as digital assistants, voice-enabled TV remotes, credit card fraud detection, self-driving cars and generative AI.
Face Detection with MTCNN (Multi-Task Cascaded Convolutional Networks)
NoteBook_1
NoteBook_2
NoteBook_3
NoteBook_4
Medical Data
Generative and Discriminative Models
Medigan, Generative and Discriminative Models
Traffic Data
Traffic Detection
YOLO Model
References
Machine Learning Projects
🛠️ Hands-on Data Science Projects!
Explore real-world applications of machine learning and data analysis through these comprehensive projects.
1. Customer Churn Dataset
Dataset Description
Customer churn refers to the phenomenon where customers discontinue their relationship or subscription with a company or service provider. It represents the rate at which customers stop using a company's products or services within a specific period. Churn is an important metric for businesses as it directly impacts revenue, growth, and customer retention. In the context of the Churn dataset, the churn label indicates whether a customer has churned or not. A churned customer is one who has decided to discontinue their subscription or usage of the company's services. On the other hand, a non-churned customer is one who continues to remain engaged and retains their relationship with the company. Understanding customer churn is crucial for businesses to identify patterns, factors, and indicators that contribute to customer attrition. By analyzing churn behavior and its associated features, companies can develop strategies to retain existing customers, improve customer satisfaction, and reduce customer turnover. Predictive modeling techniques can also be applied to forecast and proactively address potential churn, enabling companies to take proactive measures to retain at-risk customers.
My Project in Logistic Regression, Decision Trees & Random Forest Models
Kaggle Competition
2. Predict the Introverts from the Extroverts
Dataset Description
The dataset for this competition (both train and test) was generated from a deep learning model trained on the Extrovert vs. Introvert Behavior dataset. Feature distributions are close to, but not the same as, the original. Feel free to use the original dataset as part of this competition, both to explore differences and to see whether incorporating the original in training improves model performance.
Note – This is a relatively small dataset, so one to use for comparing different modeling approaches, making visualization, etc.
Files
train.csv
– the training dataset; Personality
is the categorical target
test.csv
– the test dataset; your objective is to predict the Personality
for each row
sample_submission.csv
– a sample submission file in the correct format
Predict the Introverts from the Extroverts
Kaggle Competition
3. Regression of Used Car Prices
Dataset Description
About the Tabular Playground Series
The goal of the Tabular Playground Series is to provide the Kaggle community with a variety of fairly light-weight challenges that can be used to learn and sharpen skills in different aspects of machine learning and data science. The duration of each competition will generally only last a few weeks, and may have longer or shorter durations depending on the challenge. The challenges will generally use fairly light-weight datasets that are synthetically generated from real-world data, and will provide an opportunity to quickly iterate through various model and feature engineering ideas, create visualizations, etc.
Synthetically-Generated Datasets
Using synthetic data for Playground competitions allows us to strike a balance between having real-world data (with named features) and ensuring test labels are not publicly available. This allows us to host competitions with more interesting datasets than in the past. While there are still challenges with synthetic data generation, the state-of-the-art is much better now than when we started the Tabular Playground Series two years ago, and that goal is to produce datasets that have far fewer artifacts. Please feel free to give us feedback on the datasets for the different competitions so that we can continue to improve!
Files
train.csv
– the training dataset
test.csv
– the test dataset
sample_submission.csv
– a sample submission file in the correct format
Regression of Used Car Prices
Python Data Analysis
🚀 Welcome to Python Data Science!
Master data analysis and visualization with Python. From basics to advanced machine learning techniques.
Start Your Journey: Python Programming Fundamentals
Welcome! I am a freelancer in data science, and I offer free data analysis materials in Python. First, I will introduce you to coding in Python. After you gain some skills, I will guide you through data analysis and visualization in Python, step by step.
We start here, introduction to coding in Python, this will help you have sufficient skills to venture into data analysis with Python
1. INTRODUCTION TO PROGRAMMING IN PYTHON
- First steps with Python & Jupyter notebooks
- Arithmetic, conditional & logical operators in Python
- Quick tour with Variables and common data types
2. VARIABLES AND DATA TYPES IN PYTHON
- Storing information using variables
- Primitive data types in Python: Integer, Float, Boolean, None, and String
- Built-in data structures in Python: List, Tuple, and Dictionary
- Methods and operators supported by built-in data types
3. BRANCHING AND LOOPING IN PYTHON
- Branching with if, else, and elif
- Nested conditions and if expressions
- Iteration with while loops
- Iterating over containers with for loops
- Nested loops, break and continue statements
4. REUSABLE CODE USING FUNCTIONS IN PYTHON
- Creating and using functions in Python
- Local variables, return values, and optional arguments
- Reusing functions and using Python library functions
- Exception handling using try-except blocks
- Documenting functions using docstrings
5. COMPUTING WITH NUMPY IN PYTHON
- Working with numerical data in Python
- Going from Python lists to Numpy arrays
- Multi-dimensional Numpy arrays and their benefits
- Array operations, broadcasting, indexing, and slicing
- Working with CSV data files using Numpy
6. ANALYZING WITH PANDAS AND VISUALIZATION IN PYTHON
- Reading a CSV file into a Pandas data frame
- Retrieving data from Pandas data frames
- Querying, sorting, and analyzing data
- Merging, grouping, and aggregation of data
- Extracting useful information from dates
- Basic plotting using line and bar charts
- Writing data frames to CSV files
7. USING MATPLOTLIB AND SEABORN
- Creating and customizing line charts using Matplotlib
- Visualizing relationships between two or more variables using scatter plots
- Studying distributions of variables using histograms & bar charts
- Visualizing two-dimensional data using heatmaps
- Displaying images using Matplotlib's plt.imshow
- Plotting multiple Matplotlib and Seaborn charts in a grid
References
Data Analysis in SQL
🗄️ Query Your Data with SQL!
Learn to retrieve, manipulate, and analyze data efficiently using SQL, the language of databases.
What is a Database?
A database is an organized collection of data that enables efficient storage, retrieval, and management of information. Instead of storing data in scattered files, databases allow for structured storage, making it easy to access and manipulate data efficiently.
Types of Databases
Databases come in various forms, including:
1. Relational Databases (RDBMS) – Stores data in tables with rows and columns. Example: MySQL, PostgreSQL.
2. NoSQL Databases – Designed for unstructured or semi-structured data. Example: MongoDB, Firebase.
3. Graph Databases – Used for representing relationships between entities. Example: Neo4j.
4. Key-Value Stores – Simple storage format for fast lookups. Example: Redis.
1. Introduction to Databases
- Databases definations
- Database Management System (DBMS)
- Types of Databases
- Relationships
- SQL Syntax and Structure
- SQL Data Types
- Installing MySQL
- Running MySQL
- Installing MySQL on Linux
Among these, Relational Databases are the most commonly used for structured data storage, and MySQL is one of the most popular RDBMS options.
Introduction to MySQL
MySQL is an open-source Relational Database Management System (RDBMS) used for managing structured data. It is widely used in web applications, businesses, and data-driven projects due to its reliability, speed, and ease of use.
SQL (Structured Query Language)
SQL is the language used to interact with MySQL databases.
In these notebooks, I will be working with MySQL databases and SQL.
2. Introduction to Relational Databases with SQL
- Use cases and design of relational databases and SQL
- Setting up a database locally using MySQL server
- Creating, modifying, and deleting databases and database tables
- SQL Data types and constraints (primary key, foreign key)
- CRUD (Create, Read, Update, and Delete) operations on tables
- Exporting and importing data from relational databases
3. CRUD Operations with SQL
- DDL (Data Definition Language)
- DML (Data Manipulation Language)
- CRUD Operations in SQL
4. Advanced SQL Techniques
- DQL (Data Query Language)
- Sorting and Filtering Data
- Complex Queries and Subqueries
- Data Control Language (DCL)
5. Adnamced Sql and Joins
- Aggregation, grouping, and pagination in SQL queries
- Mapping functions, arithmetic, and working with dates
- Combining data from different tables using SQL joins
- Improving query performance with indexes
- Executing SQL queries using Python and SQLAlchemy
6. Integrating Python with MySQL Databases
- Introduction to mysql-connector-python
- Working with Cursors
- Executing SQL Queries (SELECT, INSERT, UPDATE, DELETE)
Data Analysis in Excel
📊 Unlock Insights with Excel!
Master data organization, calculations, and powerful visualization techniques using Microsoft Excel.
Check back soon for comprehensive guides!
Power BI
📈 Visualize Your Data with Power BI!
Transform raw data into interactive dashboards and reports to drive business decisions using Power BI.
New materials are coming soon!
Projects Under Data Analysis
🚀 Practical Data Analysis Projects!
Apply your data analysis skills to real-world scenarios with these hands-on projects designed to build your portfolio.
Data Analysis with Python
Afcon Analysis in Python
- This project involves analysis of the Africa Cup of Nations games that have been played from 1957 to 2022
Bike Store Sales Analysis
- This project involves analysis of Bike Store Sales in Europe
Come Try this Project with Me
- The data in this notebook contains the exchange rate of the Kenyan shilling against the US dollar from 1991 to 2024
- Try to come with some insights like factors that cause the Kenyan shilling to depreciate when subjected to the US dollar
- Presidency regime that affected the Kenyan shilling so much
- Remember you don't need to install Python and Anaconda to do this project since I have integrated an IDE from Google called COLAB
- Click the colab and it will allow you to run your code on this notebook
- I am also doing the same project, let's learn together
Front-End Engineering
🎨 Build Beautiful User Interfaces!
Learn the art of creating responsive and engaging web experiences using HTML, CSS, and JavaScript.
Exciting new content is on the way!
Back-End Engineering
⚙️ Power Your Applications from the Server-Side!
Discover how to build robust and scalable server-side logic, databases, and APIs for dynamic web applications.
Building Web Apps Using Django and Python
1. Introduction to Django
- Introduction to Django
- Core Components of Django
- Comparison with Other Web Frameworks
2. Setting Up Django
- Installing Django
- Creating a New Project
- Project Structure
- Django Apps
- Running a Django App
3. Models and Django Object-Relational Mapping (ORM) system
- Models and Their Structure
- Django ORM: Object-Relational Mapping
- Database interaction with the Django ORM
- Configuring the Database
4. Advanced Model Relationships
- ForeignKey Relationships
- OneToOneField Relationships
- ManyToManyField Relationships
- Handling Related Object Deletion
- Performance Considerations
5. Django Views and URL Configuration
- Function-based Views
- Class-based Views
- URL Configuration
6. Templates and Static Content Management
- Django Templates
- Template Language
- Template Inheritance
- Static Files Management
7. User Authentication Basics
- Django’s Built-in Authentication System
- User Registration
- User Login and Logout
- Password Management
- Authentication Views and URLs
8. Django Admin Interface
- Introduction to the Django Admin Interface
- Configuring the Admin Interface
- Customizing the Admin Interface
9. Examples in (Models, views, and URLs)
- Creation of Models.py
- Creation of Views.py
- Creation of App URLs.py & Account/project URLs.py
- Creation of Django templates such as HTML
10. Custom User Models and Authentication
- Enhancing the Default User Model
- Crafting Custom Authentication Backends
11. Permissions and Authorization
- Understanding Permissions and Groups
- Assigning Permissions
- Permission Checks in Views and Templates
- Custom Permissions
12. Security Practices in Django
- Common Web Vulnerabilities and their Impact
- Leveraging Django’s Built-in Security Features
- Implementing Secure Development Practices
Projects Under Web Dev
💻 Build Your Web Development Portfolio!
Showcase your skills with these practical web development projects covering both front-end and back-end technologies.
A project On Kenya Online Market
Click to open - Kenya Online Markets
This project showcases a responsive e-commerce platform built using modern web technologies, focusing on user experience and efficient product display.
oogle Work Space Skills
☁️ Boost Your Productivity with Google Workspace!
Master essential tools like Google Sheets, Docs, and Slides to enhance your collaboration and productivity.
Content for Google Workspace Skills will go here. This section will provide tips and tutorials on utilizing Google's productivity suite.
Learn how to maximize your efficiency!
Browse the link below and see if you have all these skills
- Gmail
- G - Calender
- G - Drive
- G - Docs
- G - Sheets
- G - Slides
- G - Forms
Open Google Sheets
Now,if you feel you lack the above Essential google Skills
,
- Send me you email through irura.mwongera11@gmail.com,,,I will allow you to open the link below with answers
- @ 200 Ksh
You will become pro to all these in just one week
- Gmail
- G - Calender
- G - Drive
- G - Docs
- G - Sheets
- G - Slides
- G - Forms
Open Google Sheets
IoT (Internet of Things)
🔌 Connect the Physical and Digital Worlds!
Explore the exciting field of IoT, where everyday objects are connected to the internet, enabling smart solutions.
1. Introduction to IoT
Cloud Computing
☁️ Scale Your Ambitions with Cloud Power!
Understand the fundamentals of cloud computing, its services, and how it's transforming IT infrastructure.
1. Cloud Computing Introduction
Blockchain
🔗 Understand Decentralized Technologies!
Demystify blockchain technology, its applications, and its impact on various industries beyond cryptocurrencies.
1. How Blockchain Works
2. Types of Blockchain
3. Introduction to Bitcoin
Android Development
📱 Build Your Mobile Apps!
Learn how the Android platform works.
1. Introduction to Android
Irura Bio
👋 Get to Know Me!
Learn about my journey, expertise, and passion for data science and technology.
Irura Jackson Mwongera is a Mathematician specialized in Data Science/Data Analytics with knowledge in website/system development.
Note:
- THIS WEBSITE WAS DESIGNED AND CREATED BY
[Irura Jackson Mwongera System/Website Design/Data Science Ltd]
Irura's Bio
Irura Jackson Mwongera is a diligent and versatile professional with extensive experience in data science, data analysis, web development, and data annotation.Irura's academic background includes a Bachelor of Science in Mathematics with Information Technology, specializing in statistics, from Masinde Muliro University of Science and Technology. He possesses a diverse skill set, including website front-end development (HTML, CSS, JavaScript), database management (MySQL, SQL), data analysis and visualization (Python, SQL, Excel, and Power BI), technical writing, data annotation, Machine Learning and deep learning for computer vision. His practical knowledge extends to basic computer maintenance, IT equipment installation, and data backup, making him a well-rounded and resourceful professional.In his personal life, Irura enjoys coding in Python, web development, watching football, socializing, researching, traveling, and seeking adventure. His hobbies reflect his curiosity and enthusiasm for continuous learning and exploration.
MY DATA SCIENCE & DATA ANALYTICS
Badges and Certificates
ALX AFRICA
Certificate For Professional Foundations
ALX AFRICA
Data Analytics
WorldQuant University
Applied Data Science Lab
WorldQuant University
Applied AI Lab: Deep Learning for Computer Vision
JOVIAN
Data Analysis in Python
JOVIAN
Machine Learning with Python
Services
🤝 How I Can Help You!
Discover the range of services I offer, from data analysis consultations to custom machine learning solutions.
This section outlines the various services offered, such as:
- Data Analysis & Visualization
- Machine Learning Model Development
- Deep Learning Solutions
- Web Development (Front-end & Back-end)
- Technical Consulting & Training
Feel free to reach out for a personalized consultation!
Contacts
📧 Get in Touch!
Connect with me through various channels for inquiries, collaborations, or support.
You can reach me via:
I look forward to hearing from you!
Support Me
💖 Support My Work!
Your support helps me create more free content and resources for the data science community.
If you find the content helpful, consider supporting my work!
Send Money
0768430987
Your support is greatly appreciated!