Tutorials

  • What is a Digital Twin?
    Introduction: The development of the Internet of Things (IoT) has led to technologies such as the Digital Twin, which is used across the logistics, healthcare, automation, manufacturing, and asset management industries. A Digital Twin is a virtual representation of a physical object or process that simulates real-world performance to predict and analyze how the physical asset functions. Digital Twins are classified into several types with distinct uses. Digital Twin…
  • Dimensionality Reduction Using PCA in Scikit-Learn
    Dimensionality reduction condenses data samples into a smaller set of principal features, and Principal Component Analysis (PCA) is a widely used algorithm for this process. Scikit-Learn provides PCA in several variants, as described below. Exact PCA: PCA performs linear dimensionality reduction by applying Singular Value Decomposition (SVD) to centered input data. Scikit-Learn’s PCA object functions as a transformer, learning…
  • Evaluation of Clustering Performance in Scikit-Learn
    Scikit-Learn provides several key functions for evaluating the performance of clustering algorithms, as explained below. 1. Adjusted Rand Index: This metric measures the similarity between two clusterings by counting the pairs of samples that are assigned to the same or to different clusters in each. The following example shows how to use this performance evaluation metric. 2. Mutual Information: Mutual…
  • How Do Clustering Methods Perform in Scikit-Learn?
    Clustering methods in Scikit-Learn are essential for identifying similarities among data samples. As a key unsupervised machine learning technique, clustering reveals patterns and groups similar samples based on their features, which helps determine the intrinsic groupings within unlabeled data. This library provides the following clustering methods: The following example applies…
  • Purpose and Types of Boosting Methods in Scikit-Learn
    Boosting methods build an ensemble model incrementally by training base estimators sequentially. They combine several weak learners, trained over multiple iterations, into a powerful ensemble. Scikit-Learn includes two main boosting methods: AdaBoost and Gradient Tree Boosting. 1. AdaBoost: This is a successful boosting ensemble method that adjusts…
  • Decision Tree Algorithms in Scikit-Learn
    1. Types of Decision Tree Algorithms: A decision tree is a robust non-parametric supervised learning technique for classification and regression. It aims to predict the value of a target variable using decision rules derived from the data features. Key components include the root node, where the data is first split, decision nodes, where further splits occur, and leaves, which hold the final outputs. There are different types of…
  • Classification With Naïve Bayes in Scikit-Learn
    Naïve Bayes methods are supervised learning algorithms based on Bayes' theorem, operating under the strong assumption that all predictors are independent. This independence means the presence of one feature does not affect the presence of another within the same class. The following naïve Bayes classifier models are provided: The Naïve Bayes classifier can also be utilized…
  • Types of K-Nearest Neighbors (KNN) Algorithms and Learning Techniques in Scikit-Learn
    Neighbor-based learning methods include supervised and unsupervised types. Supervised neighbor-based methods are primarily used for classification, though they also apply to regression. These methods require no specialized training phase, use all available data for training, and make no assumptions about the underlying data, which makes them lazy and non-parametric. Nearest neighbor methods identify the nearest training…
  • Techniques for Anomaly Detection Process in Scikit-Learn
    Anomaly detection identifies data points that deviate from the norm. Anomalies, also called outliers, fall into three categories: point anomalies are individual data instances deemed anomalous compared to the rest of the data, contextual anomalies occur when a data instance is anomalous within a specific context, and collective anomalies arise when related data instances deviate anomalously from the entire…
  • Types of Support Vector Machine (SVM) in Scikit-Learn
    The support vector machine (SVM) is an effective supervised learning method for classification, regression, and outlier detection. It works particularly well in high-dimensional spaces and is memory efficient because its decision function uses only a subset of the training points. It aims to classify datasets by identifying a maximum marginal hyperplane (MMH) through two steps: (i) generating optimal hyperplanes to separate classes and (ii)…
  • Stochastic Gradient Descent for Parameter Estimation in Scikit-Learn
    Stochastic Gradient Descent (SGD) is an effective optimization algorithm for estimating the coefficients (parameters) of a function that minimize a cost function. It is used in discriminative learning of linear classifiers such as SVM and logistic regression. Because it updates the coefficients after each training instance, it scales well to large datasets. SGD Classifier: The SGD classifier implements a simple SGD…
  • Effectiveness of Extended Linear Modeling in Scikit-Learn
    The effectiveness of extended linear modeling in Scikit-Learn can be studied through polynomial features and pipeline tools, as described below. Polynomial Features: Linear models trained on non-linear functions of the data keep the fast performance of linear methods while fitting a much wider range of data, which is why they are preferred in machine learning applications. Simple linear regression can be extended using polynomial…
  • How Does Linear Modeling Work in Scikit-Learn?
    Scikit-Learn offers several linear models, as mentioned below. The following example shows how to use linear regression for modeling the Real Estate Data Chicago dataset using only one of its features, and reports result metrics such as the mean squared error. The Real Estate Data Chicago dataset (‘real_estate.csv’)…
  • Purpose and Types of Conventions in Scikit-Learn
    Scikit-Learn offers a uniform API with three interfaces: the estimator interface for building and fitting models, the predictor interface for making predictions, and the transformer interface for converting data. Its conventions ensure that the API meets the following objectives: Scikit-Learn offers various conventions, as mentioned below. Type Casting: The following example…
  • The Use of Estimator API in Scikit-Learn
    The Estimator API offers a uniform interface for a wide range of ML applications, and all Scikit-Learn algorithms are implemented through it. An estimator is an object that learns from data. It may be a classification, regression, or clustering algorithm, or a transformer that extracts features from raw data. All estimator objects have a fit method for fitting to data (obtained from the dataset), as below. Next, all estimator…
  • Data Representation Methods in Scikit-Learn
    Machine learning involves creating models from data, which requires the computer to understand that data. Several data representation methods make this possible. Data as Table: In the Scikit-Learn library, data is best represented as a table, a 2-D grid in which rows denote individual elements and columns indicate quantities related to those elements. The following…
  • How is the Modeling Process Done in Scikit-Learn?
    The modeling process in this library consists of several steps, including dataset loading, dataset splitting, model training, model persistence, and data preprocessing, which are described below. 1. Loading the Dataset: Scikit-Learn includes example datasets (such as iris and digits) for classification and the Boston house prices for regression (the latter was removed in scikit-learn 1.2). The following example loads the…
  • Introduction to and Installation of the Scikit-Learn Library
    Scikit-Learn (formerly scikits.learn and also known as Sklearn) is a powerful library for machine learning in Python. It offers tools for classification, regression, clustering, and dimensionality reduction through a consistent interface, and is built on NumPy, SciPy, and Matplotlib. This project is a community initiative; contributions are welcome on https://github.com/scikit-learn/scikit-learn/. The latest release can be…
  • Change Date Format in Microsoft Windows
    This video shows how to change the date format in Microsoft Windows. It can be used on different versions of this operating system, such as Windows 10.