ML, DL, and GPT
-
Machine Learning (ML) is a field of computer science that enables systems to learn from data and improve their performance without explicit programming.
-
Deep Learning (DL) is a subfield of ML that utilizes multi-layered neural networks to model complex patterns in data.
-
Neural Networks are computational models inspired by the human brain, composed of interconnected nodes (neurons) that process information in layers.
-
Training Data is the labeled or unlabeled dataset used to teach a machine learning model how to perform a task.
-
Supervised Learning involves training a model on labeled data, where each example has an input and an associated correct output.
-
Unsupervised Learning uses unlabeled data, allowing the model to discover hidden patterns or groupings without explicit instruction.
-
Reinforcement Learning (RL) trains agents to make decisions by rewarding desired behaviors and penalizing undesirable ones.
-
Generative Models learn to produce new data similar to their training examples (e.g., text, images).
-
Discriminative Models focus on classifying inputs into categories or predicting specific outcomes.
-
Transfer Learning allows a model trained on one task to be reused or fine-tuned on a related task.
-
GPT (Generative Pre-trained Transformer) is a family of large language models developed by OpenAI that can generate human-like text.
-
ChatGPT is an interactive variant of GPT, fine-tuned for conversation and instruction-following tasks.
-
Transformer Architecture was introduced in the paper “Attention Is All You Need,” revolutionizing natural language processing by relying entirely on attention mechanisms rather than recurrence.
-
Self-Attention mechanisms let the model weigh different parts of the input sequence when constructing an output representation.
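A minimal NumPy sketch of single-head scaled dot-product self-attention; the function name and toy shapes are illustrative assumptions, not any library's API:
```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project input into queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # how strongly each position attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ v                              # each output mixes the values by attention weight

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                         # 4 tokens, model width 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)              # shape (4, 8)
```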
-
Positional Encoding in Transformers helps the model identify the order of tokens in a sequence.
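The sinusoidal scheme from “Attention Is All You Need” is one concrete way to do this; a small NumPy sketch:
```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sin/cos encodings: each position gets a unique, smoothly varying pattern."""
    pos = np.arange(seq_len)[:, None]              # positions 0..seq_len-1
    i = np.arange(d_model // 2)[None, :]           # index of each sin/cos dimension pair
    angle = pos / 10000 ** (2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                    # even dimensions use sine
    pe[:, 1::2] = np.cos(angle)                    # odd dimensions use cosine
    return pe

pe = sinusoidal_positional_encoding(16, 8)         # added elementwise to the token embeddings
```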
-
Pre-training is the initial phase where a model learns general features from large-scale data before being fine-tuned on specific tasks.
-
Fine-tuning is the process of taking a pre-trained model and adapting it to a narrower task using a smaller, task-specific dataset.
-
Language Modeling is the task of predicting the next token (word or subword) in a sequence, foundational to GPT-like models.
-
Zero-shot Learning allows a model to handle tasks without explicit training examples, relying on learned general knowledge.
-
Few-shot Learning leverages a limited number of task-specific examples to guide model predictions or behaviors.
-
RLHF (Reinforcement Learning from Human Feedback) is used to align model outputs with human preferences and values.
-
Human Feedback can include rankings or labels that guide the model’s generation toward more desired responses.
-
Prompt Engineering is the art of crafting input queries or instructions to guide large language models effectively.
-
Context Window refers to the maximum amount of text the model can process at once; each GPT model has a fixed maximum context length.
-
Inference is the stage where a trained model makes predictions or generates outputs given new inputs.
-
Parameter Count is a key factor in model capacity; larger models can capture more complex patterns but require more computation.
-
Model Compression techniques (e.g., pruning, quantization) reduce a model’s size and speed up inference with minimal accuracy loss.
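As a hedged sketch of one compression technique, symmetric int8 post-training quantization with a single per-tensor scale (real toolchains usually add per-channel scales and calibration):
```python
import numpy as np

def quantize_int8(w):
    """Symmetric quantization: store weights as int8 plus one float scale."""
    scale = np.abs(w).max() / 127.0                # map the largest magnitude onto 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale            # approximate reconstruction of the weights

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
max_err = np.abs(w - dequantize(q, scale)).max()   # small error for a 4x memory reduction
```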
-
Attention Heads in Transformers process different aspects of the input in parallel, improving representational power.
-
Masked Language Modeling (e.g., in BERT) involves predicting missing tokens in a sentence, helping the model learn context.
-
Causal Language Modeling (e.g., in GPT) involves predicting the next token based on all previous tokens.
-
Encoder-Decoder Architecture (e.g., T5) uses one network to encode the input and another to decode it into a target sequence.
-
Convolutional Neural Networks (CNNs) excel at processing grid-like data (e.g., images) via convolutional layers.
-
Recurrent Neural Networks (RNNs) process sequential data by passing hidden states along time steps, though they can struggle with long-term dependencies.
-
Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are RNN variants designed to better capture long-range dependencies.
-
Batch Normalization helps stabilize training by normalizing intermediate layer outputs.
-
Dropout is a regularization technique that randomly “drops” neurons during training to prevent overfitting.
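A minimal sketch of “inverted” dropout, the common formulation that rescales surviving activations during training so inference needs no change:
```python
import numpy as np

def dropout(x, p=0.5, training=True, seed=None):
    """Inverted dropout: zero each activation with probability p during training."""
    if not training:
        return x                                   # dropout is a no-op at inference time
    mask = np.random.default_rng(seed).random(x.shape) >= p
    return x * mask / (1.0 - p)                    # rescale so the expected activation is unchanged
```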
-
Optimizer Algorithms like Stochastic Gradient Descent (SGD), Adam, and RMSProp update model parameters based on gradients.
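A sketch of the basic update rule: vanilla SGD and its momentum variant (Adam additionally adapts the step size per parameter). Names here are illustrative:
```python
def sgd_step(params, grads, lr=0.01):
    """Vanilla SGD: step each parameter against its gradient, scaled by the learning rate."""
    return [p - lr * g for p, g in zip(params, grads)]

def sgd_momentum_step(params, grads, velocity, lr=0.01, mu=0.9):
    """Momentum variant: velocity accumulates a decaying average of past gradients."""
    velocity = [mu * v - lr * g for v, g in zip(velocity, grads)]
    return [p + v for p, v in zip(params, velocity)], velocity
```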
-
Learning Rate is a hyperparameter that determines how drastically weights are updated during training.
-
Hyperparameters (e.g., batch size, number of layers) are configuration settings chosen before training to control how learning unfolds.
-
Model Overfitting occurs when a model learns training data too well, failing to generalize to new data.
-
Regularization Techniques (e.g., L2 weight decay, dropout) help reduce overfitting and improve generalization.
-
Validation Set is used to tune hyperparameters, while the Test Set evaluates the final performance of the model.
-
Cross-validation splits data into multiple subsets, systematically training and validating to get a more robust performance estimate.
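A minimal NumPy sketch of generating k-fold splits by hand; libraries such as scikit-learn provide equivalent utilities:
```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index arrays for each of k disjoint folds."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]                             # hold out fold i
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, val

for train_idx, val_idx in k_fold_indices(100, k=5):
    pass  # fit on train_idx, score on val_idx, then average the k scores
```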
-
Exploding and Vanishing Gradients are problems that occur in deep networks, making training unstable or ineffective.
-
Residual Connections (skip connections) in networks like ResNet help mitigate vanishing gradients by shortcutting data paths.
-
Scaling Laws suggest that increasing model size, dataset size, and training compute generally leads to predictable improvements in performance.
-
Compute Efficiency is critical; training large models requires optimized hardware (GPUs, TPUs) and algorithms.
-
Ethical Considerations include bias, fairness, and potential harm—ML models must be carefully tested and monitored.
-
Data Augmentation artificially expands training datasets to improve model robustness (especially in image and speech tasks).
-
Data Preprocessing (e.g., tokenization, normalization) is essential for effective model training.
-
Tokenization splits text into tokens (words or subwords), the fundamental units processed by language models.
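A toy word-level tokenizer for illustration; production language models instead use learned subword schemes such as BPE, so the output below is only suggestive:
```python
import re

def simple_tokenize(text):
    """Toy word-level tokenizer: words and punctuation become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

simple_tokenize("Transformers don't read characters.")
# ['transformers', 'don', "'", 't', 'read', 'characters', '.']
```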
-
Vector Embeddings represent tokens or concepts as numerical vectors, preserving semantic relationships.
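A sketch of an embedding table plus cosine similarity; these vectors are random, so unlike trained embeddings they carry no real semantics:
```python
import numpy as np

vocab = {"king": 0, "queen": 1, "apple": 2}
emb = np.random.default_rng(0).normal(size=(len(vocab), 64))   # one 64-d vector per token

def cosine(u, v):
    """Cosine similarity: near 1 when two vectors point in the same direction."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

cosine(emb[vocab["king"]], emb[vocab["queen"]])   # trained embeddings would score related words higher
```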
-
Positional Embeddings add information about the position of each token to help a Transformer understand sequence order.
-
Attention Weights reveal how a model distributes focus across different parts of the input.
-
Beam Search is a decoding strategy in language models that keeps multiple candidate outputs at each step to find the best overall sequence.
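A hedged sketch of beam search over a stubbed model; `next_token_logprobs` is an assumed placeholder for whatever function returns the model's next-token log-probabilities:
```python
def beam_search(next_token_logprobs, vocab_size, beam_width=3, max_len=5):
    """Keep the beam_width highest-scoring partial sequences at every decoding step."""
    beams = [([], 0.0)]                            # (token sequence, total log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            logps = next_token_logprobs(seq)       # model stub: log-probs for every vocab token
            for tok in range(vocab_size):
                candidates.append((seq + [tok], score + logps[tok]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]            # prune everything but the best beam_width
    return beams[0][0]                             # highest-scoring completed sequence
```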
-
Greedy Search picks the most probable token at each step, but can lead to suboptimal final outputs.
-
Temperature in sampling adjusts the randomness of language generation: a higher temperature flattens the token distribution and yields more varied output, while a lower temperature makes generation more deterministic (illustrated in the sampling sketch below).
-
Top-k and Top-p (Nucleus) sampling methods restrict the candidate tokens to the k most likely or a cumulative probability p, balancing diversity and coherence.
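A combined sketch applying temperature, top-k, and top-p to raw logits (covering this entry and the two before it); the function is illustrative, not any library's sampler:
```python
import numpy as np

def sample_next(logits, temperature=1.0, top_k=None, top_p=None, seed=None):
    """Sample one token id from raw logits with temperature, top-k, and top-p filtering."""
    rng = np.random.default_rng(seed)
    z = np.asarray(logits, dtype=np.float64) / temperature   # <1 sharpens, >1 flattens the distribution
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                          # token ids, most likely first
    if top_k is not None:
        probs[order[top_k:]] = 0.0                           # keep only the k most likely tokens
    if top_p is not None:
        prev_cum = np.cumsum(probs[order]) - probs[order]    # probability mass ranked above each token
        probs[order[prev_cum >= top_p]] = 0.0                # keep the smallest set whose mass covers p
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

sample_next([2.0, 1.0, 0.1], temperature=0.8, top_k=2)       # usually returns 0, sometimes 1
```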
-
Perplexity measures how well a probability model predicts a sample; lower perplexity indicates better predictive performance.
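Concretely, perplexity is the exponential of the average negative log-likelihood per token; a tiny worked example:
```python
import math

def perplexity(token_logprobs):
    """exp(average negative log-likelihood) of the tokens under the model."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

perplexity([math.log(0.25)] * 4)   # 4.0: as uncertain as a fair four-way choice at each step
```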
-
Precision and Recall are metrics for classification tasks, focusing on correctness and completeness, respectively.
-
F1 Score is the harmonic mean of precision and recall, balancing both metrics into a single value.
-
Accuracy is the fraction of correct predictions, but it can be misleading in imbalanced datasets.
-
Area Under the ROC Curve (AUC) measures a classifier’s performance across various thresholds.
-
Confusion Matrix shows the counts of true positives, false positives, false negatives, and true negatives.
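A small sketch tying the preceding metric entries together: confusion-matrix counts and the precision, recall, F1, and accuracy derived from them (it assumes both classes occur, to avoid division by zero):
```python
def classification_metrics(y_true, y_pred):
    """Binary confusion-matrix counts and the metrics built from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    precision = tp / (tp + fp)                     # correctness of positive predictions
    recall = tp / (tp + fn)                        # completeness over actual positives
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / len(pairs),
    }

classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
# precision = recall = f1 ≈ 0.667, accuracy = 0.6
```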
-
Uncertainty Estimation methods (e.g., Monte Carlo Dropout) gauge how confident a model is in its predictions.
-
Active Learning involves querying new data examples that the model is least confident about, improving data efficiency.
-
Online Learning updates the model incrementally as new data arrives, rather than retraining from scratch.
-
Evolutionary Algorithms and Genetic Algorithms optimize models or hyperparameters using bio-inspired mutation and selection.
-
Bayesian Methods incorporate prior knowledge and update beliefs with incoming data, useful for uncertainty quantification.
-
Ensemble Methods (e.g., Random Forest, Gradient Boosting) combine multiple models to improve performance and stability.
-
Bagging (Bootstrap Aggregating) trains multiple models on different subsets of the data, then averages their predictions.
-
Boosting iteratively trains new models to correct errors made by previously trained models.
-
Gradient Boosted Decision Trees (GBDTs) are powerful for structured (tabular) data, where they often outperform neural networks.
-
Autoregressive Models predict the next value (or token) based on previous outputs in a sequence.
-
Autoencoder is a neural network designed to encode data into a latent representation and then decode it back, learning compressed data representations.
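A minimal PyTorch sketch; the 784-to-32 layer sizes are arbitrary assumptions suggesting flattened 28x28 images:
```python
import torch
from torch import nn

class TinyAutoencoder(nn.Module):
    """Compress 784-d inputs to a 32-d latent code, then reconstruct them."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
        self.decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyAutoencoder()
x = torch.rand(8, 784)                         # batch of 8 flattened 28x28 images
loss = nn.functional.mse_loss(model(x), x)     # reconstruction error drives training
loss.backward()
```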
-
Variational Autoencoder (VAE) introduces a probabilistic twist to generate new data that resembles the training set.
-
Generative Adversarial Network (GAN) pits a generator against a discriminator, producing realistic images, text, or other data.
-
Self-Supervised Learning leverages large amounts of unlabeled data by creating artificial training tasks (e.g., predicting missing parts).
-
Foundation Models are large pre-trained models that can be adapted to a wide range of downstream tasks.
-
Multimodal Learning integrates data from multiple sources (e.g., text, images, audio) to create richer representations.
-
Data Labeling is often the most time-consuming part of ML, requiring careful annotation for accuracy.
-
Edge Computing brings ML inference closer to the data source, reducing latency and bandwidth usage.
-
Federated Learning trains models across decentralized devices or servers holding local data samples, without exchanging them.
-
Privacy-Preserving ML includes techniques like differential privacy and homomorphic encryption to protect sensitive data.
-
Explainable AI (XAI) aims to make the decisions of complex models more interpretable to humans.
-
Bias and Fairness in ML need careful oversight, as models can inadvertently learn and amplify societal biases.
-
Concept Drift occurs when the statistical properties of the target variable change over time, impacting model performance.
-
A/B Testing compares two or more versions of a model to see which performs better in a real-world environment.
-
GPU Acceleration exploits parallel computing on graphics cards to drastically speed up ML training.
-
TPUs (Tensor Processing Units) are specialized hardware accelerators by Google for efficient deep learning workloads.
-
Open-Source Frameworks (e.g., TensorFlow, PyTorch) provide building blocks and tools for ML model development.
-
Model Serving is the practice of deploying trained models so they can handle real-time or batch predictions.
-
Scalability is crucial for handling large datasets or heavy traffic, requiring distributed training and inference strategies.
-
MLOps combines ML development with operations practices, focusing on reproducibility, testing, and continuous integration.
-
Version Control for data and models ensures consistent experiment tracking and collaboration.
-
Deployment Strategies (e.g., containers, microservices) organize how models are packaged and served at scale.
-
Monitoring tracks model performance post-deployment, watching for degradations or anomalies.
-
Retraining and Model Updates keep models current as new data and changing conditions arise.
-
Time Complexity (O-notation) measures how an algorithm’s runtime scales with input size; O(1) denotes constant time.
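Two tiny functions contrasting constant and linear time:
```python
def first_item(xs):
    return xs[0]                # O(1): one step regardless of the length of xs

def contains(xs, target):
    for x in xs:                # O(n): may scan every element in the worst case
        if x == target:
            return True
    return False
```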
-
Future of ML promises increasingly sophisticated and general models, but the field must address ethical, social, and environmental considerations.