Model Parameters:
- Internal variables, such as weights and biases, that are adjusted during training to minimize the loss
- Weights set how strongly each input influences the output, while biases are constant offsets that shift the output independently of the inputs; together they let the model learn patterns from data and improve accuracy
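A minimal sketch of how weights and biases combine in a single linear unit, assuming a plain weighted sum plus bias. The function name `linear_score` and the parameter values are hypothetical, chosen only for illustration:

```python
# Minimal sketch: one linear unit with hypothetical parameter values.
# Each weight scales one input feature; the bias is added regardless of the inputs.

def linear_score(x: list[float], w: list[float], b: float) -> float:
    """Return the weighted sum w . x plus the bias b for one input vector x."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical parameters for a two-feature model.
weights = [2.0, -0.5]  # feature 0 pushes the score up strongly, feature 1 pulls it down
bias = 0.25            # constant offset added to every prediction

print(linear_score([1.0, 2.0], weights, bias))  # 2.0*1.0 - 0.5*2.0 + 0.25 = 1.25
print(linear_score([0.0, 0.0], weights, bias))  # all-zero input leaves only the bias: 0.25
```

Training would nudge `weights` and `bias` so that scores like these better match the true labels.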
Loss Function:
- A single number measuring how wrong the model's predictions are
- Training tries to make this loss smaller
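- For example, if a model predicts "spam" with 90% confidence but the email is actually "not spam", the loss is large because the model gave only 10% probability to the correct class; the same confident prediction on a real spam email would produce a small loss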
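As a worked example, assuming binary cross-entropy (log loss) as the loss function, a common choice for classifiers although the notes above do not name one, the spam case can be computed directly. The helper name `binary_cross_entropy` is hypothetical:

```python
import math


def binary_cross_entropy(p_spam: float, is_spam: bool) -> float:
    """Log loss for one prediction.

    p_spam is the model's predicted probability of "spam";
    is_spam is the true label (True for spam, False for not spam).
    """
    # Probability the model assigned to the class that actually occurred.
    p_true = p_spam if is_spam else 1.0 - p_spam
    p_true = max(p_true, 1e-12)  # guard against log(0)
    return -math.log(p_true)


wrong = binary_cross_entropy(0.9, is_spam=False)  # confidently wrong
right = binary_cross_entropy(0.9, is_spam=True)   # confidently right
print(f"{wrong:.4f}")  # 2.3026
print(f"{right:.4f}")  # 0.1054
```

The confidently wrong prediction costs about 2.30, while the same confidence on the correct class costs about 0.11, so reducing the loss during training means shifting probability toward the correct class.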
Gradients:
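- Gradients measure how much the loss changes in response to small changes in the parameters or the inputs: gradients with respect to the parameters guide training, while gradients with respect to the inputs can be used to manipulate the model
- In open-box attacks, where the attacker can see the model's internals, attackers use input gradients to find small input changes that increase the loss and deceive the model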
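To make the open-box idea concrete, here is a sketch of the fast gradient sign method (FGSM), one well-known gradient-based attack, applied to a tiny logistic-regression model. The notes above do not name a specific attack; the model, its weights, and the step size `eps` are hypothetical. For logistic regression with binary cross-entropy, the gradient of the loss with respect to the input has the closed form (p - y) * w, so no autodiff library is needed:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b) -> float:
    """Probability of class 1 for input x under logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x: (p - y) * w."""
    p = predict_proba(x, w, b)
    return [(p - y) * wi for wi in w]


def sign(v: float) -> int:
    """Return +1, -1, or 0 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, eps):
    """Move every feature by eps in the direction that increases the loss."""
    grad = input_gradient(x, y, w, b)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


# Hypothetical open-box model: the attacker can read w and b.
w, b = [1.5, -2.0], 0.1
x, y = [1.0, 0.5], 1  # correctly classified as class 1

x_adv = fgsm(x, y, w, b, eps=0.4)
print(f"{predict_proba(x, w, b):.3f}")      # 0.646 (class 1)
print(f"{predict_proba(x_adv, w, b):.3f}")  # 0.310 (now class 0)
```

FGSM takes a single fixed-size step per feature, so it is fast but not guaranteed to find the smallest perturbation that fools the model.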
Hard Labels:
- The final answer (class label) the model outputs, without any probabilities or confidence scores
Scores:
- In classification, scores (often called logits) are the raw outputs of a model before they are converted into probabilities or labels
- In closed-box attacks, a model that exposes these scores reveals its confidence, helping attackers infer how it makes decisions
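One common way to turn raw scores into probabilities is the softmax function. The sketch below assumes softmax and uses hypothetical scores for a two-class model:

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, -0.2]  # hypothetical raw outputs for classes 0 and 1
probs = softmax(scores)
print([round(p, 2) for p in probs])  # [0.9, 0.1]
```

Softmax preserves the ordering of the scores, so the highest score and the highest probability always point to the same class.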
Soft Labels:
- Soft labels are probability distributions over all classes, indicating the model's confidence in each class
- For example, [0.9, 0.1] might mean 90% confidence in class 0 and 10% in class 1
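A short sketch relating soft labels to hard labels, assuming the hard label is the argmax of the soft label. The helper names `hard_label` and `one_hot` are hypothetical. It also shows what a hard-label-only output hides:

```python
def hard_label(soft: list[float]) -> int:
    """Index of the most probable class (argmax of the soft label)."""
    return max(range(len(soft)), key=lambda i: soft[i])


def one_hot(label: int, num_classes: int) -> list[float]:
    """Represent a hard label as a vector with 1.0 at the chosen class."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


soft = [0.9, 0.1]
label = hard_label(soft)
print(label)                      # 0
print(one_hot(label, len(soft)))  # [1.0, 0.0]
print(max(soft))                  # 0.9, the confidence a hard label discards
```

A prediction of [0.51, 0.49] and one of [0.99, 0.01] produce the same hard label, even though the first input is much closer to changing class.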
Decision Boundary:
- An invisible line (or surface) in the input space that separates regions assigned to different class labels
- The attacker aims to "cross" this boundary with minimal changes to the input
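For a linear classifier, the decision boundary is the hyperplane w · x + b = 0, and the smallest change (in L2 distance) that crosses it has a closed form: step along -w by f(x)/||w||^2, where f(x) = w · x + b. This is the same step DeepFool uses for an affine binary classifier. The sketch below uses hypothetical weights and a small, hypothetical `overshoot` factor so the point lands just past the boundary rather than exactly on it:

```python
def score(x, w, b) -> float:
    """Linear score f(x) = w . x + b; its sign decides the class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def minimal_crossing(x, w, b, overshoot=1e-3):
    """Smallest L2 step that moves x just across the boundary w . x + b = 0."""
    f = score(x, w, b)
    norm_sq = sum(wi * wi for wi in w)
    step = -(1 + overshoot) * f / norm_sq
    return [xi + step * wi for xi, wi in zip(x, w)]


w, b = [3.0, 4.0], -5.0
x = [2.0, 1.0]  # score = 6 + 4 - 5 = 5 > 0, so class 1
x_adv = minimal_crossing(x, w, b)
dist = sum((a - c) ** 2 for a, c in zip(x_adv, x)) ** 0.5
print(score(x, w, b), score(x_adv, w, b) < 0, round(dist, 3))  # 5.0 True 1.001
```

The distance from `x` to the boundary is |f(x)|/||w|| = 5/5 = 1, so the attack needs a change of only slightly more than 1. Real neural networks have curved boundaries, so attacks such as DeepFool repeat this linear step iteratively rather than solving it once.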