FEAT: Drop fastchat from GCG, use tokenizer.apply_chat_template (#965)#1717
Closed
romanlutz wants to merge 26 commits into
Conversation
Add 26 new unit tests covering:
- get_filtered_cands: filtering, clamping, padding behavior
- target_loss / control_loss: shape, finiteness, loss ordering
- sample_control: shape, vocab bounds, single-position changes, non-ASCII filtering
- _build_params: ConfigDict construction from kwargs
- _apply_target_augmentation: length preservation, modification, seed reproducibility
- _create_attack: transfer flag routing (Progressive vs Individual)
- Embedding helpers: error handling for unknown model types
- PromptManager init: validation of goals/targets
- EvaluateAttack init: worker count validation

Total GCG test count: 24 -> 50

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Data & config tests (test_data_and_config.py, 12 tests):
- YAML loading: valid files, list values, missing file error
- Real config validation: all 11 shipped configs parse, have required keys, individual vs transfer configs have correct settings
- get_goals_and_targets: seed reproducibility, different seeds differ, separate test data files, n_train_data limiting
- run_trainer validation: unsupported model names, missing HF token

Lifecycle tests (test_lifecycle.py, 7 tests):
- GPU memory: nvidia-smi parsing (single/multi GPU), MLflow logging, failure handling
- generate_suffix lifecycle: MLflow started before training, workers stopped after training, BUG CHARACTERIZATION: workers NOT stopped on failure (leak)

Total GCG test count: 24 -> 69

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 10 integration tests that exercise the GCG attack pipeline with a real GPT-2 model on CPU, validating end-to-end correctness:
- token_gradients: gradient shape matches (n_control, vocab_size), values are finite and non-zero
- GCGAttackPrompt: initializes with valid non-overlapping slices, grad() returns correct shape, test_loss() returns finite positive float
- GCGPromptManager.sample_control: sampled candidates are decodable, correct batch size
- Embedding helpers: layer/matrix/embeddings work with GPT2LMHeadModel, get_nonascii_toks returns non-empty tensor

Uses the llama-2 conversation template (which has explicit handling in _update_ids). Marked @run_only_if_all_tests (requires RUN_ALL_TESTS=true + torch/transformers). Runs in ~18s on CPU.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
These tests only need optional Python packages (torch, transformers, fastchat), not external services or credentials. The importorskip at the top already handles skipping when deps are not installed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Move class references to module level to fix N806 (variable naming)
- Add noqa: E402 for imports after importorskip guards
- Fix ruff format issues
- Remove outdated RUN_ALL_TESTS reference in docstring

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove mlflow and azureml-mlflow from GCG dependencies entirely:
- Replace mlflow logging in log.py with Python standard logging
- Remove mlflow.start_run()/end_run() from train.py and attack_manager.py
- Remove mlflow and azureml-mlflow from gcg and all extras in pyproject.toml
- Update tests to not mock mlflow
- Fix Dockerfile: use nvidia/cuda base + Python 3.11 + uv + pip install -e .[gcg]
- Add pyarrow>=22 pin for Python 3.14 compatibility

The mlflow dependency caused Azure ML failures due to a version incompatibility between mlflow 3.x and azureml-mlflow. Proper experiment logging will be added later via CentralMemory or Azure storage (tracked in plan).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
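The mlflow-to-stdlib-logging swap described above can be sketched as follows. This is illustrative only: the logger name, function name, and message format are assumptions, not the actual code in log.py.

```python
import logging

# Hypothetical module-level logger standing in for the one in log.py.
logger = logging.getLogger("gcg")


def log_metric(name: str, value: float, step: int) -> None:
    # Previously something like: mlflow.log_metric(name, value, step=step).
    # Standard logging needs no tracking server and cannot fail the run.
    logger.info("step=%d %s=%.4f", step, name, value)
```

The practical difference: metrics land in the job's stdout/stderr stream rather than an experiment tracker, which is why the commit notes that proper experiment logging will be re-added later.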
# Conflicts:
#   tests/unit/auxiliary_attacks/gcg/test_lifecycle.py
Remove gbda_deterministic/mpa_deterministic — dead code from GBDA attack that was never consumed by any GCG class. Its presence caused a TypeError in individual mode because MultiPromptAttack.__init__() doesn't accept it. This was a pre-existing bug from the original llm-attacks research repo (silently swallowed by **kwargs there, but our copy removed **kwargs). Also adds scripts/run_gcg_aml.py (launcher with sys.path fix for Azure ML) and scripts/submit_gcg_job.py (job submission reading from .env files). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
All mpa_kwargs (deterministic, lr, batch_size, n_steps) were silently absorbed by **kwargs in the original llm-attacks repo's MultiPromptAttack.__init__() but never read. Our copy removed **kwargs, exposing the bug. The original repo even has a typo: 'self.mpa_kewargs' in IndividualPromptAttack (line 1114 of llm-attacks/attack_manager.py). Verified: none of these kwargs are consumed by MultiPromptAttack in either the original repo or our copy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
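The mechanism behind this hidden bug can be shown with two tiny stand-in classes (hypothetical names, not the real GCG code): a `**kwargs` catch-all silently absorbs unknown keyword arguments, while a strict signature rejects them.

```python
# Stand-in for the original llm-attacks MultiPromptAttack.__init__:
# unknown mpa_* kwargs land in **kwargs and are never read.
class OriginalStyleMPA:
    def __init__(self, goals, **kwargs):
        self.goals = goals  # kwargs silently discarded


# Stand-in for our copy, which removed **kwargs:
# the same call now fails loudly.
class StrictMPA:
    def __init__(self, goals):
        self.goals = goals


OriginalStyleMPA(["goal"], deterministic=True, lr=0.01)  # absorbed silently
try:
    StrictMPA(["goal"], deterministic=True)
except TypeError:
    pass  # the failure mode that surfaced in individual mode
```

Removing the dead kwargs at the call site (rather than re-adding `**kwargs`) keeps the stricter signature, so future misuse fails immediately instead of hiding for years.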
…crosoft#965) Phi-3-mini hits 'Conversation has no attribute system' in _update_ids() due to a fastchat API change. Llama-2 has a dedicated handling path that works.

GCG baseline VALIDATED on Azure ML:
- Model: meta-llama/Llama-2-7b-chat-hf
- Config: 5 prompts, 5 steps, batch_size 64
- Result: loss decreases across steps (1.9 -> 0.86 on best prompt)
- Runtime: ~6 min on Standard_NC24ads_A100_v4
- Job: silly_vinegar_82x7td6gpn (Completed)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The existing TestCreateAttack tests mock the manager classes, so they never exercise MultiPromptAttack.__init__() with real kwargs. That's why the dead mpa_kwargs bug only surfaced on Azure (TypeError when MPA didn't accept deterministic / lr / etc). This test constructs the real GCG manager classes and verifies IndividualPromptAttack and ProgressiveMultiPromptAttack can create an internal MultiPromptAttack without error. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The existing GPT-2 integration tests only use the llama-2 conversation template path. Bugs in the else branch of AttackPrompt._update_ids -- like the Phi-3 conv_template.system AttributeError we hit on Azure -- would never be caught. The two new tests construct GCGAttackPrompt with the vicuna template, which exercises the same code path. They are marked xfail (strict=True) because vicuna's fastchat conversation template lacks a .system attribute, reproducing the same bug. The xfail marker references issue microsoft#965 and will flip to 'unexpectedly passed' when the fastchat replacement lands, prompting removal of the marker. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Updates the AML notebook to reflect the actual flow we ran during Phase 1c baseline validation: llama-2 baseline (phi-3 has fastchat microsoft#965 bug), run_gcg_aml.py launcher script (so the uploaded code snapshot wins over the Docker-installed package), repo-root build context (Dockerfile needs to COPY pyproject.toml + pyrit/ for pip install -e .[gcg]), and PyRIT-style env file loading via _load_environment_files. Adds tests/end_to_end/auxiliary_attacks/test_gcg_aml_e2e.py mirroring that same flow as a real e2e test. Submits a small (5-step, 5-train, batch 64) llama-2 GCG job, polls until terminal state, asserts Completed. Skipped unless RUN_ALL_TESTS=true and AZURE_ML_* + HUGGINGFACE_TOKEN env vars are set (since it submits real paid Azure ML compute). Always cancels the submitted job on test failure or interruption to avoid leaking compute. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The scripts/ directory is not packaged for PyPI installs, so the AML launcher there was inaccessible to anyone who pip-installed pyrit[gcg]. Move the entry-point cwd handling into pyrit/auxiliary_attacks/gcg/ experiments/run.py itself: when run as `__main__`, chdir into the file's own directory so the relative `configs/` and `results/` paths resolve regardless of where the script is invoked from. AML jobs (notebook and e2e test) now run python -m pyrit.auxiliary_attacks.gcg.experiments.run --model_name ... which also makes the previous sys.path hack unnecessary -- `python -m` puts cwd at the front of sys.path, so the uploaded code snapshot still wins over the Docker-installed package. Deletes scripts/run_gcg_aml.py and scripts/submit_gcg_job.py (the latter was a CLI duplicate of the notebook's submission flow). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The pyarrow>=22.0.0; python_version >= '3.14' pin was added in c98af28 to the core dependencies, but pyrit core does not actually need it -- without the gcg extra, the resolver picks a 3.14-compatible pyarrow on its own via the transitive datasets -> pyarrow chain. The pin is only needed when the gcg extra is installed because something in that extra constrains the resolution toward an older pyarrow that lacks cp314 wheels and fails to build from source on Python 3.14. Moves the pin to the gcg extra and adds an inline comment explaining why it is there, matching the existing precedent for the spacy cp314 wheel comment in the all extra. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The AML e2e test previously rebuilt the MLClient, environment, and command from scratch -- a near-copy of the notebook's submission flow. Replace that with `runpy.run_path()` of `doc/code/auxiliary_attacks/1_gcg_azure_ml.py`. The notebook is jupytext percent format (the `# %%` markers are plain comments) so the file is valid Python and runs as a script. The test then pulls `returned_job` and `ml_client` out of the executed namespace and polls the job to a terminal state. Result: the notebook is the single source of truth for the submission flow, and the test verifies that what we ask users to run actually works end-to-end. Net diff is -27 lines. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ran jupytext --to ipynb --execute against the notebook .py to capture cell
outputs (workspace name, environment build status, submitted job name +
status + Studio URL) per PyRIT convention. The submitted job
('lucid_muscle_nt947p71s0') ran to completion on Azure ML, doubling as a
verification that the refactored notebook (which now invokes the GCG runner
via 'python -m pyrit.auxiliary_attacks.gcg.experiments.run' instead of the
old scripts/ launcher) still works end-to-end.
The captured stderr cells include some Azure ML SDK telemetry noise
('ActivityCompleted: ... HowEnded=Failure' for benign UserError conditions
like 'environment already at version N'). Will be cleaned up in a follow-up
by suppressing the azure.ai.ml._telemetry logger in the notebook source.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds two pieces to make the AML tutorial actually show users an end-to-end
result instead of stopping at "Job submitted, monitor in Studio":
1. ``run.py`` now accepts ``--output_dir``. The runner writes its result
JSON under whatever path the caller passes, defaulting to ``outputs/``.
The notebook's command declares ``outputs={"results": Output(uri_folder)}``
and passes ``--output_dir ${{outputs.results}}`` so AML mounts a path,
the runner writes there, and the contents are uploaded as a named output
artifact (auto-capture of ``./outputs/`` is *not* available in SDK v2
command jobs -- you have to declare named outputs explicitly).
2. A new poll-and-inspect cell at the end polls the submitted job, then
downloads with ``all=True``, finds the result JSON under
``<download_dir>/named-outputs/results/``, and prints the final loss
and generated adversarial suffix.
Also adds a (best-effort) logging suppression block early in the notebook
for azure.ai.ml SDK telemetry. It catches the python-logging warnings but
not the "ActivityCompleted: HowEnded=Failure" lines or the upload progress
bars -- those go through the SDK's own stderr handler with
propagate=False and are not reachable via standard logging config (see
azure-ai-ml _utils/_logger_utils.py). The remaining noise is benign
telemetry for expected UserError conditions like "environment already at
this version".
Notebook re-executed end-to-end against AML (job stoic_parcel_6clfs67hp9,
llama-2, 5 train data, 5 steps): completed successfully, suffix downloaded
and printed.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…noise

The 1.32.0 release includes: "Skip _list_secrets for identity-based datastores to prevent noisy telemetry traces." That bullet is exactly the source of the ``ActivityCompleted: Activity=Datastore.ListSecrets, HowEnded=Failure ... UserError ... No secrets for credentials of type None`` blob that was showing up in our Azure ML notebook's executed cell outputs and made it look like the env build / job submission was failing when it actually wasn't.

After bumping, a quick smoke test (build MLClient, list envs) drops from many lines of telemetry noise to a single ``Class X is experimental`` info message -- much more reasonable for a tutorial. Bumped both the ``gcg`` extra and the ``all`` extra so they stay aligned.

The upload progress bars and the experimental-class warning still show up; those are separate noise sources that this SDK release does not address.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
# Conflicts:
#   uv.lock
CI's coverage check (>=90% on diff) was failing on the log_gpu_memory try/except added in c98af28: lines 70-74 of log.py weren't reached. Two issues:

1. The TestGpuMemoryLogging class lived in test_lifecycle.py, which does pytest.importorskip on the GCG train module. CI installs the 'all' extra but not 'gcg', so ml_collections is missing and the train import fails, skipping the whole test_lifecycle.py module -- including the GPU memory tests, even though they only need stdlib. Moved them into test_log.py (which only importorskips the log module, all stdlib) so they actually run in CI.
2. The new test_log_gpu_memory_swallows_nvidia_smi_failure exercises the except branch (lines 73-74) that the old success-only test never hit. log_gpu_memory must swallow nvidia-smi failures so the training loop never crashes when run on a host without nvidia-smi.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…osoft#965) GCG previously relied on the (long-unmaintained) `fastchat` library to render the user/assistant exchange for each AttackPrompt. That library bundled a hardcoded list of ~5 model templates and broke whenever a new model shipped without a fastchat entry — most recently Phi-3-mini, whose template lacks `.system` and crashed `_update_ids` on Azure ML.

This change replaces the fastchat-driven slice computation with `tokenizer.apply_chat_template()`, which is the standard HuggingFace path and works for any chat-tuned model whose tokenizer ships a Jinja chat template. It also drops `fastchat`/`fschat` as a runtime dependency entirely (removed from the GCG Dockerfile).

The work is based on PR microsoft#1049 by @varshini2305, which prototyped the same approach. This commit polishes that prototype:
- Removes WIP `print(...)` debug statements from `_update_ids`.
- Replaces the narrow `_detect_assistant_role` heuristic (only matched `<|assistant|>` / `assistant:`) with positional computation: the assistant role tokens are always whatever sits between the end of the control and the start of the target. This works for llama-2's `[/INST]`, llama-3's header markers, phi-3 ChatML, and any future template without code changes.
- Handles the edge case where `char_to_token` returns None (when a substring ends exactly at the prompt boundary, common for the target string).
- Restores the `**params.tokenizer_kwargs[i]` spread in `get_workers` (the prototype hardcoded `use_fast=True` and dropped user kwargs).
- Drops `conv_template` from `AttackPrompt`, `PromptManager`, `MultiPromptAttack`, `EvaluateAttack`, `IndividualPromptAttack`, `ProgressiveMultiPromptAttack`, `ModelWorker`, `get_workers`, and the log-file dicts they populate.
- Drops `conversation_templates` from `generate_suffix`, `_build_params`, every shipped YAML config, and the config-validation test.
- Adds a clear `ValueError` in `get_workers` if a tokenizer has no chat template configured.
Tests:
- All 72 GCG unit tests pass (existing tests adjusted for new signatures).
- The 12 GCG integration tests pass on real GPT-2 with both a llama-2-style and a ChatML/phi-3-style chat template — the second template shape was previously `xfail(strict=True, raises=AttributeError)` referencing this exact issue. Those xfail markers are removed.
- Broader pyrit unit suite (7,492 tests) unaffected.

Closes microsoft#965. Builds on microsoft#1049 by @varshini2305.

Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
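The positional slice computation and the `char_to_token` None-handling described in the commit above can be illustrated with plain Python against a fake char-to-token mapping. Function names mirror the commit message, but the exact signatures are assumptions, not PyRIT's real helpers.

```python
def start_tok(char_to_token, char_pos, prompt_len, n_toks):
    """Walk forward from char_pos until some position maps to a token;
    fall back to len(toks) if nothing up to end-of-prompt maps."""
    for pos in range(char_pos, prompt_len):
        tok = char_to_token(pos)
        if tok is not None:
            return tok
    return n_toks


def end_tok(char_to_token, char_end, n_toks):
    """char_to_token returns None when a substring ends exactly at the
    prompt boundary (common for the target string); clamp to len(toks)."""
    tok = char_to_token(char_end)
    return n_toks if tok is None else tok


def assistant_role_slice(control_slice, target_slice):
    """Positional computation: the assistant role tokens are whatever sits
    between the end of the control and the start of the target."""
    return slice(control_slice.stop, target_slice.start)
```

Because `assistant_role_slice` is purely positional, it needs no knowledge of `[/INST]`, llama-3 header markers, or ChatML role tokens, which is why new templates require no code changes.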
Adds an `individual_phi_4.yaml` config and registers `phi_4` in the GCG runner's known model list. Validated end-to-end on Azure ML alongside the apply_chat_template fastchat removal: a 5-step run on `microsoft/phi-4` completed successfully with `python -m pyrit.auxiliary_attacks.gcg.experiments.run --model_name phi_4 --setup single --n_train_data 5 --n_test_data 0 --n_steps 5 --batch_size 64`, producing a finite loss and non-trivial suffix. Note `tokenizer_kwargs` uses `use_fast: True` (Phi-4 ships with a fast tokenizer; `use_fast: False` is reserved for older models that need it, e.g. llama-2). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds unit tests exercising the four diff-coverage gaps flagged by CI on PR microsoft#1717:
- AttackPrompt._update_ids() ValueError when goal/control/target don't appear verbatim in the chat-templated prompt (line 186).
- start_tok() walking forward when char_to_token returns None at the initial position, then finding a mappable position later (line 210).
- start_tok() falling back to len(toks) when no position from char_pos to end-of-prompt maps to a token (line 211).
- get_workers() ValueError when a tokenizer has no chat_template configured (lines 1620-1621).

Both start_tok tests use a fully mocked tokenizer because real tokenizers are too well-behaved (every byte position maps to some token) to deterministically exercise the None-handling branches.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Superseded by #1049: I force-pushed the polished work to varshini2305/feature/replace_fastchat so the original PR by @varshini2305 can be the one that gets merged. All commits, tests, and validation context have moved to #1049. Closing this in favor of that one.
Summary
Drops the `fastchat` (a.k.a. `fschat`) runtime dependency from GCG and replaces its per-model `Conversation` machinery with `tokenizer.apply_chat_template()`, the standard HuggingFace path that works for any chat-tuned model whose tokenizer ships a Jinja chat template. Closes #965.
This work is based on PR #1049 by @varshini2305, which prototyped the same approach.
The single commit on this branch is authored by @varshini2305 to preserve credit;
this description covers what got polished from her prototype to make it landable.
Why
`fastchat` is an unmaintained library (last release Aug 2024) that bundled a hardcoded list of ~5 model templates and broke whenever a new model shipped without a fastchat entry. Most recently we hit this on Azure ML with Phi-3-mini, whose template lacks a `.system` attribute and crashed `AttackPrompt._update_ids` with `AttributeError: 'Conversation' object has no attribute 'system'`. PR #1705 captured the failure mode as `xfail(strict=True, raises=AttributeError)` Vicuna integration tests; this PR makes those tests pass and removes the markers.
The replacement (`tokenizer.apply_chat_template`) automatically supports any chat-tuned model and removes our need to know template internals — addressing #990 (more models for suffix evaluation) as a side effect of the architecture change.
Polish on top of varshini2305/feature/replace_fastchat
The PR #1049 prototype had several rough edges that needed cleanup before landing:
- Removed `print(...)` debug statements sprinkled through `_update_ids`.
- Replaced the `_detect_assistant_role()` heuristic (only matched `<|assistant|>` / `assistant:`) with a positional computation: `_assistant_role_slice` is always whatever sits between the end of the control and the start of the target. Works for llama-2's `[INST]`/`[/INST]`, llama-3 header markers, phi-3 ChatML, and any future template without code changes.
- Handled `char_to_token` returning None when a substring ends exactly at the prompt boundary (common for the target string at end-of-prompt). The prototype hit `TypeError: NoneType - int`; the new code clamps end positions to `len(toks)` and walks forward for start positions.
- Restored the `**params.tokenizer_kwargs[i]` spread in `get_workers` (the prototype hardcoded `use_fast=True` and dropped user-provided kwargs).
- Added a clear `ValueError` in `get_workers` if a tokenizer has no chat template configured, instead of crashing later inside `_update_ids`.

Scope
Surgically dropped `conv_template` from every API surface that referenced it:
- `AttackPrompt`, `PromptManager`, `MultiPromptAttack`, `EvaluateAttack`, `IndividualPromptAttack`, `ProgressiveMultiPromptAttack`, `ModelWorker`, `get_workers` — all lose the parameter.
- `MultiPromptAttack.test_all` and the log-file dicts populated by `Progressive`/`Individual`/`EvaluateAttack` updated accordingly. The log-file per-worker entries now record `chat_template` instead of `conv_template.name`.
- `train.py`: drops the `conversation_templates` parameter (and updates the `_build_params` call).
- `pyrit/auxiliary_attacks/gcg/experiments/configs/`: drop the `conversation_templates` field.
- `pyrit/auxiliary_attacks/gcg/src/Dockerfile`: drops the `uv pip install fschat @ git+...` line.
- `test_data_and_config.py` drops `conversation_templates` from its required-keys check.

Testing
Local
- All 72 GCG unit tests pass (`tests/unit/auxiliary_attacks/gcg/`).
- The 12 GCG integration tests pass (`tests/integration/auxiliary_attacks/`) — exercises `_update_ids` end-to-end on real GPT-2 with two distinct chat-template shapes:
  - llama-2-style (`[INST]`/`[/INST]` inline markers).
  - ChatML/phi-3-style (`<|user|>`/`<|assistant|>` distinct role tokens) — this template shape was the `xfail(strict=True, raises=AttributeError)` case in PR #1705 (MAINT: GCG in AzureML fix & improved test coverage, remove mlflow) referencing this exact issue. It now passes.
Azure ML (real models, real GPUs)
Submitted GCG jobs across multiple chat-tuned model families on `gcg-gpu-a100` to validate the change. The run command for each was the same shape as the notebook's (`python -m pyrit.auxiliary_attacks.gcg.experiments.run --model_name {NAME} --setup single --n_train_data 5 --n_test_data 0 --n_steps 5 --batch_size 64 --output_dir ${{outputs.results}}`).
- `meta-llama/Llama-2-7b-chat-hf`
- `meta-llama/Meta-Llama-3-8B-Instruct`
- `microsoft/Phi-3-mini-4k-instruct`
- `microsoft/phi-4` — via the `individual_phi_4.yaml` config in a follow-up commit on this branch.
- `lmsys/vicuna-13b-v1.5` — `tiktoken` runtime dep issue: vicuna's tokenizer files need `pip install tiktoken` to convert. Fails in `AutoTokenizer.from_pretrained` before our code runs. Tracked as follow-up.
- `Qwen/Qwen3.6-27B`, `google/gemma-4-e4b-it` — newer transformers rejects `use_cache` in `__init__`; the YAML configs pass `model_kwargs: [{"use_cache": False}]`, which older transformers happily forwarded. Pre-existing config issue, not a fastchat regression.
- `zai-org/GLM-4.7-Flash`, `openai/gpt-oss-20b` — `apply_chat_template` worked, but the runs failed in `get_embedding_matrix` with `Unknown model type`: that helper hardcodes `isinstance` checks against a fixed list of model classes (Llama, Mistral, Phi-3, GPT-2, etc.). Replacing it with a generic `model.get_input_embeddings()` fallback (what nanoGCG does) is on the GCG roadmap.

Net validation: 5 successful AML runs across 4 distinct chat-tuned families (llama-2, llama-3, phi-3, phi-4) using `tokenizer.apply_chat_template()`. Fastchat removal works.
Backward compatibility
`AttackPrompt` and friends drop `conv_template` from their public signatures. This is technically breaking, but every caller in `pyrit/` is updated in the same commit and there are no documented external integrations of these classes. The pre-existing `generate_suffix(...)` entry point (and the YAML configs that drive it) is the intended public API; its `conversation_templates` parameter is removed, so any caller still passing it now gets a `TypeError`.

Note on base branch
This branch is currently based on `gcg-refactor` (PR #1705) since that PR is in review. Once #1705 merges to `main`, this PR's base will be switched and the diff will collapse to just the fastchat removal.
Closes / references
- `863c4906` is included on this branch (small additive change)

Out-of-scope follow-ups discovered during validation
These are not in this PR; flagged for future work:
- `get_embedding_matrix` is hardcoded to specific model classes. GLM-4.7-Flash, gpt-oss-20b, and presumably most modern HF chat models hit `ValueError: Unknown model type`. Drop-in fix: fall back to `model.get_input_embeddings()`. Same pattern nanoGCG uses.
- When a worker process dies inside `model.forward`, the main process hangs indefinitely waiting on the joinable queue, and AML reports the job as "Running" forever. Worth a separate fix.
- `use_cache: False` in the shipped YAML configs is rejected by newer transformers model `__init__` for some model families. Should drop it from `model_kwargs` (it's already the default for inference-time generation and irrelevant to GCG anyway).
- `tiktoken` is a missing runtime dep for vicuna-13b-v1.5 (and presumably other models whose tokenizers ship as tiktoken files). Add to the gcg extra.