
Commit 46e444c

change the detokenization thread to return the actual eos token. (#108)
* change the detokenization thread to return the actual eos token.
1 parent c3fe3ce commit 46e444c

2 files changed

Lines changed: 2 additions & 1 deletion


jetstream/engine/mock_utils.py

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ def _encode(self, s: str) -> Sequence[int]:
 
   def _decode(self, ids: np.ndarray):
     """Converts a numpy array into a string."""
-    return "".join([chr(r) for r in list(ids)])
+    return "".join([chr(r) for r in list(ids) if r not in self.stop_tokens])
 
   def _encode_tf(self, s: str) -> np.ndarray:
     """Converts a string into a numpy array."""
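The `_decode` change can be illustrated with a minimal, standalone sketch. `CharTokenizer`, its constructor, its `encode` method, and the choice of `0` as the EOS id are hypothetical and not taken from `mock_utils.py`. The only behavior mirrored from the diff is that ids in `stop_tokens` are skipped during decoding, so a stop token passed back to the caller is not rendered with `chr()` as a control character.

```python
from typing import Sequence

import numpy as np


class CharTokenizer:
  """Hypothetical character-level tokenizer mirroring the mock's _decode fix.

  Token ids are Unicode code points. Ids listed in stop_tokens (for example
  an EOS id) are dropped when decoding.
  """

  def __init__(self, stop_tokens: Sequence[int] = (0,)):
    self.stop_tokens = set(stop_tokens)

  def encode(self, s: str) -> np.ndarray:
    # One id per character: its Unicode code point.
    return np.array([ord(c) for c in s], dtype=np.int32)

  def decode(self, ids: np.ndarray) -> str:
    # Skip stop tokens instead of converting them with chr().
    return "".join(chr(r) for r in ids.tolist() if r not in self.stop_tokens)


# Usage: "hi" followed by the (assumed) EOS id 0.
tok = CharTokenizer(stop_tokens=[0])
ids = np.append(tok.encode("hi"), 0)
decoded = tok.decode(ids)  # "hi", with no trailing "\x00"
```

Without the filter, the same ids would decode to `"hi\x00"`, which is why the mock needed updating once the EOS id started reaching it.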

jetstream/engine/token_utils.py

Lines changed: 1 addition & 0 deletions
@@ -214,6 +214,7 @@ def process_result_tokens(
       )
       if tok_id in stop_tokens or not valid:
         complete[idx] = True
+        tok_id_so_far.append(tok_id)
         break
       else:
         if not is_client_side_tokenization:
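The effect of the added `tok_id_so_far.append(tok_id)` line can be seen in a simplified, standalone loop. `collect_tokens` and its parameters are hypothetical; the real `process_result_tokens` works on per-slot result arrays, and the detokenization done in its `else` branch is omitted here and replaced by a plain append. As in the diff, the append sits in the shared `if` branch, so an invalid entry that ends the slot is appended as well as a genuine stop token.

```python
from typing import List, Sequence, Set, Tuple


def collect_tokens(
    slot_tokens: Sequence[int],
    slot_valid: Sequence[bool],
    stop_tokens: Set[int],
) -> Tuple[List[int], bool]:
  """Hypothetical, simplified version of the per-slot loop in the diff.

  Accumulates token ids until a stop token or an invalid entry is seen.
  The terminating id is appended before breaking, so the caller receives
  the actual EOS id rather than an output that silently ends early.
  """
  tok_id_so_far: List[int] = []
  complete = False
  for tok_id, valid in zip(slot_tokens, slot_valid):
    if tok_id in stop_tokens or not valid:
      complete = True
      tok_id_so_far.append(tok_id)  # Mirrors the line added by this commit.
      break
    # Simplification: the real else branch also handles detokenization.
    tok_id_so_far.append(tok_id)
  return tok_id_so_far, complete


# Usage: ids for "h", "i", then an assumed EOS id 0; 120 is never reached.
tokens, done = collect_tokens([104, 105, 0, 120], [True] * 4, {0})
```

Here `tokens` is `[104, 105, 0]` and `done` is `True`. Returning the EOS id is what makes the `mock_utils.py` change necessary: any decoder that receives these ids must now skip stop tokens itself.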
