Conversation
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
therealdavidos
left a comment
There was a problem hiding this comment.
looks good from the math perspective!
There was a problem hiding this comment.
is this change related to the yarn PR?
|
|
||
| self.reset_parameters() | ||
|
|
||
| def _compute_yarn_parameters(self, device: torch.device | None) -> tuple[torch.Tensor, float]: |
There was a problem hiding this comment.
Pleace place private methods below the public interface of the class.
| seq_length_dim: Annotated[int, Field(strict=True)] | ||
| base_freq: Annotated[int, Field(strict=True, ge=10000)] | ||
| max_position_embeddings: Optional[Annotated[int, Field(strict=True, ge=1)]] = None | ||
| rope_scaling: Optional[dict[str, object]] = None |
There was a problem hiding this comment.
Does this play nicely with our config setup? Would it be possible to have something like "rope_scaling: YarnSettings | DefaultSettingsIfExists | SomeFutureRopeScalingSettings | None = None" with the Settings being BaseModels themselves?
| beta_fast_raw = self.rope_scaling.get("beta_fast") | ||
| beta_slow_raw = self.rope_scaling.get("beta_slow") | ||
| beta_fast = float(beta_fast_raw) if isinstance(beta_fast_raw, (int, float)) else 32.0 | ||
| beta_slow = float(beta_slow_raw) if isinstance(beta_slow_raw, (int, float)) else 1.0 |
There was a problem hiding this comment.
I'm a bit worried that in case these parameters are strings or torch types for some reason they will get dropped silently here.
| return 0.1 * mscale * math.log(scale) + 1.0 | ||
|
|
||
| if attention_factor is None: | ||
| if isinstance(mscale, (int, float)) and isinstance(mscale_all_dim, (int, float)): |
There was a problem hiding this comment.
I'm a bit worried that in case these parameters are strings or torch types for some reason they will get dropped silently here.
What does this PR do?
This PR adds YaRN support to rotary position embeddings in the GPT-2 attention path.
General Changes
Breaking Changes
Checklist before submitting final PR
python tests/tests.py)CHANGELOG_DEV.md)