Skip to content

Pr/multilingual#121

Draft
raghavm243512 wants to merge 5 commits into
mainfrom
pr/multilingual
Draft

Pr/multilingual#121
raghavm243512 wants to merge 5 commits into
mainfrom
pr/multilingual

Conversation

@raghavm243512
Copy link
Copy Markdown
Collaborator

@raghavm243512 raghavm243512 commented May 18, 2026

initial multilingual version

Easily extendable to many language using the add_culture_data script. This will do translation, gender consistent naming, suggest names, extend data, etc. So if anyone wants to run a language not committed in EVA data, it is trivially easy to do so
Readme section showing basic of adding a language.
Tested end to end once, but I don't have 11lab login and ran out of credits on free one (needed a custom agent for foreign lang)

Still TODO:
Currencies
Actually committing the translations (didn't want to burn credits until finalized)
Analysis
Testing a large variety of models to ensure they actually get the language code they expected (es-MX vs es, for example)

Copy link
Copy Markdown
Collaborator

@katstankiewicz katstankiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also add ensure_ascii=False to AuditLog save()

Comment thread src/eva/utils/culture.py
def get_initial_message(language: str) -> str:
"""Return the assistant's opening line for ``language``.

Falls back to English. Raises if even English is missing (data quality).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Falls back to English. Raises if even English is missing (data quality).
Falls back to English. Raises if even English is missing.

Comment thread scripts/run_text_only.py
user_message = resolve_user_goal(
record.user_goal,
record.culture_overrides,
os.getenv("EVA_LANGUAGE", "en"),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
os.getenv("EVA_LANGUAGE", "en"),
language,

Comment thread scripts/run_text_only.py
goal = resolve_user_goal(
record.user_goal,
record.culture_overrides,
os.getenv("EVA_LANGUAGE", "en"),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
os.getenv("EVA_LANGUAGE", "en"),
language,

and add language to build_user_sim_prompt

),
audit_log=audit_log,
api_key=params["api_key"],
base_url=params.get("url", ""),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
base_url=params.get("url", ""),

we don't need a url for openai services

from eva.utils.culture import FIRST_NAME_PLACEHOLDER, LAST_NAME_PLACEHOLDER
from eva.utils.json_utils import extract_and_load_json
from eva.utils.llm_client import LLMClient
from eva.utils.logging import get_logger
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from eva.utils.logging import get_logger
from eva.utils.logging import get_logger, setup_logging

from eva.utils.logging import get_logger
from eva.utils.router import init

logger = get_logger(__name__)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adds logging (previously was only printing warning level or above)

Suggested change
logger = get_logger(__name__)
setup_logging()
logger = get_logger(__name__)


# 3. Write back atomically.
if dry_run:
logger.info(f"[dry-run] would update {dataset_path} ({len(records)} records)")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.info(f"[dry-run] would update {dataset_path} ({len(records)} records)")
logger.info(f"[dry-run] would update {dataset_path} ({len(target_ids)} records)")

def _load_names_file(path: Path) -> dict[str, list[str]]:
data = json.loads(path.read_text(encoding="utf-8"))
for key in ("male_first", "female_first", "last"):
if not isinstance(data.get(key), list) or not data[key]:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this check that there are enough names as well? ie BUCKET_SIZE?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants