Design a language detection system
Design a machine learning system that automatically detects and tags the language of user posts when they are created, handling multimodal content, mixed languages, and different scripts.
Asked at:
Meta
Question Timeline
See when this question was last asked and where, including any notes left by other candidates.
Mid January, 2026
Meta
Manager
Design a ML system that creates a tag with the predicted language when a user creates a post. Questions went into details like: 1. Is the post multimodal or just text ? 2. Is the post written in the native script or latin script (like Russian written in latin v/s Cyrillic) 3. Is it possible that a post has more than one language/script (some portion of the post in Hindi and some in English) ? 4. Is there a fix set of output labels (languages) to choose from ? 5. Possible slangs or variants of a language ? 5. Can the user challenge the label and edit it in the case the user disagrees with the predicted label (Reinforcement) ? 6. Can the user engage in adversarial action like deliberately selecting a wrong label to confuse the algorithm after we have made an initial prediction ? 7. Use of comments to supplement training (Comments could be in a different language and script). 8. Ways to tokenize different script (Had no answer to this one) And much much more.
Your account is free and you can post anonymously if you choose.