Linguist
Description
Identify the breakdown of programming languages used in a GitHub repository, or the anticipated language of an individual file.
The best-known output of Linguist is the language breakdown graph shown on a GitHub repository page, which lists the percentage of each language used alongside a small chart visualizing that breakdown.
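The percentages in the breakdown can be illustrated with a little arithmetic. The following is a minimal sketch, not Linguist's own code. It assumes each language's share is its total size in bytes divided by the total size of all counted files, which is how Linguist's documentation describes the calculation. The function name `language_breakdown` and the sample data are hypothetical.

```python
from collections import Counter


def language_breakdown(file_sizes):
    """Return {language: percentage} from (language, size_in_bytes) pairs.

    Assumption: shares are computed from bytes per language.
    """
    totals = Counter()
    for language, size in file_sizes:
        totals[language] += size

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}

    # Round to one decimal place, as a repository language bar would show.
    return {
        language: round(100 * size / grand_total, 1)
        for language, size in totals.items()
    }


# Hypothetical repository: two Ruby files, one Python file, one shell script.
sample = [("Ruby", 6000), ("Python", 3000), ("Ruby", 500), ("Shell", 500)]
breakdown = language_breakdown(sample)
print(breakdown)
```

Because the shares are byte-based in this sketch, one large file can outweigh many small ones.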
Unlike utilities such as DROID (based on PRONOM), which rely on pattern matching to identify a file format, GitHub's Linguist is deterministic and applies a series of decision-making strategies to determine the likely programming language, and thus the potential file format.
Algorithm
Linguist first reduces the number of inputs it has to deal with, for example by excluding binary objects and files identified as data. It then applies a series of identification strategies to try to determine the programming language of an individual file, or the languages used across a GitHub repository.
Its algorithm is described in more detail on GitHub.
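The input-reduction step can be sketched as a simple filter. This is an illustration under stated assumptions, not Linguist's implementation. The NUL-byte check for binary content is a common heuristic swapped in here, and `DATA_EXTENSIONS`, `looks_binary` and `candidate_files` are hypothetical names.

```python
# Hypothetical set of extensions treated as "data" for this sketch.
DATA_EXTENSIONS = {".json", ".csv", ".yml"}


def looks_binary(content: bytes) -> bool:
    """Heuristic assumption: a NUL byte near the start suggests binary content."""
    return b"\x00" in content[:8000]


def candidate_files(files):
    """Return the names of files worth passing on to language detection.

    `files` maps a file name to its raw bytes.
    """
    kept = []
    for name, content in files.items():
        if looks_binary(content):
            continue  # skip binary objects
        if any(name.endswith(ext) for ext in DATA_EXTENSIONS):
            continue  # skip files identified as data
        kept.append(name)
    return kept


repo = {
    "app.rb": b"puts 1\n",
    "logo.png": b"\x89PNG\x00\x00",
    "data.json": b"{}",
}
print(candidate_files(repo))
```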
Among its strategies, linguist lists the following checks:
- Vim or Emacs modeline,
- commonly used filename,
- shell shebang,
- file extension,
- XML header,
- man page section,
- heuristics,
- naïve Bayesian classification
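The checks listed above can be pictured as a cascade in which each strategy either settles on one language or narrows the candidates for the next. The sketch below assumes that behaviour and implements toy versions of only three of the checks (shebang, file extension, heuristics). The lookup tables, function names and heuristic rules are hypothetical and far smaller than anything Linguist ships, and the final classification step is omitted.

```python
import os
import re

# Hypothetical, deliberately tiny lookup tables.
INTERPRETERS = {"python": "Python", "python3": "Python", "ruby": "Ruby", "sh": "Shell"}
EXTENSIONS = {".py": ["Python"], ".rb": ["Ruby"], ".h": ["C", "C++", "Objective-C"]}


def by_shebang(name, content, candidates):
    lines = content.splitlines()
    first_line = lines[0] if lines else ""
    if not first_line.startswith("#!"):
        return []
    parts = first_line[2:].split()
    if not parts:
        return []
    interpreter = os.path.basename(parts[0])
    if interpreter == "env" and len(parts) > 1:
        interpreter = parts[1]  # handle "#!/usr/bin/env python3"
    language = INTERPRETERS.get(interpreter)
    return [language] if language else []


def by_extension(name, content, candidates):
    found = EXTENSIONS.get(os.path.splitext(name)[1].lower(), [])
    if candidates:
        # Only keep languages an earlier strategy also proposed.
        return [lang for lang in found if lang in candidates]
    return list(found)


def by_heuristic(name, content, candidates):
    # Toy rules to separate languages that share the ".h" extension.
    if "@interface" in content:
        guess = "Objective-C"
    elif re.search(r"^\s*(class|namespace|template)\b", content, re.MULTILINE):
        guess = "C++"
    else:
        return []
    return [guess] if guess in candidates else []


STRATEGIES = [by_shebang, by_extension, by_heuristic]


def detect(name, content, strategies=STRATEGIES):
    """Run each strategy in order; stop as soon as one language remains."""
    candidates = []
    for strategy in strategies:
        found = strategy(name, content, candidates)
        if len(found) == 1:
            return found[0]
        if len(found) > 1:
            candidates = found  # narrowed set for the next strategy
    return None  # still ambiguous; this sketch has no classifier step


print(detect("run", "#!/usr/bin/env python3\nprint(1)\n"))
print(detect("util.h", "namespace util {\nint f();\n}\n"))
```

In this sketch a file that stays ambiguous after every strategy yields `None`. In the list above, that is the point at which the Bayesian classification would be applied.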
User Experiences
- How to add a programming language to GitHub, Jake Lee on Software.
- Linguist Discussions on GitHub.
Development Activity
All development activity is visible on GitHub: https://github.com/github-linguist/linguist/commits
Release Feed
The last 3 releases are listed below:
- 2026-03-18 15:09:57: v9.5.0, by lildude
- 2026-01-21 10:42:09: v9.4.0, by lildude
- 2025-09-18 09:55:02: v9.3.0, by lildude
Activity Feed
The last 5 commits are listed below:
- 2026-06-06 07:45:40: Add Power Query (M) language support (#7896), commit 6a1b5b43789fa0a49917211ffa21e8ce1c8a9883, by jacob-kraniak https://github.com/jacob-kraniak
- 2026-06-04 16:06:51: Add support for Valve Map Format (`.vmf`) files (#7985), commit f93105465340be085671379dfbfdf5ee98fad04f, by meowcat767 https://github.com/meowcat767
- 2026-06-04 10:08:30: Add Redscript language (#5809), commit f2e3a6591725c78bc5895bf22945348d57d5588b, by abheekda1 https://github.com/abheekda1
- 2026-06-04 09:38:30: Add `bun` and `deno` to interpreters in languages.yml for `JavaScript…, commit 1b0eab352970bbfd036d01bb0ad825c126dff4a0, by oopsio https://github.com/oopsio
- 2026-06-04 09:30:48: Update Mojo grammar source (#7867), commit 0906dd7e44befe461ba346b1d13576fa1bbca80e, by jackos https://github.com/jackos
References
- How Linguist Works on GitHub.
See also
- CLOC (Count Lines of Code) on COPTR.