Linguist

From COPTR
Revision as of 12:16, 30 March 2026 by Ross-spencer (Initial entry)


Identify the breakdown of programming languages used in a GitHub repository, or the likely language of an individual file.
Homepage: https://github.com/github-linguist/linguist/tree/main
Status: Maintained ✅
Source Code: https://github.com/github-linguist/linguist/tree/main
License: MIT
Cost: Free and Open Source (FOSS)
Language: Ruby
Function: Content Profiling, File Format Identification

Description

Linguist identifies the breakdown of programming languages used in a GitHub repository, or the likely language of an individual file.

Linguist's best-known output is the language breakdown shown on a GitHub repository page: the percentage of each language used, accompanied by a small chart visualizing those proportions.
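
As a rough illustration of how such a breakdown can be derived, the sketch below converts per-language byte counts into percentages. GitHub's documentation describes the percentages as based on bytes of code per language; the function name `language_breakdown` and the byte counts used here are hypothetical, and this is not Linguist's own code.

```python
def language_breakdown(bytes_by_language):
    """Return each language's share of the total bytes as a percentage, largest first."""
    total = sum(bytes_by_language.values())
    if total == 0:
        return {}
    shares = {
        language: round(100 * size / total, 1)
        for language, size in bytes_by_language.items()
    }
    # Sort so the dominant language comes first, as in the repository chart.
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


# Hypothetical byte counts for a small repository.
example = {"C": 2000, "Ruby": 7500, "Shell": 500}
print(language_breakdown(example))
# → {'Ruby': 75.0, 'C': 20.0, 'Shell': 5.0}
```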

Utilities such as DROID identify file formats by matching signature patterns registered in PRONOM. GitHub's Linguist takes a different approach: it works through an ordered series of decision-making strategies to determine the likely programming language of a file, and thus its format.

Algorithm

Linguist first reduces the number of inputs it has to consider, for example by excluding binary objects and files identified as data. It then applies a series of identification strategies to determine the programming language of each remaining file, whether across a whole GitHub repository or for an individual file.

Its algorithm is described in more detail on GitHub.
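
The input-reduction step can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the null-byte test for binary content, the `DATA_EXTENSIONS` set, and the function names are hypothetical and do not reproduce Linguist's actual checks.

```python
import os

# Hypothetical set of extensions treated as data rather than code.
DATA_EXTENSIONS = {".json", ".csv", ".sql"}


def looks_binary(content: bytes) -> bool:
    """Assumed heuristic: a null byte near the start suggests binary content."""
    return b"\x00" in content[:8000]


def candidate_files(files):
    """Keep only the file names worth passing on to language identification."""
    kept = []
    for name, content in files.items():
        if looks_binary(content):
            continue  # skip binary objects
        if os.path.splitext(name)[1].lower() in DATA_EXTENSIONS:
            continue  # skip files treated as data
        kept.append(name)
    return kept


repo = {
    "app.py": b"print('hello')\n",
    "logo.png": b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR",
    "dump.sql": b"SELECT 1;\n",
}
print(candidate_files(repo))
# → ['app.py']
```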

Linguist lists the following strategies, applied in order, with each step either identifying the language or narrowing the candidates passed to the next:

  • Vim or Emacs modeline,
  • commonly used filename,
  • shell shebang,
  • file extension,
  • XML header,
  • man page section,
  • heuristics,
  • naïve Bayesian classification
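
The cascade idea behind this list can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Linguist's implementation: only three of the listed strategies plus one toy heuristic are shown, and the lookup tables, function names, and heuristic rule are all hypothetical.

```python
import os

# Hypothetical lookup tables, far smaller than anything Linguist uses.
FILENAMES = {"Makefile": {"Makefile"}, "Dockerfile": {"Dockerfile"}}
INTERPRETERS = {"python3": {"Python"}, "ruby": {"Ruby"}, "sh": {"Shell"}}
EXTENSIONS = {".py": {"Python"}, ".rb": {"Ruby"}, ".m": {"Objective-C", "MATLAB"}}


def by_filename(name, text, candidates):
    return FILENAMES.get(os.path.basename(name), set())


def by_shebang(name, text, candidates):
    first_line = text.split("\n", 1)[0]
    if not first_line.startswith("#!"):
        return set()
    parts = first_line[2:].split()
    if not parts:
        return set()
    interpreter = os.path.basename(parts[0])
    if interpreter == "env" and len(parts) > 1:
        interpreter = parts[1]  # e.g. "#!/usr/bin/env python3"
    return INTERPRETERS.get(interpreter, set())


def by_extension(name, text, candidates):
    found = EXTENSIONS.get(os.path.splitext(name)[1].lower(), set())
    return found & candidates if candidates else found


def by_heuristic(name, text, candidates):
    # Toy content rule, invented purely for this example.
    if candidates == {"Objective-C", "MATLAB"}:
        if "@interface" in text:
            return {"Objective-C"}
        if text.lstrip().startswith("function "):
            return {"MATLAB"}
    return set()


STRATEGIES = [by_filename, by_shebang, by_extension, by_heuristic]


def detect(name, text):
    """Run strategies in order; stop as soon as exactly one language remains."""
    candidates = set()
    for strategy in STRATEGIES:
        found = strategy(name, text, candidates)
        if len(found) == 1:
            return next(iter(found))
        if found:
            candidates = found  # several candidates: pass them to the next strategy
    return None  # unresolved; a statistical classifier would be the last resort


print(detect("run", "#!/usr/bin/env python3\nprint(1)\n"))         # → Python
print(detect("Widget.m", "@interface Widget : NSObject\n@end\n"))  # → Objective-C
```

Stopping at the first unambiguous answer keeps the cheap checks, such as filename and shebang, ahead of the more expensive content-based ones.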

User Experiences

Development Activity

All development activity is visible on GitHub: https://github.com/github-linguist/linguist/commits

Release Feed

The three most recent releases:

  • 2026-03-18 15:09:57: v9.5.0, by lildude
  • 2026-01-21 10:42:09: v9.4.0, by lildude
  • 2025-09-18 09:55:02: v9.3.0, by lildude

Activity Feed

The five most recent commits:

  • 2026-03-18 15:05:26: Release v9.5.0 (#7858), commit 537297cdae3ab05f8d5dd1c03627a5bd73707b19, by lildude (https://github.com/lildude)
  • 2026-03-16 16:55:03: Add .gitattributes override mention when returning the strategy (#7600), commit cb756ae9b3242a3a5fe2a5934dcfd8c518cf09d5, by DecimalTurn (https://github.com/DecimalTurn)
  • 2026-03-16 16:28:46: Add XVBA dependencies as vendored (#7532), commit e5e38c00f506d977a7482e041a13b3b46d610403, by DecimalTurn (https://github.com/DecimalTurn)
  • 2026-03-16 16:11:18: Adding `txtpb` extension to Protocol Buffer Text Format (#6566), commit 240bf9233bbfe82209e9c5437ab8ec06ba648c91, by milesflo (https://github.com/milesflo)
  • 2026-03-16 15:57:12: Support JetBrains colour scheme (`.icls`) files (#7851), commit 15497971e89ca5a7a5bfd30dcdf037c1540705c9, by hearsilent (https://github.com/hearsilent)

References

See also

  • CLOC (Count Lines of Code) on COPTR.