<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://coptr.digipres.org/index.php?action=history&amp;feed=atom&amp;title=Linguist</id>
	<title>Linguist - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://coptr.digipres.org/index.php?action=history&amp;feed=atom&amp;title=Linguist"/>
	<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Linguist&amp;action=history"/>
	<updated>2026-05-25T03:50:29Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.35.14</generator>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Linguist&amp;diff=6592&amp;oldid=prev</id>
		<title>Ross-spencer: Initial entry</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Linguist&amp;diff=6592&amp;oldid=prev"/>
		<updated>2026-03-30T12:16:53Z</updated>

		<summary type="html">&lt;p&gt;Initial entry&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{Infobox tool&lt;br /&gt;
|purpose=Identify the breakdown of programming languages used in a GitHub repository, or the anticipated language of an individual file&lt;br /&gt;
|homepage=https://github.com/github-linguist/linguist/tree/main&lt;br /&gt;
|tool_status=Maintained&lt;br /&gt;
|sourcecode=https://github.com/github-linguist/linguist/tree/main&lt;br /&gt;
|license=MIT&lt;br /&gt;
|cost=Free and Open Source (FOSS)&lt;br /&gt;
|language=Ruby&lt;br /&gt;
|function=Content Profiling, File Format Identification&lt;br /&gt;
}}&lt;br /&gt;
== Description ==&lt;br /&gt;
&amp;lt;!-- Describe what the tool does, focusing on its digital preservation value. Keep it factual. --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Linguist identifies the breakdown of programming languages used in a GitHub repository, or the likely language of an individual file.&lt;br /&gt;
&lt;br /&gt;
The best-known output of Linguist is the language breakdown shown on a repository page on GitHub, which lists the percentage of each language used alongside a small chart visualizing that breakdown.&lt;br /&gt;
&lt;br /&gt;
Utilities such as DROID, which is based on PRONOM, rely on pattern matching to identify a file format. GitHub's Linguist, by contrast, is deterministic and applies a series of decision-making strategies to determine the programming language, and thus the likely file format.&lt;br /&gt;
&lt;br /&gt;
=== Algorithm === &lt;br /&gt;
&lt;br /&gt;
Linguist first reduces the number of inputs it has to deal with, for example by excluding binary objects and files identified as data. It then applies a series of identification strategies to try to determine the programming language used across a GitHub repository or in an individual file.&lt;br /&gt;
&lt;br /&gt;
Its algorithm is described in more detail [https://github.com/github-linguist/linguist/blob/537297cdae3ab05f8d5dd1c03627a5bd73707b19/docs/how-linguist-works.md on GitHub].&lt;br /&gt;
&lt;br /&gt;
Among its strategies, Linguist lists the following checks:&lt;br /&gt;
&lt;br /&gt;
* Vim or Emacs modeline,&lt;br /&gt;
* commonly used filename,&lt;br /&gt;
* shell shebang,&lt;br /&gt;
* file extension,&lt;br /&gt;
* XML header,&lt;br /&gt;
* man page section,&lt;br /&gt;
* heuristics,&lt;br /&gt;
* naïve Bayesian classification&lt;br /&gt;
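&lt;br /&gt;
The cascade of checks above can be illustrated with a minimal sketch. It is not Linguist's own code: it is written in Python rather than Ruby, and the lookup tables and function names are hypothetical. It covers only the modeline, filename, shebang and extension checks, and assumes that a strategy returning exactly one language settles the question while several results are kept as candidates for later strategies.&lt;br /&gt;

```python
import os
import re

# Hypothetical lookup tables, for illustration only. They are tiny stand-ins
# and are not taken from Linguist's own language data.
FILENAMES = {"Makefile": "Makefile", "Dockerfile": "Dockerfile"}
INTERPRETERS = {"python": "Python", "python3": "Python", "ruby": "Ruby", "sh": "Shell"}
EXTENSIONS = {".py": ["Python"], ".rb": ["Ruby"], ".h": ["C", "C++", "Objective-C"]}


def by_modeline(name, text):
    # Simplified Vim modeline check, e.g. "# vim: set ft=ruby:".
    match = re.search(r"vim:.*?\b(?:ft|filetype)=(\w+)", text)
    return [match.group(1).capitalize()] if match else []


def by_filename(name, text):
    # Commonly used filenames that imply a language on their own.
    language = FILENAMES.get(os.path.basename(name))
    return [language] if language else []


def by_shebang(name, text):
    # Inspect the first line for "#!" and map the interpreter to a language.
    first_line = text.split("\n", 1)[0]
    if not first_line.startswith("#!"):
        return []
    parts = first_line[2:].split()
    if not parts:
        return []
    program = os.path.basename(parts[0])
    if program == "env" and len(parts) != 1:
        program = parts[1]  # "#!/usr/bin/env python3" names the interpreter second
    language = INTERPRETERS.get(program)
    return [language] if language else []


def by_extension(name, text):
    # An extension may map to several languages (".h" is a classic case).
    extension = os.path.splitext(name)[1].lower()
    return list(EXTENSIONS.get(extension, []))


STRATEGIES = [by_modeline, by_filename, by_shebang, by_extension]


def detect(name, text):
    """Return the candidate languages; a single entry means an unambiguous hit."""
    candidates = []
    for strategy in STRATEGIES:
        found = strategy(name, text)
        if len(found) == 1:
            return found        # unambiguous result: stop here
        if found:
            candidates = found  # ambiguous: keep for later strategies
    return candidates


print(detect("script", "#!/usr/bin/env python3\nprint(1)\n"))
print(detect("foo.h", "int x;\n"))
```

In this sketch an ambiguous extension such as .h is returned as a list of candidates; in Linguist the later heuristics and classification steps listed above are what would narrow such a list down.&lt;br /&gt;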
&lt;br /&gt;
== User Experiences ==&lt;br /&gt;
&amp;lt;!-- Add hotlinks to user experiences with the tool (eg. blog posts). These should illustrate the effectiveness (or otherwise) of the tool. Use a bullet list. --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* [https://blog.jakelee.co.uk/adding-github-language-with-linguist/ How to add a programming language to GitHub] Jake Lee on Software.&lt;br /&gt;
* [https://github.com/github-linguist/linguist/discussions Linguist Discussions] on GitHub.&lt;br /&gt;
&lt;br /&gt;
== Development Activity ==&lt;br /&gt;
&amp;lt;!-- Provide *evidence* of development activity of the tool. For example, RSS feeds for code issues or commits. --&amp;gt;&lt;br /&gt;
All development activity is visible on GitHub: https://github.com/github-linguist/linguist/commits&lt;br /&gt;
 &lt;br /&gt;
=== Release Feed ===&lt;br /&gt;
Below are the last 3 releases:&lt;br /&gt;
&amp;lt;rss max=3&amp;gt;https://github.com/github-linguist/linguist/releases.atom&amp;lt;/rss&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
=== Activity Feed ===&lt;br /&gt;
Below are the last 5 commits:&lt;br /&gt;
&amp;lt;rss max=5&amp;gt;https://github.com/github-linguist/linguist/commits/main.atom&amp;lt;/rss&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/github-linguist/linguist/blob/537297cdae3ab05f8d5dd1c03627a5bd73707b19/docs/how-linguist-works.md How Linguist Works] on GitHub.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
&lt;br /&gt;
* [https://coptr.digipres.org/Cloc CLOC] (Count Lines of Code) on COPTR.&lt;br /&gt;
&lt;br /&gt;
{{Infobox tool details}}&lt;/div&gt;</summary>
		<author><name>Ross-spencer</name></author>
	</entry>
</feed>