
I have a repo with Ruby and PHP code in it.

GitHub says my repo is 74.8% PHP and 25.2% Ruby.

I do not understand how this can be. When I compare the two languages in my project:

# Count how many files:

# Ruby
ls | grep ".*\.rb" | wc -l
# returns 10

# PHP
ls | grep ".*\.php" | wc -l
# returns 1


# Count how many lines, words, chars:

# Ruby
cat *.rb | wc
# returns 229, 812, 5303

# PHP
cat *.php | wc
# returns 102, 473, 2760

Ruby always seems to have more.

Am I missing something?

JD Isaacks
  • 8,936

1 Answer


GitHub uses Linguist to detect the languages in a project.

Linguist is open source. Look into the source files and you'll find:

In /bin/linguist:

repo.languages.sort_by { |_, size| size }.reverse.each do |language, size|
  percentage = ((size / repo.size.to_f) * 100).round
  puts "%-4s %s" % ["#{percentage}%", language]
end

In /lib/linguist/file_blob.rb:

 # Public: Get byte size
 #
 # Returns an Integer.
 def size
   File.size(@path)
 end

So it actually uses file sizes in bytes, not file counts or line counts, to determine the language percentages.

Also keep in mind that binary data, vendored files, generated files, and non-program files are excluded.
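To get a rough feel for a byte-based breakdown locally, something like the following sketch might help. It is not Linguist's code: the `EXTENSIONS` table and the helper names `bytes_by_language` and `percentages` are made up for illustration, it only looks at file extensions, and it does not apply any of the exclusions mentioned above.

```ruby
# Hypothetical extension-to-language table for this sketch only;
# Linguist's real detection is much more involved.
EXTENSIONS = { ".rb" => "Ruby", ".php" => "PHP" }.freeze

# Walk a directory recursively and sum file sizes (in bytes) per language.
def bytes_by_language(root)
  totals = Hash.new(0)
  Dir.glob(File.join(root, "**", "*")).each do |path|
    next unless File.file?(path)
    language = EXTENSIONS[File.extname(path)]
    totals[language] += File.size(path) if language
  end
  totals
end

# Convert byte totals into percentages of the overall byte count,
# rounded to one decimal place.
def percentages(totals)
  sum = totals.values.sum.to_f
  return {} if sum.zero?
  totals.transform_values { |bytes| (bytes / sum * 100).round(1) }
end

# Example: percentages(bytes_by_language("."))
```

Note that, unlike `ls` and `cat *.rb` in the question, this walks subdirectories too. If the result still differs from what GitHub reports, Linguist's exclusions are one possible reason.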

Huang Tao
  • 581