Project Code Statistics


I was asked to measure the total code size across all project repos. This write-up documents the process I automated.

Task

To collect code statistics:

  1. Batch-clone every repo in the group using ghorg.
  2. Run cloc for each repo.
  3. Merge the CSV outputs.

Code

Steps:

  1. Clone repositories

    ghorg clone groupname --concurrency=50 --protocol=ssh --scm=gitlab --base-url=https://gitlab.com --token=YOUR_GITLAB_TOKEN
    
    • Install ghorg per the docs.
    • The token is your GitLab access token.
  2. Generate reports

    • Install cloc globally: npm i -g cloc
    • The script below iterates over the repos, runs cloc on each one, and merges the reports. Set the four path variables (rootDir, clocCommand, reportFiles, reportFile) before running.
    const fs = require('fs');
    const { execSync } = require('child_process');
    
    // Path to the cloned repositories
    const rootDir = '';
    
    // cloc executable
    const clocCommand = '/Users/qhe/.nvm/versions/node/v10.16.0/bin/cloc';
    
    // Temporary directory for individual reports
    const reportFiles = '';
    
    // Final merged report path
    const reportFile = '';
    
    // Recursively delete a directory and everything inside it
    const deleteFolderRecursive = function (path) {
      if (fs.existsSync(path)) {
        fs.readdirSync(path).forEach(function (file) {
          const curPath = path + '/' + file;
          if (fs.lstatSync(curPath).isDirectory()) {
            deleteFolderRecursive(curPath);
          } else {
            fs.unlinkSync(curPath);
          }
        });
        fs.rmdirSync(path);
      }
    };
    
    /**
     * @description
     * Generate a report for each repository
     */
    function createReport() {
      return new Promise((resolve, reject) => {
        fs.readdir(rootDir, (err, files) => {
          if (err) {
            reject(err);
            return;
          }
          // Make sure the report directory exists before cloc writes into it
          fs.mkdirSync(reportFiles, { recursive: true });
          files.forEach((file) => {
            if (fs.statSync(`${rootDir}/${file}`).isFile()) {
              return;
            }
            try {
              // execSync blocks until cloc exits and throws on a non-zero exit
              // code; it does not take a callback
              const stdout = execSync(
                `"${clocCommand}" "${rootDir}/${file}" --csv --out="${reportFiles}/${file}.csv"`,
                { encoding: 'utf8' }
              );
              console.log(`stdout: ${stdout}`);
            } catch (error) {
              console.log(`error: ${error.message}`);
            }
          });
          resolve();
        });
      });
    }
    
    
    /**
     * @description
     * Merge repository reports into a single CSV file
     */
    function mergeReport() {
      let mergedReport = '';
      fs.readdir(reportFiles, (err, files) => {
        if (err) {
          console.log(err);
          return;
        }
        files.forEach((file) => {
          // Skip .DS_Store and anything else that is not a cloc report
          if (!file.endsWith('.csv')) {
            return;
          }
          const contents = fs.readFileSync(`${reportFiles}/${file}`, {
            encoding: 'utf8',
          });
          // Prefix each report with a row naming the repository
          mergedReport += `${file.replace(/\.csv$/, '')},,,,,\n` + contents;
        });
        deleteFolderRecursive(reportFiles);
        fs.writeFile(reportFile, mergedReport, function (writeErr) {
          if (writeErr) {
            console.log(writeErr);
          }
        });
      });
    }
    
    createReport()
      .then(() => mergeReport())
      .catch((err) => console.log(err));
    

Result

  • With filters in Excel I can quickly explore the data: 34 repos totaling 1,340,244 lines (including vendor code).
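  • Note: running cloc once on the parent directory can give a different total than summing the per-repo runs. One likely reason is that cloc skips duplicate files by default, so a single run over the parent also deduplicates across repos. I trust the per-repo numbers.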
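
The totals can also be recomputed from the merged report without Excel. This is a minimal sketch, assuming the merged file has the layout that mergeReport writes (a `name,,,,,` row before each repo's report) and that cloc's CSV rows are `files,language,blank,comment,code` followed by a SUM row. The function name `summarizeMergedReport` and the sample data are hypothetical, and the naive comma split assumes no language name contains a comma.

```javascript
// Total the "code" column per repository from the merged CSV text.
function summarizeMergedReport(csvText) {
  const perRepo = {};
  let currentRepo = null;
  for (const rawLine of csvText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '') continue;
    const cols = line.split(',');
    // Marker rows written by mergeReport look like "repo-name,,,,,".
    const isMarker =
      cols.length > 1 && cols[0] !== '' && cols.slice(1).every((c) => c === '');
    if (isMarker) {
      currentRepo = cols[0];
      perRepo[currentRepo] = 0;
      continue;
    }
    // Skip cloc's header row and its SUM row (counting SUM would double the total).
    if (currentRepo === null || cols[0] === 'files' || cols[1] === 'SUM') continue;
    const code = Number(cols[4]);
    if (Number.isFinite(code)) perRepo[currentRepo] += code;
  }
  const total = Object.values(perRepo).reduce((a, b) => a + b, 0);
  return { perRepo, total };
}

// Made-up sample in the assumed layout, not real project data.
const sample = [
  'repo-a,,,,,',
  'files,language,blank,comment,code,"cloc header"',
  '2,JavaScript,10,5,100',
  '1,JSON,0,0,20',
  '3,SUM,10,5,120',
  'repo-b,,,,,',
  'files,language,blank,comment,code,"cloc header"',
  '1,Python,3,2,50',
  '1,SUM,3,2,50',
].join('\n');

console.log(summarizeMergedReport(sample));
```

To use it on a real run, read the merged file with `fs.readFileSync(reportFile, 'utf8')` and pass the text in.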

Closing Thoughts

With the script in place, future runs take only two commands.
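
The two commands could be chained from a single Node script. This is a rough sketch, assuming the report script is saved as `report.js` and the GitLab token is exported as `GITLAB_TOKEN`. Both names are my own placeholders, as are the helper names. It reuses the ghorg flags shown earlier and, by default, only prints what it would run. Pass `--run` to execute.

```javascript
const { spawnSync } = require('child_process');

// ghorg arguments from the clone step; the token is passed in by the caller.
function buildGhorgArgs(group, token) {
  return [
    'clone',
    group,
    '--concurrency=50',
    '--protocol=ssh',
    '--scm=gitlab',
    '--base-url=https://gitlab.com',
    `--token=${token}`,
  ];
}

// The two steps in order: clone everything, then generate the merged report.
function buildSteps(group, token, reportScript) {
  return [
    { cmd: 'ghorg', args: buildGhorgArgs(group, token) },
    { cmd: 'node', args: [reportScript] },
  ];
}

// Run each step, or just print it (with the token masked) when dryRun is true.
function runSteps(steps, dryRun) {
  for (const { cmd, args } of steps) {
    if (dryRun) {
      const printable = [cmd, ...args].join(' ');
      console.log(printable.replace(/--token=\S*/, '--token=***'));
      continue;
    }
    const result = spawnSync(cmd, args, { stdio: 'inherit' });
    if (result.error) throw result.error;
    if (result.status !== 0) {
      throw new Error(`${cmd} exited with status ${result.status}`);
    }
  }
}

// 'report.js' and GITLAB_TOKEN are hypothetical names; adjust to your setup.
const steps = buildSteps('groupname', process.env.GITLAB_TOKEN || '', 'report.js');
runSteps(steps, !process.argv.includes('--run'));
```

Reading the token from the environment keeps it out of the script and out of shell history.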

Authors
Developer, digital product enthusiast, tinkerer, sharer, open source lover