Monitor Available Nodes

Hello All,

Sorry this one is a bit late; classes are really picking up.

As part of my role in the Kolpak Group, I am responsible for administering all of our computational resources at MIT. We have one large 32-node cluster with 16 processors per node, as well as several smaller clusters. We have ~12 members in the group, so these resources are quickly exhausted. One of our members really wanted a way to tell how many nodes are available, who is using what, etc. There is a great script called pestat that parses the output of pbsnodes. It gives a lot of data about the individual nodes, but we wanted something that would give a nice little summary showing who is using what and what is available.

#!/usr/bin/python

import sys
import commands #Python 2 only; this module was removed in Python 3

numnodes = 32 #number of nodes in the cluster
ppn = 16 #processors per node

print
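OKGREEN = "\033[92m" #bright green, used for idle users
FAIL = "\033[94m" #bright blue (despite the name), used for users with running jobs
ENDC = "\033[0m" #reset to the default color
RED = "\033[31m" #red, used for the empty-cluster message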

#getstatusoutput returns (exit status, output); sed drops the first 5 lines (the qstat -a header)
jobsstring = commands.getstatusoutput('qstat -a | sed 1,5d')

users = {'USER1':[0,0],'USER2':[0,0],...} #Populate this with all the users in the cluster 
total = [0,0]

if jobsstring[1] == '':
  print RED + "We are currently using 0% of our cluster" + ENDC
  sys.exit()

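for i in jobsstring[1].split('\n'):
  j = i.split()
  if len(j) < 10: #skip blank or malformed lines
    continue
  if "R" in j[9]: #only count running jobs
    usage = users.setdefault(j[1],[0,0]) #avoids a KeyError for users missing from the dictionary above
    usage[0] += int(j[5]) #nodes
    usage[1] += int(j[6]) #processors
    total[0] += int(j[5])
    total[1] += int(j[6])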
for i in users:
  j = users[i]
  percentages = [" (" + str(round(100*float(j[0])/numnodes,2)) + ")"," (" + str(round(100*float(j[1])/(numnodes*ppn),2)) + ")"]
  users[i] = [str(j[0]) + percentages[0],str(j[1]) + percentages[1]] 
header = ['Users','Num Nodes (%)','Num Processors (%)']
width = max(max(len(i) for i in users),max(len(i) for i in header))
print "".join(word.ljust(width) for word in header)
for i in users:
  if users[i][0] == '0 (0.0)':
    print OKGREEN + i.ljust(width) + ENDC,users[i][0].ljust(width),users[i][1].ljust(width)
  else:
    print FAIL + i.ljust(width),users[i][0].ljust(width),users[i][1].ljust(width) + ENDC
  
print "We are using " + str(round(100*float(total[0])/numnodes,2)) + "% of the cluster\n"
print OKGREEN + str(numnodes - total[0]) + " nodes are available" + ENDC
sys.exit()

Essentially, this keeps all the users in a dictionary, each mapped to a list that starts at [0,0] and represents [nodes,processors], parses the output of qstat, and then calculates the percentage of use per person. The output is nicely colorized thanks to the ANSI escape codes stored in ENDC, OKGREEN, etc. I really think colors should be a requirement of terminal output (sorry, I don't know how to make the colors show up in WordPress).
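If you want to play with the bookkeeping without a cluster handy, here is a minimal sketch of the same tallying step as a standalone function. It is written for Python 3 rather than the Python 2 of the script above, the names tally_running_jobs and percent are made up for this example, and the sample rows are invented; the only thing taken from the script is which whitespace-separated field holds what (user in field 1, nodes in 5, processors in 6, job state in 9).

```python
# Python 3 sketch. The field positions are an assumption copied from the
# indexing used in the script above; check them against your own qstat -a.

def tally_running_jobs(qstat_rows):
    """Return ({user: [nodes, processors]}, [total_nodes, total_processors])."""
    users = {}
    total = [0, 0]
    for line in qstat_rows.splitlines():
        fields = line.split()
        if len(fields) < 10 or "R" not in fields[9]:
            continue  # skip blank lines and jobs that are not running
        nodes, procs = int(fields[5]), int(fields[6])
        usage = users.setdefault(fields[1], [0, 0])
        usage[0] += nodes
        usage[1] += procs
        total[0] += nodes
        total[1] += procs
    return users, total


def percent(part, whole):
    """Percentage of `whole` taken up by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Invented rows in the assumed layout (job IDs, queue and job names are made up).
SAMPLE = """\
101.head  USER8  batch  relax   4321  2  32  --  24:00  R  01:10
102.head  USER9  batch  bands   4400  4  64  --  48:00  R  12:05
103.head  USER9  batch  phonon    --  8 128  --  48:00  Q     --
"""

NUMNODES, PPN = 32, 16
users, total = tally_running_jobs(SAMPLE)
for name in sorted(users):
    nodes, procs = users[name]
    print(name, nodes, percent(nodes, NUMNODES), procs, percent(procs, NUMNODES * PPN))
print("nodes in use:", total[0], "of", NUMNODES)
```

The queued job in the last sample row is ignored, so only running jobs count against the cluster. The full script's output on our cluster looks like this: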

Users             Num Nodes (%)     Num Processors (%)
USER1             0 (0.0)            0 (0.0)           
USER2             0 (0.0)            0 (0.0)           
USER3             0 (0.0)            0 (0.0)           
USER4             0 (0.0)            0 (0.0)           
USER5             0 (0.0)            0 (0.0)           
USER6             0 (0.0)            0 (0.0)           
USER7             0 (0.0)            0 (0.0)           
USER8             2 (6.25)           32 (6.25)         
USER9             4 (12.5)           64 (12.5)         
USER10            0 (0.0)            0 (0.0)           
USER11            0 (0.0)            0 (0.0)           
USER12            0 (0.0)            0 (0.0)           
USER13            0 (0.0)            0 (0.0)           
USER14            2 (6.25)           32 (6.25)         
USER15            9 (28.13)          144 (28.13)       
USER16            0 (0.0)            0 (0.0)           
We are using 53.13% of the cluster

15 nodes are available
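Since the colors do not survive in the sample output above, here is a small sketch of how the escape codes work. It is Python 3, the color values are the ones from the script, and colorize and color_row are hypothetical helpers written just for this illustration (the script itself concatenates the codes inline).

```python
# Python 3 sketch of ANSI coloring; values copied from the script above.
OKGREEN = "\033[92m"  # bright green foreground
BLUE = "\033[94m"     # bright blue foreground (the script names this FAIL)
ENDC = "\033[0m"      # reset all attributes back to the terminal default


def colorize(text, color):
    """Wrap text in a color code and reset the color afterwards."""
    return color + text + ENDC


def color_row(user, nodes_in_use, width=18):
    """Green name for idle users, blue name for users holding nodes."""
    color = OKGREEN if nodes_in_use == 0 else BLUE
    return colorize(user.ljust(width), color) + str(nodes_in_use)


print(color_row("USER1", 0))
print(color_row("USER8", 2))
```

Run in a terminal that understands ANSI escape codes, the first name should come out green and the second blue. Forgetting the trailing ENDC is the usual mistake; the color then bleeds into everything printed afterwards.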

Hopefully you find this useful.

Happy computing,

Levi
