Hello All,
Sorry this one is a bit late; classes are really picking up.
As part of my role in the Kolpak Group, I am responsible for administering all of our computational resources at MIT. We have one large 32-node cluster with 16 processors per node, as well as several smaller clusters. We have ~12 members in the group, so these resources get exhausted quickly. One of our members really wanted a way to tell how many nodes are available, who is using what, etc. There is a great script that parses the output of pbsnodes that you can get here: pestat. It gives a lot of data about the individual nodes, but we wanted something that would give a nice little summary of who is using what and what is available.
#!/usr/bin/python
import sys
import commands

numnodes = 32  # number of nodes
ppn = 16       # processors per node

print

# ANSI escape codes used to color the terminal output
OKGREEN = "\033[92m"
FAIL = "\033[94m"
ENDC = "\033[0m"
RED = "\033[31m"

# Run qstat -a and drop its five header lines
jobsstring = commands.getstatusoutput('qstat -a | sed 1,5d')
users = {'USER1':[0,0],'USER2':[0,0],...}  # Populate this with all the users in the cluster
total = [0,0]  # [nodes, processors] in use across all users

if jobsstring[1] == '':
    print RED + "We are currently using 0% of our cluster" + ENDC
    sys.exit()

# Add up nodes (column 5) and processors (column 6) for running jobs (state in column 9)
for i in jobsstring[1].split('\n'):
    j = i.split()
    if "R" in j[9]:
        users[j[1]][0] += int(j[5])
        users[j[1]][1] += int(j[6])
        total[0] += int(j[5])
        total[1] += int(j[6])

# Replace each user's counts with "count (percentage)" strings
for i in users:
    j = users[i]
    percentages = [" (" + str(round(100*float(j[0])/numnodes,2)) + ")",
                   " (" + str(round(100*float(j[1])/(numnodes*ppn),2)) + ")"]
    users[i] = [str(j[0]) + percentages[0], str(j[1]) + percentages[1]]

header = ['Users','Num Nodes (%)','Num Processors (%)']
width = max(max(len(i) for i in users), max(len(i) for i in header))
print "".join(word.ljust(width) for word in header)

# Idle users are printed in one color, users with running jobs in another
for i in users:
    if users[i][0] == '0 (0.0)':
        print OKGREEN + i.ljust(width) + ENDC, users[i][0].ljust(width), users[i][1].ljust(width)
    else:
        print FAIL + i.ljust(width), users[i][0].ljust(width), users[i][1].ljust(width) + ENDC

print "We are using " + str(round(100*float(total[0])/numnodes,2)) + "% of the cluster\n"
print OKGREEN + str(numnodes - total[0]) + " nodes are available" + ENDC
sys.exit()
Essentially, this stores every user in a dictionary whose values are two-element lists, starting at [0,0] and representing [nodes,processors]. It then parses the output of qstat, adds up the nodes and processors of each user's running jobs, and calculates the percentage of the cluster each person is using. The output is nice and colorful thanks to the ANSI escape codes stored in OKGREEN, FAIL, ENDC, and RED. I really think colors should be a requirement of terminal output (sorry, I don't know how to make the colors show up in WordPress).
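To make the parsing step easier to play with away from the cluster, here is a minimal Python 3 sketch of just the tallying logic. It is not the script above, only an illustration: the function name `tally_running_jobs`, the use of `defaultdict` (so users do not have to be listed in advance), and the sample rows (job IDs, job names, times) are all assumptions of mine. The column positions (1 = user, 5 = nodes, 6 = processors, 9 = state) are taken from the indices the script uses.

```python
from collections import defaultdict


def tally_running_jobs(qstat_body):
    """Sum nodes and processors per user over running jobs.

    `qstat_body` is assumed to be `qstat -a` output with the header
    lines already removed, one job per line.
    """
    usage = defaultdict(lambda: [0, 0])  # user -> [nodes, processors]
    for line in qstat_body.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        user, nodes, procs, state = fields[1], fields[5], fields[6], fields[9]
        if state != "R":
            continue  # only running jobs occupy nodes
        usage[user][0] += int(nodes)
        usage[user][1] += int(procs)
    return dict(usage)


# Hypothetical sample rows, laid out like the columns the script indexes.
SAMPLE = """\
101.head USER8 batch job_a 1234 2 32 -- 24:00:00 R 01:10:00
102.head USER9 batch job_b 1235 4 64 -- 24:00:00 R 03:22:10
103.head USER9 batch job_c --   1 16 -- 24:00:00 Q --
"""

usage = tally_running_jobs(SAMPLE)
for user, (nodes, procs) in sorted(usage.items()):
    print(user, nodes, procs)
```

The queued job (state Q) is ignored, so USER9 is only charged for the four nodes that are actually running. Like the original, this sketch assumes the node and processor columns always hold integers.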
Users             Num Nodes (%)     Num Processors (%)
USER1             0 (0.0)           0 (0.0)
USER2             0 (0.0)           0 (0.0)
USER3             0 (0.0)           0 (0.0)
USER4             0 (0.0)           0 (0.0)
USER5             0 (0.0)           0 (0.0)
USER6             0 (0.0)           0 (0.0)
USER7             0 (0.0)           0 (0.0)
USER8             2 (6.25)          32 (6.25)
USER9             4 (12.5)          64 (12.5)
USER10            0 (0.0)           0 (0.0)
USER11            0 (0.0)           0 (0.0)
USER12            0 (0.0)           0 (0.0)
USER13            0 (0.0)           0 (0.0)
USER14            2 (6.25)          32 (6.25)
USER15            9 (28.13)         144 (28.13)
USER16            0 (0.0)           0 (0.0)
We are using 53.13% of the cluster

15 nodes are available
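As a quick check of the numbers in that sample output, and to show how the coloring works, here is a small Python 3 sketch. The helper names `percent` and `colorize` are mine, not part of the script. One thing to watch if you port the script to Python 3: the built-in `round` rounds exact halves to the nearest even digit, so 28.125 would come out as 28.12 rather than the 28.13 shown above. This sketch uses the `decimal` module with `ROUND_HALF_UP` to keep the original behavior, at the cost of always printing two decimals (12.50 instead of 12.5).

```python
from decimal import Decimal, ROUND_HALF_UP

GREEN = "\033[92m"  # bright green, used for idle users
BLUE = "\033[94m"   # bright blue, the code the script stores in FAIL
RESET = "\033[0m"   # switch back to the terminal's default color

NUM_NODES = 32
PPN = 16

# Non-zero rows copied from the sample output: user -> (nodes, processors)
busy = {"USER8": (2, 32), "USER9": (4, 64), "USER14": (2, 32), "USER15": (9, 144)}


def percent(part, whole):
    """Percentage with two decimals, rounding halves up (28.125 -> 28.13)."""
    value = Decimal(100 * part) / Decimal(whole)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def colorize(text, in_use):
    """Wrap text in an ANSI color code and reset the color afterwards."""
    return (BLUE if in_use else GREEN) + text + RESET


used_nodes = sum(nodes for nodes, _ in busy.values())
for user, (nodes, procs) in busy.items():
    row = "{:<8}{} ({})  {} ({})".format(
        user, nodes, percent(nodes, NUM_NODES),
        procs, percent(procs, NUM_NODES * PPN))
    print(colorize(row, in_use=nodes > 0))

print("We are using {}% of the cluster".format(percent(used_nodes, NUM_NODES)))
print(colorize("{} nodes are available".format(NUM_NODES - used_nodes), in_use=False))
```

The four busy users hold 2 + 4 + 2 + 9 = 17 of the 32 nodes, which is where the 53.13% and the 15 available nodes come from.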
Hopefully you find this useful.
Happy computing,
Levi