Automatic Generation of Density of States Plot and Bandgap

Updated version can be found here: https://blog.levilentz.com/?p=274

If you would just like to download this code to calculate the bandgap from the DOS, simply download these two files: 1 2 and run the python script in your quantum espresso directory. Continue reading for a description of how it works.

Hello All!

The past two weeks I was at the IPAM workshop on Density Functional Theory (more on that to come soon), and I finally had a chance to make a script that I have been thinking about: something that can calculate the bandgap of a system from the projected density of state plots that are generated by Quantum Espresso.

The main reason this is necessary is that the bandgap of the system is generally dependent on the location in k-space, and hence to find the global max/min (for valence/conduction bands) you either need an extremely dense k-point mesh or a k-point path that hits the conduction band minimum (CBM) or valence band maximum (VBM). To this end, using a slight Gaussian smearing within the SCF calculation, we hope to get the CBM and VBM from the DOS at a lower computational k-point need. At least that is the theory that I am testing for other research aims.

This also helps out by generating the DOS plots that I need for all the publications that my advisor has been expecting me to write.

As a recap, when you run a dos and pdos calculation within Quantum Espresso, you generate a file for each orbital in your pseudopotential. As an example, here is my file list for a simple calculation that I was working on:


Ti.pdos_atm#1(Ti)_wfc#1(s)
Ti.pdos_atm#1(Ti)_wfc#2(p)
Ti.pdos_atm#1(Ti)_wfc#3(s)
Ti.pdos_atm#1(Ti)_wfc#4(d)
Ti.pdos_atm#10(O)_wfc#1(s)
Ti.pdos_atm#10(O)_wfc#2(p)
Ti.pdos_atm#11(O)_wfc#1(s)
.........................

And so on for every atom in the system. Taking a peek inside of one of these files, you will see the following data-structure:

# E (eV)   ldos(E)   pdos(E)    pdos(E)    pdos(E)
-55.480 -0.211E-09 -0.224E-10 -0.749E-10 -0.114E-09
-55.380 -0.121E-08 -0.129E-09 -0.430E-09 -0.655E-09
-55.280 -0.491E-08 -0.520E-09 -0.174E-08 -0.265E-08
-55.180 -0.131E-07 -0.139E-08 -0.465E-08 -0.708E-08
-55.080 -0.185E-07 -0.196E-08 -0.658E-08 -0.100E-07
-54.980  0.874E-08  0.930E-09  0.308E-08  0.473E-08
-54.880  0.105E-06  0.112E-07  0.374E-07  0.569E-07
-54.780  0.251E-06  0.266E-07  0.890E-07  0.135E-06
-54.680  0.341E-06  0.361E-07  0.121E-06  0.184E-06
...

Where the first column is the energy, the second is the local density of states, and the next three is the x,y,x values of the DOS. We are mainly
interested in the first two columns. Essentially the code needs to do the following:

1. Open each wfc file
2. Shift the energy by the fermi energy (to make 0 the fermi level)
3. Find the first energy above and below 0 to have an occupation greater than 0. In other words, we need the ldos column to be greater than 0 and we need to store this energetic value.
4. Finally, find the highest valence band energy and lowest conduction band energy found in 3. This will be the overall band energies and bandgap of the bulk system.

I wrote this script in Perl originally, but found that Python, combined with matplotlib could do this much more succinctly.

Here is the whole code. You can download it here. You can also download the color.key file here; this uses the same color scheme that jmol uses.

#! /usr/bin/env python

import sys
import os
import re
import glob
import numpy
from matplotlib.pyplot import *

def main():
    filelist = glob.glob("*wfc*")
    Compiled = []
    VBM = -1000
    CBM = 1000
    nscfout = open('nscf.out','r')
    for i in nscfout:
        if 'Fermi' in i:
            fermi = float(i.split()[4])
    nscfout.close()
    filekey = open('color.key','r')
    colorkey = []
    colorkey = {'null' : 12345}
    for i in filekey:
        colorkey[i.split()[0]] = i.split()[1]
    filekey.close()
    for filename in filelist:
        #print filename
        contents = open(filename,"r")
        dummyvariable = contents.readline()
        DOS = []
        Energy = []
        LDOS = []
        for line in contents:
            DOStemp = line.split()
            DOStemp = [float(x) for x in DOStemp]
            DOStemp[0] -= fermi
            DOS.append(DOStemp)
            Energy.append(DOStemp[0])
            LDOS.append(DOStemp[1])
        centerpoint = Energy.index(min(x for x in Energy if x > 0))
        for i in range(centerpoint, len(Energy)):
            if LDOS[i] > 0 and LDOS[i+1]>0 and LDOS[i+2]>0:
                ConductionBandPoint = i
                break
        for i in range(centerpoint,0,-1):
            if LDOS[i] >0 and LDOS[i-1] >0 and LDOS[i-2] > 0:
                ValenceBandPoint = i
                break
        if Energy[ValenceBandPoint] > VBM:
            VBM = Energy[ValenceBandPoint]
        if Energy[ConductionBandPoint] < CBM:
            CBM = Energy[ConductionBandPoint]
        Bandgap = abs(Energy[ValenceBandPoint]-Energy[ConductionBandPoint])    
        Compiled.append([filename,Energy[ValenceBandPoint],Energy[ConductionBandPoint],Bandgap])    
        specie = filename[filename.find("(")+1:filename.find(")")]
        plot(Energy, LDOS, colorkey[specie])
        contents.close()
    print "Name\tVBM\tCBM\tBandGap"
    for i in Compiled:
        print i[0],"\t",i[1],"\t",i[2],"\t",i[3]
    Bandgap = abs(VBM-CBM)
    print "Totals:\tVBM\tCBM\tBandgap"
    print "\t",VBM,"\t",CBM,"\t",Bandgap
    xlim(-10,10)
    ylim(0,5)
    xlabel('Energy (E-$\mathrm{E_f}$)', size='large')
    ylabel('DOS (arb. units)', size='large')
    show()
    savefig('DOS.png')
    savefig('DOS.svg')
if __name__ == '__main__':
    main()

Now just full disclosure: this is my first time using Python and I mainly wrote this as a an introductory tutorial for myself into python. I am just going to describe the main function as rehashing exactly what python does would not be beneficial considering my overall naivety when it comes to this language.

    filelist = glob.glob("*wfc*")
    Compiled = []
    VBM = -1000
    CBM = 1000
    nscfout = open('nscf.out','r')
    for i in nscfout:
        if 'Fermi' in i:
            fermi = float(i.split()[4])
    nscfout.close()

In this section, it essentially declares some lists, gets the filenames in the directory, and greps the fermi energy out of the file ‘nscf.out’. The variable filelist is simply a list of the file names containing the DOS information. The VBM and CBM are very negative and positive values, respectively, to get the overall conduction and valence band values (iteratively). While the loop loops through the nscf file to get the Fermi energy (note: this would have to be reworked to be used on relax calculation).

    filekey = open('color.key','r')
    colorkey = []
    colorkey = {'null' : 12345}
    for i in filekey:
        colorkey[i.split()[0]] = i.split()[1]
    filekey.close()

In this section, I have created a file called ‘color.key.’ This is a tab separated file containing two columns the species name and a color (in this case HTML hex colors) that will be used to make out DOS plot pretty by default. It then reads these values into a dictionary list (more info here ) so that we can easily reference this data structure latter using our species names.

        contents = open(filename,"r")
        dummyvariable = contents.readline()
        DOS = []
        Energy = []
        LDOS = []
        for line in contents:
            DOStemp = line.split()
            DOStemp = [float(x) for x in DOStemp]
            DOStemp[0] -= fermi
            DOS.append(DOStemp)
            Energy.append(DOStemp[0])
            LDOS.append(DOStemp[1])

In this portion of the code, it creates some more lists then loops over the content of the file. It skips the header by creating that dummyvariable (pardon the poor programming on this little trick). It then splits the file on the tabs into a temporary array, then converts each variable into a flow, subtracts off the fermi energy, then appends three lists: DOS, Energy, and LDOS. I have created the DOS list to contain all the values of the files, while extracting the energy and the LDOS into specific one dimensional lists to make the parsing easier.

        centerpoint = Energy.index(min(x for x in Energy if x > 0))
        for i in range(centerpoint, len(Energy)):
            if LDOS[i] > 0 and LDOS[i+1]>0 and LDOS[i+2]>0:
                ConductionBandPoint = i
                break
        for i in range(centerpoint,0,-1):
            if LDOS[i] >0 and LDOS[i-1] >0 and LDOS[i-2] > 0:
                ValenceBandPoint = i
                break
        if Energy[ValenceBandPoint] > VBM:
            VBM = Energy[ValenceBandPoint]
        if Energy[ConductionBandPoint] < CBM:
            CBM = Energy[ConductionBandPoint]

In this section of code, several things happen

1. We find the index of the ‘centerpoint.’ This is simply the point where the sign of the energy switches from negative to positive. This is the closest index to the fermi energy and is necessary to know to keep track of where the true VBM and CBM are

2. We then loop from the centerpoint to the first value where the LDOS is greater than 0 ( you can change this to have a filtering effect where as opposed to 0, you can have some small value greater than 0). This is the index of the conduction band. Going from the center point backwards finds the valence band index.

3. We then check to see if the VBM/CBM is greater/less than the previous VBM/CBM. Reassignment of this variable allows us to keep track of the global VBM/CBM in one loop to save a minute amount of computational time.

        Bandgap = abs(Energy[ValenceBandPoint]-Energy[ConductionBandPoint])    
        Compiled.append([filename,Energy[ValenceBandPoint],Energy[ConductionBandPoint],Bandgap])    
        specie = filename[filename.find("(")+1:filename.find(")")]
        plot(Energy, LDOS, colorkey[specie])
        contents.close()

Here we simply are computing the bandgap then storing the values inside of the array Compiled. The rest of this is simply for the plotting of the DOS inside of matplotlib.

The remainder of the code, shown below simply prints the found data giving you the total bandgap of your system from the projected density of states files.

<pre>    print "Name\tVBM\tCBM\tBandGap"
    for i in Compiled:
        print i[0],"\t",i[1],"\t",i[2],"\t",i[3]
    Bandgap = abs(VBM-CBM)
    print "Totals:\tVBM\tCBM\tBandgap"
    print "\t",VBM,"\t",CBM,"\t",Bandgap
    xlim(-10,10)
    ylim(0,5)
    xlabel('Energy (E-$\mathrm{E_f}$)', size='large')
    ylabel('DOS (arb. units)', size='large')
    show()
    savefig('DOS.png')
    savefig('DOS.svg')</pre>

The code presented above will calculate the bandgap from the density of states generated from Quantum Espresso. Using this method, it is quickly possible to extract this value in a systematic way rather than based on visual analysis of the DOS graph. Below is the image that is generated by the code, all pretty and ready to be published:

DOS

5 comments Add yours
  1. Dear Levi,
    I tried to use the python (although I am not familiar with python) and firstly, I installed matplotlib and corrected the name of my nscf.out but now I have an error about color.key file. I made an empty one file so the error is solved but ofcourse nothing is shown in my plot. I don’t know what should be written in this file and why we need this file? Would you please help me?
    Thanks in advance. Sincerely,
    Samira

  2. Levi, I have tried to use your script but I get a “ValueError: min() arg is an empty sequence” when it reaches the “centerpoint = Energy.index(min(x for x in Energy if x > 0))” line of the script. Any ideas why that is?

    Thanks,
    Robert

    1. Hello Robert,

      I actually have not seen that error before, but it seems to me that the Energy array is somehow undefined. Could you verify that it is not empty? Feel free to email me directly at llentz [at] mit.edu if you would like more help.

      Cheers,

      Levi

  3. Levi your code is amazing. I wanted to know if its possible to output the inverse participation ratio in espresso. or charge density vs energy data.
    Thanks

Leave a Reply

Your email address will not be published. Required fields are marked *