Digitization of data is the broad process of translating data of any size into a digital format. In simpler terms, it is the process of converting hard copies or scanned copies into a digital version so that the content becomes easy to find, edit or tag. Many BPM companies follow the steps involved in this process, including web data extraction through OCR (optical character recognition) software. This technology recognises the content of PDFs or image files; scripts written in Python, R or another programming language then capture that content and save it to a specified location. At that stage, the file size, format and content are refined through cleansing, standardisation and optimisation.
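
As a rough illustration of the cleansing, standardisation and saving steps, here is a minimal Python sketch. It assumes the OCR software has already turned a scanned page into a plain string. The function names `clean_ocr_text` and `save_text`, the specific cleaning rules and the `.txt` output format are all hypothetical choices made for this example, not a description of any particular company's pipeline.

```python
import re
import unicodedata
from pathlib import Path


def clean_ocr_text(raw: str) -> str:
    """Cleanse and standardise text that an OCR step has already produced.

    The rules below are illustrative assumptions, not a fixed standard.
    """
    # Standardise Unicode forms (e.g. the "fi" ligature becomes plain "fi").
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim every line, then allow at most one blank line in a row.
    text = "\n".join(line.strip() for line in text.split("\n"))
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def save_text(text: str, out_dir: Path, name: str) -> Path:
    """Save cleaned text as UTF-8 under out_dir and return the file path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{name}.txt"
    target.write_text(text, encoding="utf-8")
    return target
```

A call such as `save_text(clean_ocr_text(raw_page), Path("digitised"), "invoice_001")` would write the cleaned page to `digitised/invoice_001.txt`; the directory and file names here are placeholders. Keeping cleaning and saving in separate functions means the same rules can be reused regardless of which OCR tool produced the text.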