Use Google Cloud Speech API to Transcribe Large Video Files

Large video files can now be transcribed in English, Russian, Polish, and other languages by artificial intelligence with astounding levels of accuracy. It would be interesting to compare its accuracy to that of a professional transcriptionist. Looking forward, the next thing to work on with artificial intelligence is the ability to add punctuation and capitalize as appropriate.

If you want to use the Google Speech API on audio longer than 60 seconds, follow the Linux procedure I developed below.

  1. First, you need to convert your video file into the right format. The audio needs to be mono rather than stereo, and ideally in Free Lossless Audio Codec (FLAC) format.
    • Download and install FFmpeg.
    • Kill two birds with one stone by converting to a mono FLAC file: navigate to the directory where your video file is and type the following into the terminal:
      ffmpeg -i [YOUR_VIDEO_FILENAME].[EXTENSION] -ac 1 mono.flac
  2. Your output file will probably be huge. If your internet connection isn't the fastest, you're going to want to break it up into bite-sized chunks. The following commands will break it up into 5-minute (300-second) pieces in a separate directory:
    mkdir chunks
    ffmpeg -i mono.flac -f segment -segment_time 300 -c copy chunks/out%03d.flac 
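If you would rather script steps 1 and 2 than type them by hand, the two FFmpeg commands can be assembled in Python. This is only a minimal sketch: the function names `build_ffmpeg_commands` and `convert_and_split` are made up for illustration, and it assumes `ffmpeg` is on your PATH and that you keep the `mono.flac` and `chunks/out%03d.flac` names used above.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_commands(video_path, chunk_seconds=300, chunk_dir="chunks"):
    """Return the argument lists for the two ffmpeg commands in steps 1 and 2."""
    # Step 1: downmix to one audio channel and write a FLAC file.
    to_mono_flac = ["ffmpeg", "-i", str(video_path), "-ac", "1", "mono.flac"]
    # Step 2: cut mono.flac into fixed-length pieces without re-encoding.
    split = [
        "ffmpeg", "-i", "mono.flac",
        "-f", "segment", "-segment_time", str(chunk_seconds),
        "-c", "copy", "{}/out%03d.flac".format(chunk_dir),
    ]
    return to_mono_flac, split


def convert_and_split(video_path, chunk_seconds=300, chunk_dir="chunks"):
    """Create the chunk directory and run both commands, stopping on failure."""
    Path(chunk_dir).mkdir(exist_ok=True)
    for command in build_ffmpeg_commands(video_path, chunk_seconds, chunk_dir):
        subprocess.run(command, check=True)
```

Calling `convert_and_split("talk.mp4")` (a placeholder filename) would run the same two commands shown above, one after the other.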
  3. Now, you'll need to set up a Google Cloud Platform (GCP) account. You'll probably come out ahead by doing so, as new accounts usually get $300 worth of free credit. Go sign up!
  4. Create or select a Google Cloud Platform project to use the Speech API with. Follow the instructions here.
  5. Go ahead and install the Google Cloud Software Development Kit (SDK).
  6. Initialize the software development kit by typing:
    gcloud init
    Follow the prompts and select your project.
  7. If you haven't downloaded the JavaScript Object Notation (JSON) key already, go back to the project setup in step 4 and make sure to do so. Now we're going to activate it. Type the path to the key in place of [PATH] below.
    gcloud auth activate-service-account --key-file=[PATH]
  8. Remember all those chunks of your file we made? Time to upload them to the Google Cloud. Go to the Buckets Browser and create a bucket for our project. Upload the "chunks" folder from your local directory.
    • Copy and paste the following code into a text or code editor.
      #!/usr/bin/env python
      # Copyright 2017 Google Inc. All Rights Reserved.
      #
      # Licensed under the Apache License, Version 2.0 (the "License");
      # you may not use this file except in compliance with the License.
      # You may obtain a copy of the License at
      #
      #     http://www.apache.org/licenses/LICENSE-2.0
      #
      # Unless required by applicable law or agreed to in writing, software
      # distributed under the License is distributed on an "AS IS" BASIS,
      # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      # See the License for the specific language governing permissions and
      # limitations under the License.
      """Google Cloud Speech API sample application using the REST API for batch
      processing.

      Example usage:
          python [SCRIPT_NAME].py gs://cloud-samples-tests/speech/brooklyn.flac
      """
      # [START import_libraries]
      import argparse
      # [END import_libraries]


      def transcribe_gcs(gcs_uri):
          """Asynchronously transcribes the audio file specified by the gcs_uri."""
          # These imports match google-cloud-speech releases before 2.0, which
          # still ship the enums and types modules.
          from google.cloud import speech
          from google.cloud.speech import enums
          from google.cloud.speech import types
          client = speech.SpeechClient()
          audio = types.RecognitionAudio(uri=gcs_uri)
          config = types.RecognitionConfig(
              # The sample rate is read from the FLAC header, so it is not set here.
              encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
              ### Replace en-US with the appropriate language code!
              language_code='en-US')
          operation = client.long_running_recognize(config, audio)
          print('Waiting for operation to complete...')
          response = operation.result(timeout=90)
          # Each result is for a consecutive portion of the audio. Iterate through
          # them to get the transcripts for the entire audio file.
          for result in response.results:
              # The first alternative is the most likely one for this portion.
              print('Transcript: {}'.format(result.alternatives[0].transcript))
              print('Confidence: {}'.format(result.alternatives[0].confidence))
              ### Following added March 2018 to save output to transcript.txt
              with open("transcript.txt", "a") as f:
                  f.write(result.alternatives[0].transcript)


      # Following added March 2018
      # Please replace "name" and edit the file paths as appropriate.
      # Each chunk is transcribed in turn, followed by a time marker in the output.
      transcribe_gcs("gs://name/chunks/out000.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 5 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out001.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 10 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out002.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 15 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out003.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 20 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out004.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 25 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out005.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 30 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out006.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 35 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out007.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 40 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out008.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 45 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out009.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 50 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out010.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 55 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out011.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 60 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out012.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 65 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out013.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 70 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out014.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 75 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out015.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 80 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out016.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 85 minutes \n\n\n")
      transcribe_gcs("gs://name/chunks/out017.flac")
      with open("transcript.txt", "a") as f:
          f.write("\n\n\n 90 minutes \n\n\n")
      # end of modifications

      if __name__ == '__main__':
          parser = argparse.ArgumentParser(
              description=__doc__,
              formatter_class=argparse.RawDescriptionHelpFormatter)
          # The path is optional because the chunk list above has already run.
          parser.add_argument(
              'path', nargs='?',
              help='GCS path for an additional audio file to be recognized')
          args = parser.parse_args()
          if args.path and args.path.startswith('gs://'):
              transcribe_gcs(args.path)
    • Be sure to replace "name" in every "gs://name/chunks/" with the name you picked. Check Buckets Browser if you forgot.
    • Replace "en-US" with the appropriate language code. For example, "uk-UA" for Ukrainian, "ru-RU" for Russian, and "pl-PL" for Polish. For more language codes, see the official list.
    • Save the modified code as
  9. Ready? Set? Run the script:
  10. Let it run. The script has been modified above in a rather crude way, but anything fancier will get you an "over the quota" error message. Take a deep breath, wait a while, and once it's ready, open up transcript.txt and get to work proofreading your transcription!
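If your recording has more or fewer chunks than the script lists, the repeated "transcribe, then write a time marker" pairs can be generated rather than typed out. The sketch below is one possible way to do that, not the script I ran: `chunk_uris` and `transcribe_chunks` are hypothetical helper names, and it assumes that handling chunks strictly one at a time, exactly as the hand-written list does, stays within the quota.

```python
def chunk_uris(bucket, count, folder="chunks"):
    """Build the gs:// URIs for out000.flac, out001.flac, ... in the bucket."""
    return [
        "gs://{}/{}/out{:03d}.flac".format(bucket, folder, index)
        for index in range(count)
    ]


def transcribe_chunks(uris, transcribe, out_path="transcript.txt",
                      chunk_minutes=5):
    """Transcribe each URI in order and append a time marker after each one.

    `transcribe` is any function that takes a single gs:// URI, for example
    the transcribe_gcs function from the script above.
    """
    for index, uri in enumerate(uris, start=1):
        # One request at a time: the next chunk starts only after this returns.
        transcribe(uri)
        with open(out_path, "a") as f:
            f.write("\n\n\n {} minutes \n\n\n".format(index * chunk_minutes))
```

With the script's function in scope, `transcribe_chunks(chunk_uris("name", 18), transcribe_gcs)` would cover the same 90 minutes as the hand-written list, with "name" replaced by your bucket name.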