Combine Videos Using FFmpeg

I have found it very useful to concatenate multiple video files after working on them separately. It turns out that this is rather simple to do with FFmpeg.

How do we do this?

There are three methods I have found thus far:

Using the concat demuxer approach

  • This method is very fast as it avoids transcoding
  • This method only works if the files have the same video and audio encoding; otherwise, artifacts will be introduced

Using the file-level concatenation approach

  • There are a few formats that support file-level concatenation, kinda like just using cat on two files in the terminal
  • Very few formats can do this; the only one I've used is the MPEG-2 Transport Stream format (.ts)

Using a complex filter graph with the concat filter

  • This method can concatenate videos with different encodings
  • This will cause transcoding to occur, so it takes time and may degrade the quality
  • The syntax is hard to understand if you've never written complex filter graphs for FFmpeg before

Let's look at the examples. First, the concat demuxer approach:

ffmpeg -f concat -i list.txt -c copy out.mp4

Unlike most FFmpeg commands, this one takes in a text file containing the files we want to concatenate. The text file would look something like this:

file 'video1.mp4'
file 'video2.mp4'
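
One gotcha: by default, the concat demuxer rejects absolute paths as "unsafe" and will error out. If your list uses absolute paths, add -safe 0. A minimal sketch, with placeholder paths:

printf "file '%s'\n" /path/to/video1.mp4 /path/to/video2.mp4 > list.txt
ffmpeg -f concat -safe 0 -i list.txt -c copy out.mp4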

The example for file-level concatenation looks like this:

ffmpeg -i "concat:video1.ts|video2.ts" -c copy out.ts

And the last example, using the concat filter, looks like this:

ffmpeg -i video1.mp4 -i video2.flv -filter_complex \
"[0:v][0:a][1:v][1:a] concat=n=2:v=1:a=1 [outv] [outa]" \
-map "[outv]" -map "[outa]" out.mp4

This one is probably pretty confusing, so let me explain the complex filter graph syntax:

Unlike normal filter usage with -vf or -af, a complex filter graph requires us to tell FFmpeg which streams of data each filter operates on.

At the start you see:

[0:v][0:a][1:v][1:a]

This translates in plain English to:

Use the video stream of the first input source, use the audio stream from the first input source, use the video stream from the second input source, and use the audio stream from the second input source.

The square bracket syntax indicates:

[index_of_input:stream_type]

Those of us with programming experience will understand why the index starts at 0 and not 1.
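
These specifiers aren't unique to filter graphs, by the way; the -map option accepts them directly. As a quick sketch (with hypothetical file names), this takes the video from the first input and the audio from the second:

ffmpeg -i video1.mp4 -i video2.mp4 -map 0:v -map 1:a -c copy out.mp4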

Now, after we have declared which streams we are using, we have normal filter syntax:

concat=n=2:v=1:a=1

concat is the name of the filter

n=2 specifies that there are two input sources to concatenate

v=1 indicates that each input source has one video stream, and that one video stream should be written to the output

a=1 indicates that each input source has one audio stream, and that one audio stream should be written to the output
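
These parameters scale in the obvious way. A couple of filter-string variants, just as sketches:

concat=n=3:v=1:a=1
concat=n=2:v=1:a=0

The first would join three sources with one video and one audio stream each; the second would join two video-only sources and produce no audio output, so you would only label and map a video stream.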

Next, we label the streams of data created by the filter using the bracket syntax:

[outv] [outa]

Here, we are calling the newly created video stream outv and the audio stream outa; we will need these labels later when using the -map flag on the output.

Lastly, we need to explicitly tell FFmpeg which streams of data to map to the output file, using the -map option:

-map "[outv]" -map "[outa]"

Do those names look familiar? It's what we labeled the streams created from the concat filter. We are telling FFmpeg:

Don't use the streams directly from the input files; instead, use these data streams created by the filtergraph.
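
To see the whole pattern scale, here is a sketch of the same command with three inputs (file names hypothetical); the stream list grows to six entries and n becomes 3:

ffmpeg -i video1.mp4 -i video2.flv -i video3.mkv -filter_complex \
"[0:v][0:a][1:v][1:a][2:v][2:a] concat=n=3:v=1:a=1 [outv] [outa]" \
-map "[outv]" -map "[outa]" out.mp4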

And with that, you let it run and, tada, you have concatenated two videos with completely different encodings. Hurray!


Did you find this information useful? If so, consider heading over to my donation page and drop me some support.

Want to ask a question or just chat? Contact me here