July 26, 2017

How to Join Two CSV Files on Unix, Linux and BSD Systems

Administration Logo

Suppose you have two CSV files and want to join them to one CSV file. For this you need a tool to extract selected fields and combine them to a new file. The tool should also arrange the fields in a specific order. The command-line tool join can do exactly that. In this tutorial we will show you how to join two CSV files.

Root access is required to edit the following files and to execute commands. Log in as root (su) or simply prepend sudo to all commands that require root privileges.

Sort CSV files

In some cases you need to sort the CSV files by a specific field before you can join them. For this you can use the command-line tool sort.

  • -t , : The field separator is ‘,’
  • -k 2,2 : Character sort on the 2nd field
  • -k 1,1 : Character sort on the 1st field
  • > : Output to file

The following example will sort the first file file1.csv alphabetically on the 2nd field and the second file file2.csv alphabetically on the 1st field. The output is saved to the files sort1.csv and sort2.csv.

sort -t , -k 2,2 file1.csv > sort1.csv
sort -t , -k 1,1 file2.csv > sort2.csv

Join two CSV files

Now, that you have two CSV files with the correct sorting you can start joining them.

  • -t , : The field separator is ‘,’
  • -1 2 : The 2nd field of the first file
  • -2 1 : The 1st field of the second file
  • > : Output to file

In this example the new file is joined by the 2nd field of the first file sort1.csv and the 1st field of the second file sort2.csv. The output is saved to the file sort3.csv.

join -t , -1 2 -2 1 sort1.csv sort2.csv > sort3.csv

The previous command will output all fields. If you want to generate an output of selected fields use the -o parameter instead.

  • -o 1.1 : Output the 1st field of the first file

The following example will join the files like in the previous example and output only the 1st, 2nd, 3rd and 4th field of the first file sort1.csv and the 4th field of the second file sort2.csv. The output is saved to the file sort3.csv.

join -t , -1 2 -2 1 -o 1.1 1.2 1.3 1.4 2.4 sort1.csv sort2.csv > sort3.csv

Speak Your Mind