Loading large flat files into MySQL with LOAD DATA INFILE and pt-fifo-split

It’s easy to load multiple rows of data from a file into a MySQL table with a single LOAD DATA INFILE command, but if the file is over a certain size you may want to load it in multiple chunks. When I have a file with millions of lines I typically use pt-fifo-split to separate the file into multiple chunks in order to limit the size of each transaction. Unlike most tools in Percona Toolkit, pt-fifo-split doesn’t actually interact with your database, but this is one MySQL use case where it’s very helpful.
Here’s a quick example in bash that illustrates how I can take a large tab-delimited file, break it into multiple 100,000-line chunks, and load the chunks into a table:
```
FLAT_FILE="/tmp/big_file.txt"
FIFO_PATH="${FLAT_FILE}.fifo"
LOAD_FILE="${FLAT_FILE}.load"
CHUNK_SIZE=100000
# Split the file into ${CHUNK_SIZE}-line chunks
pt-fifo-split --force --lines ${CHUNK_SIZE} ${FLAT_FILE} --fifo ${FIFO_PATH} &
# Sleep 10 seconds to ensure ${FIFO_PATH} exists before entering the loop
sleep 10
while [ -e ${FIFO_PATH} ]
do
# Write chunk to disk
cat ${FIFO_PATH} > ${LOAD_FILE}
# Load chunk into table
mysql --database=test \
  --show-warnings \
  -vve "load data infile '${LOAD_FILE}' into table my_table;"
done
```
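After the loop exits, it can be worth a quick sanity check that every chunk was loaded. Here’s a minimal check, assuming the target table started out empty and keeping the my_table name from the example above:

```
# The line count of the source file should match the row count in the table
wc -l < ${FLAT_FILE}
mysql --database=test -e "SELECT COUNT(*) FROM my_table;"
```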
YMMV, so you should do some testing to determine the optimal chunk size based on your schema, storage engine, replication setup, the complexity of the LOAD DATA INFILE statement, and so on.
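One way to do that testing is to time the same load at a few different chunk sizes. The sketch below is only a rough outline: it assumes the snippet above has been saved as a hypothetical wrapper script, chunked_load.sh, that takes the chunk size as its first argument, and that my_table can be truncated between runs; the candidate sizes are arbitrary.

```
#!/bin/bash
# Time the chunked load at a few candidate chunk sizes (values are arbitrary)
for CHUNK_SIZE in 50000 100000 500000
do
  # Start from an empty table so each run loads identical data
  mysql --database=test -e "TRUNCATE TABLE my_table;"
  echo "Testing chunk size: ${CHUNK_SIZE}"
  # chunked_load.sh is a hypothetical wrapper around the snippet above,
  # modified to take the chunk size as its first argument
  time ./chunked_load.sh ${CHUNK_SIZE}
done
```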