MySQL: Import CSV, not using LOAD DATA
All over the Internet, people are having trouble getting LOAD DATA and LOAD DATA LOCAL to work. Frankly, do not use them, and especially not the LOCAL variant. They are insecure, and even if you get them to work, they are limited and unlikely to do what you want. Write a small data load program as shown below.
Not using LOAD DATA LOCAL
The LOCAL version of LOAD DATA has two potential security issues:
- Because LOAD DATA LOCAL is an SQL statement, parsing occurs on the server side, and transfer of the file from the client host to the server host is initiated by the MySQL server, which tells the client the file named in the statement. In theory, a patched server could tell the client program to transfer a file of the server’s choosing rather than the file named in the statement. Such a server could access any file on the client host to which the client user has read access. (A patched server could in fact reply with a file-transfer request to any statement, not just LOAD DATA LOCAL, so a more fundamental issue is that clients should not connect to untrusted servers.)
- In a Web environment where the clients are connecting from a Web server, a user could use LOAD DATA LOCAL to read any files that the Web server process has read access to (assuming that a user could run any statement against the SQL server). In this environment, the client with respect to the MySQL server actually is the Web server, not a remote program being run by users who connect to the Web server.
In practice, the second issue means that if the web server has a suitable SQL injection vulnerability, an attacker can use it to read any file the web server has access to, bouncing the request through the database server.
In short, never use (or even enable) LOAD DATA LOCAL.

- local_infile is disabled in the server config, and you should keep it that way.
- Client libraries are by default compiled with ENABLED_LOCAL_INFILE set to off. It can still be enabled using a call to the mysql_options() C API, but never do that.
- MySQL 8.0.21+ places additional restrictions on this, to prevent you from being stupid (that is, from actually enabling this anywhere).
Not using LOAD DATA
The LOAD DATA variant of the command assumes that you place a file on the database server, in a directory in the file system of the server, and load it from there. In the age of “MySQL as a service” this is inconvenient to impossible, so forget about this option, too.
If you were able to place files onto the system where your mysqld lives,

- your user needs to have the FILE privilege, which is a global privilege (GRANT FILE ON *.* TO ...), and
- the server variable secure_file_priv needs to be set to a directory name, and that directory needs to be world-readable. LOAD DATA and SELECT ... INTO OUTFILE work only with filenames below this directory. Setting this variable requires a server restart; it is not a dynamic variable (on purpose).
Note that the variable can be NULL (this is secure in the sense that LOAD DATA is disabled) or empty (this is insecure in that there are no restrictions).
There is nothing preventing you from setting the variable to /var/lib/mysql or other dumb locations which would expose vital system files to load and save operations. Do not do this.
Also, a location such as /tmp or any other world-writeable directory would be dumb: use a dedicated directory that is writeable by the import user only, and make sure that it is world-readable in order to make the command work.
Better: Do not use this command at all (and set secure_file_priv to NULL).
Using data dump and load programs instead
We already spoke about dumping a schema into CSV files in Export the entire database to CSV.
To complete the discussion we need to provide a way to do the inverse and load data from a CSV file into a table.
The full code is in load.py.
The main idea is to open a .csv file with csv.reader, and then iterate over the rows. For each row, we execute an INSERT statement, and every few rows we also COMMIT.
In terms of dependencies, we rely on MySQLdb and csv:
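A minimal sketch of the imports, assuming MySQLdb is provided by the mysqlclient package:

```python
import csv       # CSV parsing, from the Python standard library

import MySQLdb   # MySQL driver, e.g. from the mysqlclient package
```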
We need to know the name of a table, and the column names of that table (in the order in which they appear).
We should also make sure we can change the delimiter and quoting character used by the CSV, and make the commit interval configurable.
Finally, we need to be able to connect to the database.
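A sketch of this parameter set; all names and values are invented for illustration (the real load.py would presumably take them from the command line):

```python
# All names and values here are illustrative assumptions, not from load.py.
table = "data"                # target table, also the CSV file's base name
columns = ["id", "d", "e"]    # column names, in the order the CSV has them
delimiter = ","               # CSV field delimiter
quotechar = '"'               # CSV quoting character
commit_interval = 1000        # COMMIT after this many rows
db_config = dict(             # connection parameters
    host="127.0.0.1",
    user="kris",
    passwd="geheim",
    db="kris",
)
```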
From this, we can build a database connection and an INSERT statement, using the table name and column names:
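For example, building on the parameter sketch above (not the verbatim load.py code):

```python
conn = MySQLdb.connect(**db_config)
cursor = conn.cursor()

# Build e.g. "INSERT INTO data (id, d, e) VALUES (%s, %s, %s)".
# MySQLdb uses %s as its parameter placeholder. Table and column names
# come from trusted configuration here, not from user input.
column_list = ", ".join(columns)
placeholders = ", ".join(["%s"] * len(columns))
sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
```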
The actual code is then rather simple: open the CSV file, named after the table, and create a csv.reader(). Using this, we iterate over the rows.
For each row, we execute the insert statement.
Every commit_interval rows we commit, and for good measure we also commit after finishing, to make sure any remaining rows also get written out.
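Put together, the loop might look like this (a sketch under the assumptions above):

```python
count = 0
with open(f"{table}.csv", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter=delimiter, quotechar=quotechar)
    for row in reader:
        cursor.execute(sql, row)   # one INSERT per CSV row
        count += 1
        if count % commit_interval == 0:
            conn.commit()

conn.commit()  # flush whatever remains of the last, partial batch
```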
And that is it. That’s all the code.
- No FILE privilege.
- No special permissions besides insert_priv on the target table.
- No config in the database.
- No server restart to set up the permissions.
And using Python’s multiprocessing, you could make it load multiple tables in parallel, or chunk a very large table and load that in parallel, assuming you have database hardware that could profit from any of this.
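A sketch of the per-table variant, reusing db_config from above; load_csv_into() is hypothetical shorthand for the loader loop shown earlier:

```python
from multiprocessing import Pool

import MySQLdb

def load_table(table):
    # Hypothetical per-table worker wrapping the loader sketched above.
    # Each worker must open its own connection: MySQLdb connections
    # cannot be shared across process boundaries.
    conn = MySQLdb.connect(**db_config)
    load_csv_into(conn, table)  # hypothetical single-table loader from above
    conn.close()

if __name__ == "__main__":
    # Load several tables concurrently, four at a time.
    with Pool(processes=4) as pool:
        pool.map(load_table, ["data", "events", "customers"])
```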
In any case: this is simpler, more secure, and less privileged than any of the broken LOAD DATA variants. Don’t use them; write a loader program.
Let’s run it. First we generate some data, using the previous example from the partitions tutorial:
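The generator from the partitions tutorial is not reproduced here; as a stand-in, a small script along these lines would do (table layout, credentials, and row count are invented):

```python
import random
import string

import MySQLdb

conn = MySQLdb.connect(host="127.0.0.1", user="kris", passwd="geheim", db="kris")
c = conn.cursor()
c.execute("DROP TABLE IF EXISTS data")
c.execute("""
    CREATE TABLE data (
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        d VARCHAR(64) NOT NULL
    )
""")
for _ in range(10000):
    payload = "".join(random.choices(string.ascii_lowercase, k=32))
    c.execute("INSERT INTO data (d) VALUES (%s)", (payload,))
conn.commit()
```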
We then dump the data, truncate the table, and reload the data. We count the rows to be sure we get all of them back.
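A sketch of that round trip; dump_table() and load_table() are hypothetical wrappers for the dump program from the earlier article and the loader above:

```python
def count_rows(cursor, table):
    # table comes from trusted configuration, not from user input
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]

before = count_rows(c, "data")
dump_table(conn, "data")       # hypothetical: writes data.csv (earlier article)
c.execute("TRUNCATE TABLE data")
load_table("data")             # hypothetical: the loader sketched above
after = count_rows(c, "data")
assert before == after, f"row count mismatch: {before} != {after}"
```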