AWS PostgreSQL load data from S3

Although this approach works for relatively small tables, the performance degrades as the data grows.

Step 1: Compress the CSV file

As a first improvement, we can try to compress the exported CSV file using gzip. Compression will speed up the process, since the amount of data uploaded to the S3 bucket will be reduced. We can either export the CSV file, store it locally on the sync service and then compress it, or we can perform the compression during the export on the DB server. To compress on the DB server, we can take advantage of the TO PROGRAM option of the COPY command and execute the compression there (check the docs for more information).

There are pros and cons for each approach. In the first approach, the compression is performed on the client side, which means that the uncompressed data is transferred from the DB server to the sync service. For large amounts of data, this could slow down the export process. In the second approach, the compression is performed on the DB server side, which means that we transfer compressed data between the DB and the sync service, but since compression is a CPU-intensive task in general, it will put some extra load on the DB server. The second approach also requires the DB user to have elevated privileges, as we discussed in the first section. Note that the only difference between these approaches is the \copy vs COPY command.
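To make that difference concrete, here is a minimal sketch of the two variants; the database name appdb, the table name my_table and the output paths are placeholders, not taken from the original post:

```bash
# First approach: psql's \copy runs gzip on the client (the sync service);
# no special privileges are needed on the DB server.
psql -d appdb -c "\copy my_table to program 'gzip > my_table.csv.gz' with (format csv)"

# Second approach: COPY ... TO PROGRAM runs gzip on the DB server itself;
# it requires superuser or membership in the pg_execute_server_program role.
psql -d appdb -c "COPY my_table TO PROGRAM 'gzip > /tmp/my_table.csv.gz' WITH (FORMAT csv)"
```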
We will go with the first approach in order to avoid overloading the DB server. Our export script, with compression, will look like this:
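As a rough sketch (the exact script will vary), a client-side export with compression followed by an S3 upload could look like the following; the connection settings, table name and bucket are placeholders, and it assumes the aws CLI is configured with access to the target bucket:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder settings (assumptions, not taken from the original post).
DB_HOST="db.example.com"
DB_NAME="appdb"
DB_USER="sync_user"           # credentials expected via ~/.pgpass or PGPASSWORD
TABLE="my_table"
S3_BUCKET="s3://my-sync-bucket/exports"

EXPORT_FILE="${TABLE}.csv.gz"

# Export the table as CSV; \copy streams the rows to the client,
# where gzip compresses them, so the DB server does no extra work.
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" \
  -c "\copy ${TABLE} to program 'gzip > ${EXPORT_FILE}' with (format csv)"

# Upload the compressed export to the S3 bucket.
aws s3 cp "$EXPORT_FILE" "${S3_BUCKET}/${EXPORT_FILE}"
```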
The AWS Redshift advisor prompts us to split the CSV file into multiple (equal size) files so that the COPY command can load the data in parallel into the corresponding table.
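One possible way to do the split and upload, assuming the gzipped export from above, GNU split, and a placeholder IAM role for the Redshift COPY (the number of parts would normally be matched to the number of slices in the cluster):

```bash
#!/usr/bin/env bash
set -euo pipefail

TABLE="my_table"
S3_DIR="s3://my-sync-bucket/exports/${TABLE}"
PARTS=4   # ideally a multiple of the number of slices in the Redshift cluster

# Split the CSV into equal parts without breaking lines,
# then compress and upload each part under a common S3 prefix.
gunzip -k "${TABLE}.csv.gz"
split -n l/"${PARTS}" -d "${TABLE}.csv" "${TABLE}_part_"
for part in "${TABLE}"_part_*; do
  gzip "$part"
  aws s3 cp "${part}.gz" "${S3_DIR}/${part}.gz"
done

# Redshift then loads all objects sharing the prefix in parallel, e.g.:
#   COPY my_table
#   FROM 's3://my-sync-bucket/exports/my_table/my_table_part_'
#   IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
#   GZIP CSV;
```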