http://wiki.docking.org/api.php?action=feedcontributions&user=Jizhou&feedformat=atomDISI - User contributions [en]2024-03-28T17:18:34ZUser contributionsMediaWiki 1.39.1http://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10923How to do indexing, partition, and migration in Postgres 102018-08-03T22:16:11Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to migrate the data directory, partition tables, and build indexes in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is large: once loaded and indexed in Postgres it takes up about 220 GB of disk. It is better to move the database storage from the default (root) disk to another, larger disk. A solid-state drive (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit psql and stop the Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data           # Postgres requires the data directory to be owned by the postgres user.<br />
chmod 700 psql_10_data                # Owner-only read, write, and execute; no group or world access. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -a /var/lib/pgsql/10/data/. /ssd/disk1/psql_10_data/    # -a preserves ownership, permissions, and timestamps; the trailing /. also copies hidden files<br />
</pre><br />
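<br />
When a data directory is copied, file ownership, permissions, and hidden files all need to carry over, otherwise the server may refuse to start from the new location. The following is a minimal sketch of the copy step, assuming GNU coreutils; <code>copy_pgdata</code> is a hypothetical helper name, not a Postgres tool.<br />

```shell
#!/usr/bin/env bash
# Sketch only: copy_pgdata is a hypothetical helper, not part of Postgres.
# Usage: copy_pgdata SRC_DIR DST_DIR   (run while the Postgres service is stopped)
copy_pgdata() {
    local src="$1" dst="$2"
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dst" || return 1
    # -a (archive) keeps owner, mode, and timestamps where the caller is allowed to;
    # "$src/." copies the directory contents, including hidden files.
    cp -a "$src/." "$dst/"
}
```

For the paths used in this tutorial the call would be <code>copy_pgdata /var/lib/pgsql/10/data /ssd/disk1/psql_10_data</code>, run as root so that ownership can be preserved.<br />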
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in the old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
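<br />
The same edit can be scripted instead of made by hand in vim. This is a sketch under stated assumptions: GNU sed is available, the new path contains no <code>|</code> or <code>&amp;</code> characters, and <code>set_data_directory</code> is a hypothetical helper name. It keeps a backup of the file next to the original.<br />

```shell
#!/usr/bin/env bash
# Sketch only: set_data_directory is a hypothetical helper.
# Usage: set_data_directory /path/to/postgresql.conf /new/data/dir
set_data_directory() {
    local conf="$1" new_dir="$2"
    local pattern='^[[:space:]]*#?[[:space:]]*data_directory[[:space:]]*='
    if [ ! -f "$conf" ]; then
        echo "config file not found: $conf" >&2
        return 1
    fi
    cp -p "$conf" "$conf.bak" || return 1      # keep a backup of the original
    if grep -Eq "$pattern" "$conf"; then
        # Replace the existing (possibly commented-out) setting.
        sed -E -i "s|$pattern.*|data_directory = '$new_dir'|" "$conf"
    else
        # No setting present: append one.
        printf "data_directory = '%s'\n" "$new_dir" >> "$conf"
    fi
}
```

Example: <code>set_data_directory /var/lib/pgsql/10/data/postgresql.conf /ssd/disk1/psql_10_data</code>.<br />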
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or permissions of this file to solve the problem.<br />
<br />
</pre><br />
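<br />
Ownership and permission problems like the one above can be checked before starting the service. The sketch below assumes GNU <code>stat</code> and the mode 700 requirement stated in step (3); <code>check_pgdata_perms</code> is a hypothetical helper name, and the expected owner defaults to <code>postgres</code>.<br />

```shell
#!/usr/bin/env bash
# Sketch only: check_pgdata_perms is a hypothetical helper.
# Usage: check_pgdata_perms DIR [EXPECTED_OWNER]
# Prints "ok" when DIR is owned by EXPECTED_OWNER and has mode 700.
check_pgdata_perms() {
    local dir="$1" want_owner="${2:-postgres}"
    local owner mode
    owner=$(stat -c '%U' "$dir") || return 1   # owning user name
    mode=$(stat -c '%a' "$dir") || return 1    # octal permission bits
    if [ "$owner" != "$want_owner" ]; then
        echo "owner is $owner, expected $want_owner"
        return 1
    fi
    if [ "$mode" != "700" ]; then
        echo "mode is $mode, expected 700"
        return 1
    fi
    echo "ok"
}
```

Example: <code>check_pgdata_perms /ssd/disk1/psql_10_data</code>. If it reports a wrong owner or mode, repeat the <code>chown</code> and <code>chmod</code> commands from step (3).<br />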
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by the values of chosen columns. Partitioning speeds up searches that only need to look at a subset of the data, and a partition (child) table can be modified without affecting the whole table.<br />
An automated script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create a partition (child) table for each tranche.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); -- $tablename is like smile_partition_aaaa or smile_partition_ebca; '$prefix' is like 'EEBD' or 'AAAA'<br />
</pre><br />
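<br />
The statement above has to be issued once per tranche. The following sketch shows one way to generate it; it is not the contents of create_partition_all.sh, which are not shown here. <code>partition_sql</code> is a hypothetical helper, and the restriction to uppercase letters is an assumption based on the tranche names seen in this tutorial.<br />

```shell
#!/usr/bin/env bash
# Sketch only: partition_sql is a hypothetical helper.
# Usage: partition_sql TRANCHE_PREFIX   (e.g. AAAA)
# Prints the CREATE TABLE statement for that tranche's child table.
partition_sql() {
    local prefix="$1" suffix
    # Accept only uppercase letters so that nothing unexpected ends up in the SQL.
    case "$prefix" in
        ''|*[!A-Z]*) echo "invalid tranche prefix: $prefix" >&2; return 1 ;;
    esac
    suffix=$(printf '%s' "$prefix" | tr '[:upper:]' '[:lower:]')
    printf "CREATE TABLE IF NOT EXISTS smile_partition_%s PARTITION OF smile_partition FOR VALUES IN ('%s');\n" \
        "$suffix" "$prefix"
}
```

The output can be piped into psql, for example <code>for p in AAAA AAAB AAAD; do partition_sql "$p"; done | psql</code>.<br />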
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; -- Copy into the partitioned (parent) table; rows are routed automatically to the matching child table. Don't copy data directly into a partition (child) table.<br />
</pre><br />
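<br />
With many input files, the load statement is usually generated in a loop. A minimal sketch follows; <code>copy_sql</code> is a hypothetical helper and the file paths in the example are placeholders. Because this is a server-side <code>COPY</code>, the files must be readable by the Postgres server process.<br />

```shell
#!/usr/bin/env bash
# Sketch only: copy_sql is a hypothetical helper.
# Usage: copy_sql /path/to/file
# Prints a COPY statement that loads a tab-delimited file with a header line
# into the parent table.
copy_sql() {
    local file="$1" quoted
    # Double any single quote so the path is a valid SQL string literal.
    quoted=$(printf '%s' "$file" | sed "s/'/''/g")
    printf '%s\n' "COPY smile_partition FROM '$quoted' WITH CSV HEADER DELIMITER AS E'\\t';"
}
```

Example with placeholder paths: <code>for f in /data/tranches/*.txt; do copy_sql "$f"; done | psql</code>.<br />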
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speed up searching. Postgres offers several index types. In this work, I am using the default B-tree index (a balanced tree, not a binary tree). In the future, we may leverage RDKit for indexing.<br />
An automated script to generate an index for every partition table is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles);  -- $indexname is like 'index_aaaa', $tablename is like 'smile_partition_kkie'.<br />
</pre><br />
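<br />
One index is needed per child table. As a sketch (not the contents of create_index_all.sh, which are not shown here), the statements can be generated from a list of child table names read on standard input. <code>index_sql_for_tables</code> is a hypothetical helper; the <code>index_&lt;suffix&gt;</code> naming follows the index names that appear in the outputs in this tutorial.<br />

```shell
#!/usr/bin/env bash
# Sketch only: index_sql_for_tables is a hypothetical helper.
# Reads table names from stdin, one per line, and prints a CREATE INDEX
# statement for every name of the form smile_partition_<lowercase letters>.
index_sql_for_tables() {
    local table suffix
    while IFS= read -r table; do
        case "$table" in
            smile_partition_*) ;;      # a child table
            *) continue ;;             # the parent or an unrelated relation: skip
        esac
        suffix=${table#smile_partition_}
        case "$suffix" in
            ''|*[!a-z]*) continue ;;   # unexpected suffix: skip
        esac
        printf 'CREATE INDEX IF NOT EXISTS index_%s ON %s (smiles);\n' "$suffix" "$table"
    done
}
```

The child table names can be taken from the <code>pg_inherits</code> catalog, for example <code>psql -Atc "SELECT inhrelid::regclass FROM pg_inherits WHERE inhparent = 'smile_partition'::regclass" | index_sql_for_tables | psql</code>.<br />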
<br />
Check the validity of the indexes:<br />
1. Check that the number of indexes equals the number of partition (child) tables; the parent table has no index.<br />
2. Check the details of a partition table, e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, it is marked INVALID as follows:<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
<br />
3. Check by doing a query search.<br />
<br />
<pre><br />
explain select * from smile_partition_aaaa where smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
You should see "Index Scan", indicating that Postgres uses the index for the search. If you don't see this, check the validity of the index.<br />
You can drop and recreate the index on an individual partition (child) table.<br />
This is another advantage of partitioning: if something is wrong with one partition (child) table, only that table needs to be fixed.<br />
<br />
</pre><br />
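<br />
Checks 1 and 2 above can also be run against the system catalogs instead of inspecting each table by hand. The sketch below prints two queries: one compares the number of child tables with the number of indexes on them, the other lists any index marked invalid. <code>index_check_sql</code> is a hypothetical helper; the catalogs it reads (<code>pg_inherits</code>, <code>pg_indexes</code>, <code>pg_index</code>, <code>pg_class</code>) are standard Postgres catalogs. Note that <code>_</code> in a <code>LIKE</code> pattern matches any single character, which is harmless for the table names used here.<br />

```shell
#!/usr/bin/env bash
# Sketch only: index_check_sql is a hypothetical helper.
# Prints two catalog queries; pipe the output into psql to run them.
index_check_sql() {
    cat <<'SQL'
-- 1. Number of child tables versus number of indexes on them.
SELECT
    (SELECT count(*) FROM pg_inherits
      WHERE inhparent = 'smile_partition'::regclass) AS partitions,
    (SELECT count(*) FROM pg_indexes
      WHERE tablename LIKE 'smile_partition_%') AS indexes;

-- 2. Indexes that Postgres has marked invalid (expected: no rows).
SELECT c.relname AS index_name, t.relname AS table_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_class t ON t.oid = i.indrelid
WHERE NOT i.indisvalid
  AND t.relname LIKE 'smile_partition_%';
SQL
}
```

Example: <code>index_check_sql | psql</code>.<br />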
<br />
== '''4. Search the database''' ==<br />
<br />
Search the partitioned (parent) table directly; Postgres plans the search across the child tables. Parallel query is used automatically when the planner chooses it.<br />
<pre><br />
explain select * from smile_partition where smiles='COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC';<br />
<br />
Output:<br />
<br />
QUERY PLAN <br />
------------------------------------------------------------------------------------------------<br />
Append (cost=0.28..25863.85 rows=3600 width=250)<br />
-> Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaab on smile_partition_aaab (cost=0.28..8.30 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaad on smile_partition_aaad (cost=0.29..8.30 rows=1 width=125)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaae on smile_partition_aaae (cost=0.42..8.44 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
...<br />
...<br />
<br />
</pre><br />
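<br />
The plan above visits every child table because the query filters only on smiles. When the tranche of a molecule is already known, adding a condition on tranche_name (the partition key) allows the Postgres 10 planner to exclude the other partitions. The sketch below builds both forms of the query; <code>search_sql</code> is a hypothetical helper, and the uppercase-only check on the tranche name is an assumption based on the names used in this tutorial.<br />

```shell
#!/usr/bin/env bash
# Sketch only: search_sql is a hypothetical helper.
# Usage: search_sql SMILES [TRANCHE_NAME]
# Prints a SELECT on the parent table; with TRANCHE_NAME it also filters on the
# partition key so that the planner can skip non-matching partitions.
search_sql() {
    local smiles tranche
    # Double any single quote so the SMILES string is a valid SQL literal.
    smiles=$(printf '%s' "$1" | sed "s/'/''/g")
    tranche=${2:-}
    if [ -z "$tranche" ]; then
        printf '%s\n' "SELECT * FROM smile_partition WHERE smiles = '$smiles';"
        return 0
    fi
    case "$tranche" in
        *[!A-Z]*) echo "invalid tranche name: $tranche" >&2; return 1 ;;
    esac
    printf '%s\n' "SELECT * FROM smile_partition WHERE smiles = '$smiles' AND tranche_name = '$tranche';"
}
```

Prefixing the generated statement with <code>explain</code> in psql shows whether only the expected child table is scanned.<br />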
<br />
Performance of searching with partition, indexing, and parallel query.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Tranche_name<br />
! Query<br />
! Running time<br />
|-<br />
| ADBA<br />
| Cc1cc(C(=O)O)cc(C)c1[N+](=O)[O-]<br />
| 1549.504 ms<br />
|-<br />
| BDCD<br />
| CC(C)OCC(=O)N1CCc2cccc(O)c2C1<br />
| 1049.261 ms<br />
|-<br />
| CAGE<br />
| NNC(=O)c1nnn(Cc2ccccc2)c1C(=O)NN<br />
| 971.508 ms<br />
|-<br />
| DCEB<br />
| O=C1c2nc[nH]c2C(=O)C(SCCO)=C1SCCO<br />
| 882.626 ms<br />
|-<br />
| ECAD<br />
| C[C@@H](CCNC(=O)C(C)(C)F)NC(=O)[C@H]1CCc2nncn2CC1<br />
| 1001.437 ms<br />
|-<br />
| KDED<br />
| COc1ccc(C=c2sc(=C(C#N)C(=O)N3CCCC3)n(CCN3CCOCC3)c2=O)cc1OCc1ccccc1<br />
| 1043.976 ms<br />
|}</div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10922How to do indexing, partition, and migration in Postgres 102018-08-03T22:14:48Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy current data and configure files to new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Trouble shooting, you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
There is a ******.lock file block access to the PostgreSQL service. Change the ownership or access roles of this file to solve this problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partition speedup search, because only need to focus on a subset of data. We can modify a partition (child) table without effecting the whole table. <br />
An automate script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create partition table (parent). Partition on trache_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # copy to the partition (parent) table. Data will be automatically load to corresponding child table. Don't copy data to partition (child) table.<br />
</pre><br />
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speedup searching. There are several types of index in Postgres. In this work, I am using the default BTREE (Binary Tree) index. In the future, we may leverage RDKit for indexing.<br />
An automate script to generate index for all partition tables is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # indexname is like 'idx_abed', tablename is like 'smile_partition_kkie'.<br />
</pre><br />
<br />
Check Validity of index<br />
1. Check the number of index is the same as the number of partition tables (child, no index for the parent table).<br />
2. Check the details of partition table. e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, following message will be shown.<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
<br />
3. Check by doing a query search.<br />
<br />
<pre><br />
select * from smile_partition_aaaa where smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
You should see "Index Scan" indicating that Postgres is using indexing in searching. If you don't see this, check the validity of index.<br />
You can drop and recreate index on individual partition table (child). <br />
Another pro of partition, if something wrong with one partition table (child), just modify this one.<br />
<br />
</pre><br />
<br />
== '''4. Search the database''' ==<br />
<br />
Directly perform search on the partition table (parent). Parallel query will be automatically used in searching.<br />
<pre><br />
explain select * from smile_partition where smiles='COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC';<br />
<br />
Output:<br />
<br />
QUERY PLAN <br />
------------------------------------------------------------------------------------------------<br />
Append (cost=0.28..25863.85 rows=3600 width=250)<br />
-> Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaab on smile_partition_aaab (cost=0.28..8.30 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaad on smile_partition_aaad (cost=0.29..8.30 rows=1 width=125)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaae on smile_partition_aaae (cost=0.42..8.44 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
...<br />
...<br />
<br />
</pre><br />
<br />
Performance of searching with partition, indexing, and parallel query.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Tranche_name<br />
! Query<br />
! Running time<br />
|-<br />
| ADBA<br />
| Cc1cc(C(=O)O)cc(C)c1[N+](=O)[O-]<br />
| 1549.504 ms<br />
|-<br />
| BDCD<br />
| CC(C)OCC(=O)N1CCc2cccc(O)c2C1<br />
| 1049.261 ms<br />
|-<br />
| BDCD<br />
| CC(C)OCC(=O)N1CCc2cccc(O)c2C1<br />
| 1049.261 ms<br />
|-<br />
| BDCD<br />
| CC(C)OCC(=O)N1CCc2cccc(O)c2C1<br />
| 1049.261 ms<br />
|-<br />
| BDCD<br />
| CC(C)OCC(=O)N1CCc2cccc(O)c2C1<br />
| 1049.261 ms<br />
|-<br />
| BDCD<br />
| CC(C)OCC(=O)N1CCc2cccc(O)c2C1<br />
| 1049.261 ms<br />
|}</div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10921How to do indexing, partition, and migration in Postgres 102018-08-03T22:14:08Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy current data and configure files to new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Trouble shooting, you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
There is a ******.lock file block access to the PostgreSQL service. Change the ownership or access roles of this file to solve this problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partition speedup search, because only need to focus on a subset of data. We can modify a partition (child) table without effecting the whole table. <br />
An automate script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create partition table (parent). Partition on trache_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # copy to the partition (parent) table. Data will be automatically load to corresponding child table. Don't copy data to partition (child) table.<br />
</pre><br />
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speedup searching. There are several types of index in Postgres. In this work, I am using the default BTREE (Binary Tree) index. In the future, we may leverage RDKit for indexing.<br />
An automate script to generate index for all partition tables is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # indexname is like 'idx_abed', tablename is like 'smile_partition_kkie'.<br />
</pre><br />
<br />
Check Validity of index<br />
1. Check the number of index is the same as the number of partition tables (child, no index for the parent table).<br />
2. Check the details of partition table. e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, following message will be shown.<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
<br />
3. Check by doing a query search.<br />
<br />
<pre><br />
select * from smile_partition_aaaa where smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
You should see "Index Scan" indicating that Postgres is using indexing in searching. If you don't see this, check the validity of index.<br />
You can drop and recreate index on individual partition table (child). <br />
Another pro of partition, if something wrong with one partition table (child), just modify this one.<br />
<br />
</pre><br />
<br />
== '''4. Search the database''' ==<br />
<br />
Search the partitioned (parent) table directly. Postgres plans the query across all child partitions and decides automatically whether to use parallel query.<br />
<pre><br />
explain select * from smile_partition where smiles='COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC';<br />
<br />
Output:<br />
<br />
QUERY PLAN <br />
------------------------------------------------------------------------------------------------<br />
Append (cost=0.28..25863.85 rows=3600 width=250)<br />
-> Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaab on smile_partition_aaab (cost=0.28..8.30 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaad on smile_partition_aaad (cost=0.29..8.30 rows=1 width=125)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaae on smile_partition_aaae (cost=0.42..8.44 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
...<br />
...<br />
<br />
</pre><br />
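<br />
The plan above visits the index of every child table. A hedged sketch of a helper that builds the search statement follows; search_sql is a hypothetical name, and the optional tranche argument is an assumption about how a query could be narrowed. Adding a predicate on the partition key (tranche_name) should let the planner skip partitions that cannot match. EXPLAIN ANALYZE is used here as one way to obtain a running time.<br />

```shell
#!/bin/sh
# Print an EXPLAIN ANALYZE statement for an exact SMILES lookup.
# search_sql is a hypothetical helper. The tranche name is optional;
# when given, the extra predicate on the partition key lets the
# planner exclude the other partitions.
search_sql() {
    # Double any single quote so the value is a valid SQL string literal.
    smiles=$(printf '%s' "$1" | sed "s/'/''/g")
    tranche=${2:-}
    sql="EXPLAIN ANALYZE SELECT * FROM smile_partition WHERE smiles = '$smiles'"
    if [ -n "$tranche" ]; then
        sql="$sql AND tranche_name = '$tranche'"
    fi
    printf '%s;\n' "$sql"
}

# Example: search_sql 'Cc1cc(C(=O)O)cc(C)c1[N+](=O)[O-]' ADBA | psql
```

The sketch assumes the default standard_conforming_strings = on, so a backslash inside a SMILES string is taken literally.<br />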
<br />
Performance of searching with partition, indexing, and parallel query.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Tranche_name<br />
! Query<br />
! Running time<br />
|-<br />
| ADBA<br />
| Cc1cc(C(=O)O)cc(C)c1[N+](=O)[O-]<br />
| 1549.504 ms<br />
|-<br />
| BDCD<br />
| CC(C)OCC(=O)N1CCc2cccc(O)c2C1<br />
| 1049.261 ms<br />
|}</div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10920How to do indexing, partition, and migration in Postgres 102018-08-03T22:13:25Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
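<br />
One caveat with a plain cp -R run as root is that the copies end up owned by root, which the postgres user may be unable to read. A minimal sketch of an alternative follows, assuming a cp that supports -a (archive mode, which preserves ownership, permissions, and timestamps); the helper name copy_data_dir is hypothetical.<br />

```shell
#!/bin/sh
# Copy the contents of one data directory into another, preserving
# ownership (when run as root), permissions and timestamps.
# copy_data_dir is a hypothetical helper, not part of Postgres.
copy_data_dir() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    # "src/." copies the directory contents, including hidden files.
    cp -a "$src"/. "$dst"/
}

# Example with the paths from this tutorial (run while the service is stopped):
# copy_data_dir /var/lib/pgsql/10/data /ssd/disk1/psql_10_data
# chown -R postgres /ssd/disk1/psql_10_data
```

The final chown -R is only a safety net in case some files were not created by the postgres user.<br />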
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches because a query only needs to scan a subset of the data. We can also modify a partition (child) table without affecting the whole table.<br />
An automated script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
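<br />
The statement above has to be issued once per tranche. The sketch below shows how $tablename could be derived from $prefix; it is not the content of create_partition_all.sh, and partition_sql is a hypothetical helper. It assumes that table names are the lower-cased tranche name appended to smile_partition_.<br />

```shell
#!/bin/sh
# Print the CREATE TABLE statement for one child partition.
# partition_sql is a hypothetical helper; the naming scheme
# (smile_partition_ + lower-cased tranche name) is an assumption.
partition_sql() {
    prefix=$1
    suffix=$(printf '%s' "$prefix" | tr '[:upper:]' '[:lower:]')
    printf "CREATE TABLE IF NOT EXISTS smile_partition_%s PARTITION OF smile_partition FOR VALUES IN ('%s');\n" "$suffix" "$prefix"
}

# Example: emit statements for a few tranches and pipe them to psql.
# for p in AAAA AAAB AAAD; do partition_sql "$p"; done | psql
```

Printing the SQL instead of executing it directly keeps the step easy to inspect before it touches the database.<br />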
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partitioned (parent) table; rows are routed automatically to the matching child table. Don't copy data directly into a partition (child) table.<br />
</pre><br />
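<br />
In the statement above, $i stands for one input file. A minimal sketch of a loop that emits one COPY statement per file follows; copy_sql_for_dir is a hypothetical helper, and the .txt extension is an assumption about how the input files are named. The sketch does not handle file names that contain a single quote.<br />

```shell
#!/bin/sh
# Print one COPY statement per tab-separated file in a directory.
# copy_sql_for_dir and the *.txt extension are assumptions.
copy_sql_for_dir() {
    dir=$1
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue   # skip when the glob matches nothing
        printf '%s\n' "COPY smile_partition FROM '$f' WITH CSV HEADER DELIMITER AS E'\\t';"
    done
}

# Example: copy_sql_for_dir /path/to/tranche/files | psql
```

COPY ... FROM 'file' reads the file on the database server, so the paths must be readable by the postgres user.<br />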
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speed up searching. There are several types of index in Postgres. In this work, I am using the default B-tree index (a balanced multiway tree, not a binary tree). In the future, we may leverage RDKit for indexing.<br />
An automated script to generate indexes for all partition tables is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # $indexname is like 'index_kkie'; $tablename is like 'smile_partition_kkie'.<br />
</pre><br />
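<br />
As with the partitions, the index statement is repeated for every child table. The following sketch is not the content of create_index_all.sh; index_sql is a hypothetical helper, and the index_ naming scheme is an assumption based on the index names that appear in the query plans on this page.<br />

```shell
#!/bin/sh
# Print the CREATE INDEX statement for one child partition.
# index_sql is a hypothetical helper; it derives the index name
# from the table name (smile_partition_aaaa -> index_aaaa).
index_sql() {
    table=$1
    suffix=${table#smile_partition_}
    printf 'CREATE INDEX IF NOT EXISTS index_%s ON %s(smiles);\n' "$suffix" "$table"
}

# Example: list the child tables from the catalog and index each one.
# psql -At -c "SELECT inhrelid::regclass FROM pg_inherits WHERE inhparent = 'smile_partition'::regclass" |
#     while read -r t; do index_sql "$t"; done | psql
```

Taking the table list from pg_inherits avoids maintaining a separate list of tranche names for the indexing step.<br />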
<br />
Check the validity of the indexes:<br />
1. Check that the number of indexes equals the number of partition (child) tables; the parent table has no index.<br />
2. Check the details of a partition table, e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, it is flagged as INVALID:<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
<br />
3. Check by inspecting the query plan of a search.<br />
<br />
<pre><br />
explain select * from smile_partition_aaaa where smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
You should see "Index Scan", indicating that Postgres uses the index for the search. If you don't see it, check the validity of the index.<br />
You can drop and recreate the index on an individual partition (child) table.<br />
This is another advantage of partitioning: if something is wrong with one partition (child) table, only that table needs to be modified.<br />
<br />
</pre><br />
<br />
== '''4. Search the database''' ==<br />
<br />
Search the partitioned (parent) table directly. Postgres plans the query across all child partitions and decides automatically whether to use parallel query.<br />
<pre><br />
explain select * from smile_partition where smiles='COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC';<br />
<br />
Output:<br />
<br />
QUERY PLAN <br />
------------------------------------------------------------------------------------------------<br />
Append (cost=0.28..25863.85 rows=3600 width=250)<br />
-> Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaab on smile_partition_aaab (cost=0.28..8.30 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaad on smile_partition_aaad (cost=0.29..8.30 rows=1 width=125)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaae on smile_partition_aaae (cost=0.42..8.44 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
...<br />
...<br />
<br />
</pre><br />
<br />
Performance of searching with partition, indexing, and parallel query.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Tranche_name<br />
! Query<br />
! Running time<br />
|-<br />
| ADBA<br />
| Cc1cc(C(=O)O)cc(C)c1[N+](=O)[O-]<br />
| 1549.504 ms<br />
|-<br />
| BDCD<br />
| CC(C)OCC(=O)N1CCc2cccc(O)c2C1<br />
| 1049.261 ms<br />
|}</div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10919How to do indexing, partition, and migration in Postgres 102018-08-03T22:12:29Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches because a query only needs to scan a subset of the data. We can also modify a partition (child) table without affecting the whole table.<br />
An automated script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partitioned (parent) table; rows are routed automatically to the matching child table. Don't copy data directly into a partition (child) table.<br />
</pre><br />
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speed up searching. There are several types of index in Postgres. In this work, I am using the default B-tree index (a balanced multiway tree, not a binary tree). In the future, we may leverage RDKit for indexing.<br />
An automated script to generate indexes for all partition tables is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # $indexname is like 'index_kkie'; $tablename is like 'smile_partition_kkie'.<br />
</pre><br />
<br />
Check the validity of the indexes:<br />
1. Check that the number of indexes equals the number of partition (child) tables; the parent table has no index.<br />
2. Check the details of a partition table, e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, it is flagged as INVALID:<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
<br />
3. Check by inspecting the query plan of a search.<br />
<br />
<pre><br />
explain select * from smile_partition_aaaa where smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
You should see "Index Scan", indicating that Postgres uses the index for the search. If you don't see it, check the validity of the index.<br />
You can drop and recreate the index on an individual partition (child) table.<br />
This is another advantage of partitioning: if something is wrong with one partition (child) table, only that table needs to be modified.<br />
<br />
</pre><br />
<br />
== '''4. Search the database''' ==<br />
<br />
Search the partitioned (parent) table directly. Postgres plans the query across all child partitions and decides automatically whether to use parallel query.<br />
<pre><br />
explain select * from smile_partition where smiles='COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC';<br />
<br />
Output:<br />
<br />
QUERY PLAN <br />
------------------------------------------------------------------------------------------------<br />
Append (cost=0.28..25863.85 rows=3600 width=250)<br />
-> Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaab on smile_partition_aaab (cost=0.28..8.30 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaad on smile_partition_aaad (cost=0.29..8.30 rows=1 width=125)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaae on smile_partition_aaae (cost=0.42..8.44 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
...<br />
...<br />
<br />
</pre><br />
<br />
Performance of searching with partition, indexing, and parallel query.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Tranche_name<br />
! Query<br />
! Running time<br />
|-<br />
| ADBA<br />
| Cc1cc(C(=O)O)cc(C)c1[N+](=O)[O-]<br />
| 1549.504 ms<br />
|-<br />
| BDCD<br />
| CC(C)OCC(=O)N1CCc2cccc(O)c2C1<br />
| 1049.261 ms<br />
|}</div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10918How to do indexing, partition, and migration in Postgres 102018-08-03T22:11:45Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches because a query only needs to scan a subset of the data. We can also modify a partition (child) table without affecting the whole table.<br />
An automated script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partitioned (parent) table; rows are routed automatically to the matching child table. Don't copy data directly into a partition (child) table.<br />
</pre><br />
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speed up searching. There are several types of index in Postgres. In this work, I am using the default B-tree index (a balanced multiway tree, not a binary tree). In the future, we may leverage RDKit for indexing.<br />
An automated script to generate indexes for all partition tables is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # $indexname is like 'index_kkie'; $tablename is like 'smile_partition_kkie'.<br />
</pre><br />
<br />
Check the validity of the indexes:<br />
1. Check that the number of indexes equals the number of partition (child) tables; the parent table has no index.<br />
2. Check the details of a partition table, e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, it is flagged as INVALID:<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
<br />
3. Check by inspecting the query plan of a search.<br />
<br />
<pre><br />
explain select * from smile_partition_aaaa where smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
You should see "Index Scan", indicating that Postgres uses the index for the search. If you don't see this, check the validity of the index.<br />
You can drop and recreate the index on an individual partition (child) table.<br />
This is another advantage of partitioning: if something is wrong with one partition (child) table, only that table needs to be fixed.<br />
<br />
</pre><br />
<br />
== '''4. Search the database''' ==<br />
<br />
Run the search directly on the partition (parent) table; Postgres scans the index of each child table. Parallel query is applied automatically whenever the planner selects a parallel plan.<br />
<pre><br />
explain select * from smile_partition where smiles='COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC';<br />
<br />
Output:<br />
<br />
QUERY PLAN <br />
------------------------------------------------------------------------------------------------<br />
Append (cost=0.28..25863.85 rows=3600 width=250)<br />
-> Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaab on smile_partition_aaab (cost=0.28..8.30 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaad on smile_partition_aaad (cost=0.29..8.30 rows=1 width=125)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaae on smile_partition_aaae (cost=0.42..8.44 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
...<br />
...<br />
<br />
</pre><br />
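<br />
To time searches like this one, psql's \timing meta-command reports the elapsed time of each statement. The sketch below is one way to set that up; timing_script is a hypothetical helper that only prints a psql script, and it assumes the SMILES strings contain no single quotes. Pipe its output into psql to run it.<br />

```shell
#!/usr/bin/env bash
# Sketch only: print a psql script that times one search per SMILES argument.
timing_script() {
    local smiles
    # \timing on makes psql print the elapsed time after each statement.
    printf '%s\n' '\timing on'
    for smiles in "$@"; do
        printf "SELECT * FROM smile_partition WHERE smiles = '%s';\n" "$smiles"
    done
}

# Example: a script for one of the molecules searched on this page.
timing_script 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'
```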
<br />
Search times with partitioning, indexing, and parallel query (columns: tranche, SMILES, query time):<br />
<br />
<pre><br />
ADBA Cc1cc(C(=O)O)cc(C)c1[N+](=O)[O-] 1549.504 ms<br />
BDCD CC(C)OCC(=O)N1CCc2cccc(O)c2C1 1049.261 ms<br />
CAGE NNC(=O)c1nnn(Cc2ccccc2)c1C(=O)NN 971.508 ms<br />
DCEB O=C1c2nc[nH]c2C(=O)C(SCCO)=C1SCCO 882.626 ms<br />
ECAD C[C@@H](CCNC(=O)C(C)(C)F)NC(=O)[C@H]1CCc2nncn2CC1 1001.437 ms<br />
KDED COc1ccc(C=c2sc(=C(C#N)C(=O)N3CCCC3)n(CCN3CCOCC3)c2=O)cc1OCc1ccccc1 1043.976 ms<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10917How to do indexing, partition, and migration in Postgres 102018-08-03T22:11:08Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
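<br />
Postgres refuses to start if the data directory has the wrong owner or is accessible to group or world, so it can help to verify the result of step (3) before going on. A minimal sketch, assuming GNU stat (as on most Linux systems); the function name check_data_dir is hypothetical and not part of Postgres.<br />

```shell
#!/usr/bin/env bash
# Sketch only: check that a directory has the expected owner and mode 700.
check_data_dir() {
    local dir="$1" expected_owner="$2"
    local owner mode
    owner="$(stat -c '%U' "$dir")" || return 1   # owning user name
    mode="$(stat -c '%a' "$dir")" || return 1    # octal permission bits
    if [ "$owner" = "$expected_owner" ] && [ "$mode" = "700" ]; then
        echo "ok: $dir"
    else
        echo "fix needed: $dir is owned by $owner with mode $mode"
        return 1
    fi
}

# Usage on the paths from this tutorial:
#   check_data_dir /ssd/disk1/psql_10_data postgres
```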
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned on one or more columns. Partitioning can speed up searches, because a query that filters on the partition key only needs to scan a subset of the data. A partition (child) table can also be modified without affecting the whole table.<br />
An automated script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partition (parent) table. Partitioning on tranche_name is a natural way to split the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
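<br />
With about 3600 tranches, the child tables are best created in a loop. The sketch below shows one possible shape for that loop; it is not the actual create_partition_all.sh, and the function name partition_sql is hypothetical. It prints the statements so they can be reviewed and then piped into psql.<br />

```shell
#!/usr/bin/env bash
# Sketch only: print one CREATE TABLE ... PARTITION OF statement per tranche prefix.
partition_sql() {
    local prefix lower
    for prefix in "$@"; do
        # The table name uses the lower-case prefix; the list value keeps the original case.
        lower="$(printf '%s' "$prefix" | tr '[:upper:]' '[:lower:]')"
        printf "CREATE TABLE IF NOT EXISTS smile_partition_%s PARTITION OF smile_partition FOR VALUES IN ('%s');\n" "$lower" "$prefix"
    done
}

# Example: print the statements for two tranches.
partition_sql AAAA EEBD
```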
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partition (parent) table; rows are routed automatically to the matching child table. Do not copy data directly into a partition (child) table.<br />
</pre><br />
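<br />
The '$i' in the statement above suggests a loop over input files. A minimal sketch of such a loop follows; copy_sql is a hypothetical helper and the file paths in the example are placeholders. Because COPY ... FROM 'file' reads the file on the database server, the paths should be absolute and readable by the postgres user.<br />

```shell
#!/usr/bin/env bash
# Sketch only: print one COPY statement per tab-separated input file.
copy_sql() {
    local file
    for file in "$@"; do
        # Inside double quotes, \\t becomes the two characters \t that COPY expects in E'\t'.
        printf '%s\n' "COPY smile_partition FROM '$file' WITH CSV HEADER DELIMITER AS E'\\t';"
    done
}

# Example with placeholder paths.
copy_sql /data/AAAA.txt /data/AAAB.txt
```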
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speed up searching. Postgres offers several index types; this work uses the default B-tree index. In the future, we may leverage RDKit for indexing.<br />
An automated script that creates the index on every partition (child) table is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # $indexname is like 'index_aaaa', $tablename is like 'smile_partition_aaaa'.<br />
</pre><br />
<br />
Check the validity of the indexes:<br />
1. Check that the number of indexes equals the number of partition (child) tables; the parent table has no index.<br />
2. Check the details of a partition table, e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, it is marked INVALID:<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
<br />
3. Check by running a query with explain.<br />
<br />
<pre><br />
explain select * from smile_partition_aaaa where smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
You should see "Index Scan", indicating that Postgres uses the index for the search. If you don't see this, check the validity of the index.<br />
You can drop and recreate the index on an individual partition (child) table.<br />
This is another advantage of partitioning: if something is wrong with one partition (child) table, only that table needs to be fixed.<br />
<br />
</pre><br />
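<br />
The index count and validity checks can also be done with catalog queries instead of inspecting each table by hand. The sketch below assumes the parent table is named smile_partition, as in this tutorial; validity_sql is a hypothetical helper that only prints the SQL, which can then be piped into psql. It relies on the pg_inherits and pg_index system catalogs.<br />

```shell
#!/usr/bin/env bash
# Sketch only: print catalog queries that compare partition and index counts
# and list any invalid indexes for the given parent table.
validity_sql() {
    local parent="$1"
    cat <<SQL
-- Number of child partitions attached to the parent table.
SELECT count(*) FROM pg_inherits WHERE inhparent = '${parent}'::regclass;
-- Number of indexes on those child partitions.
SELECT count(*) FROM pg_index i JOIN pg_inherits h ON h.inhrelid = i.indrelid WHERE h.inhparent = '${parent}'::regclass;
-- Indexes on those child partitions that are marked invalid (ideally no rows).
SELECT i.indexrelid::regclass FROM pg_index i JOIN pg_inherits h ON h.inhrelid = i.indrelid WHERE h.inhparent = '${parent}'::regclass AND NOT i.indisvalid;
SQL
}

# Example: print the queries for the parent table used in this tutorial.
validity_sql smile_partition
```

The first two counts should match when every child table has exactly one index.<br />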
<br />
== '''4. Search the database''' ==<br />
<br />
Run the search directly on the partition (parent) table; Postgres scans the index of each child table. Parallel query is applied automatically whenever the planner selects a parallel plan.<br />
<pre><br />
explain select * from smile_partition where smiles='COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC';<br />
<br />
Output:<br />
<br />
QUERY PLAN <br />
------------------------------------------------------------------------------------------------<br />
Append (cost=0.28..25863.85 rows=3600 width=250)<br />
-> Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaab on smile_partition_aaab (cost=0.28..8.30 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaad on smile_partition_aaad (cost=0.29..8.30 rows=1 width=125)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaae on smile_partition_aaae (cost=0.42..8.44 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
...<br />
...<br />
<br />
</pre><br />
<br />
Search times with partitioning, indexing, and parallel query (columns: tranche, SMILES, query time):<br />
<br />
<pre><br />
ADBA Cc1cc(C(=O)O)cc(C)c1[N+](=O)[O-] 1549.504 ms<br />
BDCD CC(C)OCC(=O)N1CCc2cccc(O)c2C1 1049.261 ms<br />
CAGE NNC(=O)c1nnn(Cc2ccccc2)c1C(=O)NN 971.508 ms<br />
DCEB O=C1c2nc[nH]c2C(=O)C(SCCO)=C1SCCO 882.626 ms<br />
ECAD C[C@@H](CCNC(=O)C(C)(C)F)NC(=O)[C@H]1CCc2nncn2CC1 1001.437 ms<br />
KDED COc1ccc(C=c2sc(=C(C#N)C(=O)N3CCCC3)n(CCN3CCOCC3)c2=O)cc1OCc1ccccc1 1043.976 ms<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10916How to do indexing, partition, and migration in Postgres 102018-08-03T22:10:11Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned on one or more columns. Partitioning can speed up searches, because a query that filters on the partition key only needs to scan a subset of the data. A partition (child) table can also be modified without affecting the whole table.<br />
An automated script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partition (parent) table. Partitioning on tranche_name is a natural way to split the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partition (parent) table; rows are routed automatically to the matching child table. Do not copy data directly into a partition (child) table.<br />
</pre><br />
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speed up searching. Postgres offers several index types; this work uses the default B-tree index. In the future, we may leverage RDKit for indexing.<br />
An automated script that creates the index on every partition (child) table is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # $indexname is like 'index_aaaa', $tablename is like 'smile_partition_aaaa'.<br />
</pre><br />
<br />
Check the validity of the indexes:<br />
1. Check that the number of indexes equals the number of partition (child) tables; the parent table has no index.<br />
2. Check the details of a partition table, e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, it is marked INVALID:<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
<br />
3. Check by running a query with explain.<br />
<br />
<pre><br />
explain select * from smile_partition_aaaa where smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
You should see "Index Scan", indicating that Postgres uses the index for the search. If you don't see this, check the validity of the index.<br />
You can drop and recreate the index on an individual partition (child) table.<br />
This is another advantage of partitioning: if something is wrong with one partition (child) table, only that table needs to be fixed.<br />
<br />
</pre><br />
<br />
== '''4. Search the database''' ==<br />
<br />
Run the search directly on the partition (parent) table; Postgres scans the index of each child table. Parallel query is applied automatically whenever the planner selects a parallel plan.<br />
<pre><br />
explain select * from smile_partition where smiles='COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC';<br />
<br />
Output:<br />
<br />
QUERY PLAN <br />
------------------------------------------------------------------------------------------------<br />
Append (cost=0.28..25863.85 rows=3600 width=250)<br />
-> Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaab on smile_partition_aaab (cost=0.28..8.30 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaad on smile_partition_aaad (cost=0.29..8.30 rows=1 width=125)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
-> Index Scan using index_aaae on smile_partition_aaae (cost=0.42..8.44 rows=1 width=106)<br />
Index Cond: ((smiles)::text = 'COc1cc(C(=O)N2CCOCC2)cc(OS(=O)(=O)O)c1OC'::text)<br />
...<br />
...<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10915How to do indexing, partition, and migration in Postgres 102018-08-03T22:06:37Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned on one or more columns. Partitioning can speed up searches, because a query that filters on the partition key only needs to scan a subset of the data. A partition (child) table can also be modified without affecting the whole table.<br />
An automated script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partition (parent) table. Partitioning on tranche_name is a natural way to split the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partition (parent) table; rows are routed automatically to the matching child table. Do not copy data directly into a partition (child) table.<br />
</pre><br />
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speed up searching. Postgres offers several index types; this work uses the default B-tree index. In the future, we may leverage RDKit for indexing.<br />
An automated script that creates the index on every partition (child) table is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # $indexname is like 'index_aaaa', $tablename is like 'smile_partition_aaaa'.<br />
</pre><br />
<br />
Check the validity of the indexes:<br />
1. Check that the number of indexes equals the number of partition (child) tables; the parent table has no index.<br />
2. Check the details of a partition table, e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, it is marked INVALID as follows:<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
<br />
3. Check by doing a query search.<br />
<br />
<pre><br />
EXPLAIN select * from smile_partition_aaaa where smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
You should see "Index Scan", which indicates that Postgres uses the index for this search. If you do not see it, check the validity of the index.<br />
You can drop and recreate the index on an individual partition (child) table.<br />
This is another advantage of partitioning: if something is wrong with one partition (child) table, only that table needs to be modified.<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10914How to do indexing, partition, and migration in Postgres 102018-08-03T22:06:21Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
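<br />
A plain <code>cp -R</code> run as root leaves the copied files owned by root, while Postgres expects its own user to own the data directory. A minimal sketch of an ownership-preserving copy follows; the helper name <code>copy_pgdata</code> is hypothetical, and it assumes the service is stopped and the command is run as root or as the owning user.<br />

```shell
# Sketch only: copy_pgdata is a hypothetical helper, not part of Postgres.
# Assumption: the Postgres service is stopped before copying.
copy_pgdata() {
    local src="$1" dst="$2"
    # -a preserves ownership (when run as root), modes, and timestamps.
    # "src/." copies the directory contents, including dotfiles, into dst.
    cp -a "$src"/. "$dst"/
}

# Example with the paths used in this tutorial:
#   copy_pgdata /var/lib/pgsql/10/data /ssd/disk1/psql_10_data
```

If the files were already copied with <code>cp -R</code>, running <code>chown -R postgres</code> on the new folder should give the same ownership result.<br />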
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
There is a ******.lock file blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches, because a query only needs to scan a subset of the data. We can also modify a partition (child) table without affecting the whole table.<br />
An automated script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
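<br />
The contents of create_partition_all.sh are not shown here, so the following is only a sketch of how the statement above could be generated for each tranche. The function name <code>partition_sql</code> is hypothetical; the table naming (lower-case suffix) follows the examples in the comment above.<br />

```shell
# Sketch only: partition_sql is a hypothetical helper.
# It prints the CREATE TABLE statement for one tranche prefix, e.g. AAAA.
partition_sql() {
    local prefix="$1"
    local suffix
    # Child table names use the lower-case form of the tranche prefix.
    suffix=$(printf '%s' "$prefix" | tr '[:upper:]' '[:lower:]')
    printf "CREATE TABLE IF NOT EXISTS smile_partition_%s PARTITION OF smile_partition FOR VALUES IN ('%s');\n" "$suffix" "$prefix"
}

# Example: generate statements for a few tranches and pipe them to psql.
#   for p in AAAA AAAB AAAD; do partition_sql "$p"; done | psql
```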
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partitioned (parent) table. Rows are routed automatically to the corresponding child table. Don't copy data directly into a partition (child) table.<br />
</pre><br />
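<br />
The variable $i above suggests a loop over input files. A minimal sketch of such a loop is given below; the function name <code>copy_sql</code> and the example file path are hypothetical, and the files are assumed to be tab-delimited with a header row and readable by the Postgres server.<br />

```shell
# Sketch only: copy_sql is a hypothetical helper.
# It prints the COPY statement for one tab-delimited input file.
copy_sql() {
    local file="$1"
    local q="'"
    # Double any single quote so the path stays a valid SQL string literal.
    file=${file//$q/$q$q}
    # The delimiter is passed as a %s argument so printf emits a literal \t.
    printf "copy smile_partition from '%s' with CSV HEADER Delimiter AS E'%s';\n" "$file" '\t'
}

# Example (hypothetical directory):
#   for i in /data/tranches/*.txt; do copy_sql "$i"; done | psql
```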
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speed up searching. Postgres offers several index types; this work uses the default B-tree index (a balanced tree, not a binary tree). In the future, we may leverage RDKit for indexing.<br />
An automated script that generates an index for every partition (child) table is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # $indexname is like 'index_kkie', $tablename is like 'smile_partition_kkie'.<br />
</pre><br />
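<br />
The contents of create_index_all.sh are not shown, so this is only a sketch of how the statement could be produced per child table. The function name <code>index_sql</code> is hypothetical, and the index name is assumed to be <code>index_</code> plus the table suffix, following the name index_aaaa in the <code>\d</code> output on this page.<br />

```shell
# Sketch only: index_sql is a hypothetical helper.
# It derives the index name from the child table name and prints the statement.
index_sql() {
    local table="$1"
    # Strip everything up to the last underscore: smile_partition_kkie -> kkie.
    local suffix="${table##*_}"
    printf 'CREATE INDEX IF NOT EXISTS index_%s ON %s(smiles);\n' "$suffix" "$table"
}

# Example:
#   for t in smile_partition_aaaa smile_partition_kkie; do index_sql "$t"; done | psql
```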
<br />
Check the validity of the indexes:<br />
1. Check that the number of indexes equals the number of partition (child) tables; the parent table has no index.<br />
2. Check the details of a partition table, e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, it is marked INVALID as follows:<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
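<br />
With thousands of child tables, checks 1 and 2 may be easier to run as catalog queries than by inspecting tables one by one. The sketch below prints three such queries against the system catalogs pg_inherits and pg_index; the function name <code>index_check_sql</code> is hypothetical, and the queries assume every child table is a direct partition of the parent.<br />

```shell
# Sketch only: index_check_sql is a hypothetical helper.
# It prints catalog queries for the given parent table name.
index_check_sql() {
    local parent="$1"
    cat <<EOF
-- Number of child partitions of the parent table.
SELECT count(*) AS partitions FROM pg_inherits WHERE inhparent = '${parent}'::regclass;
-- Number of indexes defined on those child partitions.
SELECT count(*) AS indexes FROM pg_index i JOIN pg_inherits h ON h.inhrelid = i.indrelid WHERE h.inhparent = '${parent}'::regclass;
-- Indexes that Postgres has marked invalid (ideally no rows).
SELECT i.indexrelid::regclass AS invalid_index FROM pg_index i JOIN pg_inherits h ON h.inhrelid = i.indrelid WHERE h.inhparent = '${parent}'::regclass AND NOT i.indisvalid;
EOF
}

# Example:
#   index_check_sql smile_partition | psql
```

The first two counts should match, and the third query should return no rows.<br />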
<br />
3. Check by doing a query search.<br />
<br />
<pre><br />
EXPLAIN select * from smile_partition_aaaa where smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
You should see "Index Scan", which indicates that Postgres uses the index for this search. If you do not see it, check the validity of the index.<br />
You can drop and recreate the index on an individual partition (child) table.<br />
This is another advantage of partitioning: if something is wrong with one partition (child) table, only that table needs to be modified.<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10913How to do indexing, partition, and migration in Postgres 102018-08-03T22:06:03Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
There is a ******.lock file blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches, because a query only needs to scan a subset of the data. We can also modify a partition (child) table without affecting the whole table.<br />
An automated script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partitioned (parent) table. Rows are routed automatically to the corresponding child table. Don't copy data directly into a partition (child) table.<br />
</pre><br />
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speed up searching. Postgres offers several index types; this work uses the default B-tree index (a balanced tree, not a binary tree). In the future, we may leverage RDKit for indexing.<br />
An automated script that generates an index for every partition (child) table is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # $indexname is like 'index_kkie', $tablename is like 'smile_partition_kkie'.<br />
</pre><br />
<br />
Check the validity of the indexes:<br />
1. Check that the number of indexes equals the number of partition (child) tables; the parent table has no index.<br />
2. Check the details of a partition table, e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, it is marked INVALID as follows:<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
<br />
3. Check by doing a query search.<br />
<br />
<pre><br />
EXPLAIN select * from smile_partition_aaaa where smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
You should see "Index Scan", which indicates that Postgres uses the index for this search. If you do not see it, check the validity of the index.<br />
You can drop and recreate the index on an individual partition (child) table.<br />
This is another advantage of partitioning: if something is wrong with one partition (child) table, only that table needs to be modified.<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10912How to do indexing, partition, and migration in Postgres 102018-08-03T22:05:45Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
There is a ******.lock file blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches, because a query only needs to scan a subset of the data. We can also modify a partition (child) table without affecting the whole table.<br />
An automated script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partitioned (parent) table. Rows are routed automatically to the corresponding child table. Don't copy data directly into a partition (child) table.<br />
</pre><br />
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speed up searching. Postgres offers several index types; this work uses the default B-tree index (a balanced tree, not a binary tree). In the future, we may leverage RDKit for indexing.<br />
An automated script that generates an index for every partition (child) table is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # $indexname is like 'index_kkie', $tablename is like 'smile_partition_kkie'.<br />
</pre><br />
<br />
Check the validity of the indexes:<br />
1. Check that the number of indexes equals the number of partition (child) tables; the parent table has no index.<br />
2. Check the details of a partition table, e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, it is marked INVALID as follows:<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
<br />
3. Check by doing a query search.<br />
<br />
<pre><br />
EXPLAIN select * from smile_partition_aaaa where smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
You should see "Index Scan", which indicates that Postgres uses the index for this search. If you do not see it, check the validity of the index.<br />
You can drop and recreate the index on an individual partition (child) table.<br />
This is another advantage of partitioning: if something is wrong with one partition (child) table, only that table needs to be modified.<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10911How to do indexing, partition, and migration in Postgres 102018-08-03T22:05:33Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the full error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or permissions of this file to solve the problem.<br />
<br />
</pre><br />
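<br />
One possible way to track down such ownership problems is to list every file under the new data directory that does not belong to the postgres user. This is a sketch under the assumption that the failure comes from files owned by another user (for example, after copying as root); list_foreign_files is a hypothetical helper name.<br />

```shell
#!/usr/bin/env bash
# Sketch: list files under a directory that are NOT owned by the given user.
# list_foreign_files is a hypothetical helper name.
list_foreign_files() {
    local dir="$1" user="$2"
    find "$dir" ! -user "$user" -print
}
```

For example, list_foreign_files /ssd/disk1/psql_10_data postgres prints nothing when every file already belongs to postgres. Any file it does print is a candidate for chown.<br />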
<br />
(7) Restart psql and check the data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by the value of one or more columns. Partitioning speeds up searches, because a query only needs to scan the relevant subset of the data. We can also modify a single partition (child) table without affecting the whole table.<br />
An automated script is available at /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, which is a natural way to split the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create the partition (child) tables.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # e.g. $tablename is smile_partition_aaaa and '$prefix' is 'AAAA'<br />
</pre><br />
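<br />
The template above can be expanded for each tranche with a small shell function. This is a minimal sketch, not the contents of create_partition_all.sh; partition_sql is a hypothetical name, and the lower-case table suffix is an assumption based on the table names shown in this section.<br />

```shell
#!/usr/bin/env bash
# Sketch: emit the CREATE TABLE statement for one child partition.
# partition_sql is a hypothetical helper; the naming scheme
# smile_partition_<lower-case prefix> is assumed from the listing in this section.
partition_sql() {
    local prefix="$1"
    local suffix
    suffix=$(printf '%s' "$prefix" | tr '[:upper:]' '[:lower:]')
    printf "CREATE TABLE IF NOT EXISTS smile_partition_%s PARTITION OF smile_partition FOR VALUES IN ('%s');\n" "$suffix" "$prefix"
}
```

A loop such as for p in AAAA AAAB; do partition_sql "$p"; done prints one statement per tranche, and the output could then be piped into psql.<br />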
<br />
(3) Load data into the table.<br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partitioned (parent) table; each row is routed automatically to the matching child table. Don't copy data directly into a partition (child) table.<br />
</pre><br />
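<br />
The COPY statement can be generated per input file in the same way. This is again only a sketch: copy_sql is a hypothetical helper, and it assumes the file path contains no single quotes and is readable by the Postgres server process.<br />

```shell
#!/usr/bin/env bash
# Sketch: emit the COPY statement that loads one tab-separated file
# into the parent table. copy_sql is a hypothetical helper name.
copy_sql() {
    local file="$1"
    # %s prints the argument verbatim, so the backslash in E'\t' is kept as-is.
    printf '%s\n' "COPY smile_partition FROM '$file' WITH CSV HEADER DELIMITER AS E'\\t';"
}
```

Calling copy_sql once per data file and piping the output into psql would load every file through the parent table.<br />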
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speed up searching. Postgres supports several index types. This work uses the default B-tree (balanced tree) index. In the future, we may leverage RDKit for indexing.<br />
An automated script that generates the index for every partition table is available at /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # e.g. $indexname is index_aaaa and $tablename is smile_partition_aaaa<br />
</pre><br />
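<br />
As with the partitions, the index statement can be produced per tranche. This sketch is not the contents of create_index_all.sh; index_sql is a hypothetical helper, and the index_<suffix> naming is an assumption taken from the table description shown later in this section.<br />

```shell
#!/usr/bin/env bash
# Sketch: emit the CREATE INDEX statement for one child partition.
# index_sql is a hypothetical helper; the index_<suffix> name is assumed.
index_sql() {
    local prefix="$1"
    local suffix
    suffix=$(printf '%s' "$prefix" | tr '[:upper:]' '[:lower:]')
    printf 'CREATE INDEX IF NOT EXISTS index_%s ON smile_partition_%s(smiles);\n' "$suffix" "$suffix"
}
```

Because Postgres 10 has no index on the parent table, the statement has to be run once for every child table.<br />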
<br />
Check the validity of the indexes<br />
 1. Check that the number of indexes equals the number of partition (child) tables. The parent table has no index.<br />
 2. Check the details of a partition table, e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, it is marked as follows:<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
</pre><br />
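<br />
The first two checks can also be run against the system catalogs instead of inspecting tables one by one. The sketch below only prints the SQL; index_check_sql is a hypothetical helper name, and the queries assume the parent table is called smile_partition as above.<br />

```shell
#!/usr/bin/env bash
# Sketch: print catalog queries that compare partition and index counts.
# index_check_sql is a hypothetical helper name.
index_check_sql() {
    cat <<'SQL'
-- Number of child partitions attached to the parent table.
SELECT count(*) AS partitions
FROM pg_inherits
WHERE inhparent = 'smile_partition'::regclass;

-- Number of indexes on those partitions, and how many are marked invalid.
SELECT count(*) AS indexes,
       count(*) FILTER (WHERE NOT i.indisvalid) AS invalid_indexes
FROM pg_index i
JOIN pg_inherits h ON h.inhrelid = i.indrelid
WHERE h.inhparent = 'smile_partition'::regclass;
SQL
}
```

Piping the output of index_check_sql into psql should report the same number for partitions and indexes, with invalid_indexes equal to 0.<br />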
<br />
 3. Check that a query actually uses the index.<br />
<br />
<pre><br />
 EXPLAIN SELECT * FROM smile_partition_aaaa WHERE smiles='CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1';<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------<br />
Index Scan using index_aaaa on smile_partition_aaaa (cost=0.28..8.30 rows=1 width=104)<br />
Index Cond: ((smiles)::text = 'CS(=O)(=O)N1CCC[C@@H](C(=O)N2CCC3(CC2)OCCO3)C1'::text)<br />
(2 rows)<br />
<br />
 You should see "Index Scan", which indicates that Postgres is using the index for the search. If you don't see it, check the validity of the index.<br />
You can drop and recreate the index on an individual partition (child) table.<br />
This is another advantage of partitioning: if something is wrong with one partition (child) table, you only need to fix that one.<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10910How to do indexing, partition, and migration in Postgres 102018-08-03T22:05:20Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy current data and configure files to new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Trouble shooting, you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
There is a ******.lock file block access to the PostgreSQL service. Change the ownership or access roles of this file to solve this problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partition speedup search, because only need to focus on a subset of data. We can modify a partition (child) table without effecting the whole table. <br />
An automate script is /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create partition table (parent). Partition on trache_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # copy to the partition (parent) table. Data will be automatically load to corresponding child table. Don't copy data to partition (child) table.<br />
</pre><br />
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre><br />
<br />
<br />
== '''3. Create Index on Partition Tables''' ==<br />
<br />
Indexing can greatly speedup searching. There are several types of index in Postgres. In this work, I am using the default BTREE (Binary Tree) index. In the future, we may leverage RDKit for indexing.<br />
An automate script to generate index for all partition tables is /var/lib/pgsql/script/create_index_all.sh<br />
<br />
<pre><br />
CREATE INDEX IF NOT EXISTS $indexname ON $tablename(smiles); # indexname is like 'idx_abed', tablename is like 'smile_partition_kkie'.<br />
</pre><br />
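<br />
Because Postgres 10 does not create indexes on the partitioned parent, the statement above is repeated for each child table. A minimal sketch of such a loop follows; it is not the contents of create_index_all.sh, and the function name make_index_sql and the index_<prefix> naming (chosen to match the \d output on this page) are assumptions.<br />

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one CREATE INDEX statement per tranche prefix.
# Assumptions: child tables are named smile_partition_<lowercase prefix>
# and indexes are named index_<lowercase prefix>.
make_index_sql() {
    local prefix lower
    for prefix in "$@"; do
        lower="$(printf '%s' "$prefix" | tr '[:upper:]' '[:lower:]')"
        printf 'CREATE INDEX IF NOT EXISTS index_%s ON smile_partition_%s(smiles);\n' \
            "$lower" "$lower"
    done
}
```

Keeping the generator separate from psql makes it easy to inspect the statements before running them.<br />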
<br />
Check the validity of the indexes:<br />
1. Check that the number of indexes equals the number of partition (child) tables; the parent table has no index.<br />
2. Check the details of a partition table, e.g.<br />
<br />
<pre><br />
\d smile_partition_aaaa<br />
Table "public.smile_partition_aaaa"<br />
Column | Type | Collation | Nullable | Default <br />
--------------+------------------------+-----------+----------+---------<br />
smiles | character varying(256) | | not null | <br />
zinc_id | character varying(128) | | not null | <br />
inchikey | character varying(128) | | | <br />
mwt | real | | not null | <br />
logp | real | | not null | <br />
reactive | real | | not null | <br />
purchase | integer | | not null | <br />
tranche_name | character varying(32) | | not null | <br />
feature | text | | | <br />
Partition of: smile_partition FOR VALUES IN ('AAAA')<br />
Indexes:<br />
"index_aaaa" btree (smiles)<br />
<br />
If the index is invalid, the following is shown instead:<br />
Indexes:<br />
"index_aaaa" btree (smiles) INVALID<br />
<br />
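-- A possible way to list every invalid index in one query, instead of inspecting each table.<br />
-- This is a sketch that assumes only the standard pg_index and pg_class system catalogs.<br />
SELECT c.relname AS index_name<br />
FROM pg_index i<br />
JOIN pg_class c ON c.oid = i.indexrelid<br />
WHERE NOT i.indisvalid; -- zero rows returned means no index is marked invalid<br />
<br />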
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10907How to do indexing, partition, and migration in Postgres 102018-08-03T21:40:49Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
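<br />
If the copy is run as root, the copied files end up owned by root, so it can be worth re-checking the ownership and mode of the new directory before starting the service (a recursive chown to postgres, or copying with cp -a, are common remedies). The sketch below only checks the mode; the function name check_data_dir_mode is hypothetical and GNU stat is assumed.<br />

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm that a data directory has mode 700.
# Assumption: GNU coreutils stat (the -c option), as found on most Linux systems.
check_data_dir_mode() {
    local dir="$1" mode
    mode="$(stat -c '%a' "$dir")" || return 2   # stat failed, e.g. missing directory
    if [ "$mode" = "700" ]; then
        echo "ok: $dir has mode 700"
    else
        echo "warning: $dir has mode $mode, expected 700"
        return 1
    fi
}
```

For example, `check_data_dir_mode /ssd/disk1/psql_10_data` could be run right after the copy.<br />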
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
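<br />
The same edit can be scripted instead of made by hand in vim. The following is a sketch under stated assumptions: the function name set_data_directory is hypothetical, GNU sed is assumed, and the new path must not contain the characters | or &.<br />

```shell
#!/usr/bin/env bash
# Hypothetical helper: set data_directory in a postgresql.conf-style file.
# Assumptions: GNU sed (-i and -E), and a path without '|' or '&'.
set_data_directory() {
    local conf="$1" newdir="$2"
    if grep -Eq "^[#[:space:]]*data_directory[[:space:]]*=" "$conf"; then
        # Replace the existing line, including a commented-out default.
        sed -i -E "s|^[#[:space:]]*data_directory[[:space:]]*=.*|data_directory = '$newdir'|" "$conf"
    else
        # No such setting yet: append one.
        printf "data_directory = '%s'\n" "$newdir" >> "$conf"
    fi
}
```

It is safer to try this on a copy of the configuration file first, since the replacement drops any trailing comment on the original line.<br />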
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables and Load Data into Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches, because a query only needs to scan the relevant subset of the data. A partition (child) table can also be modified without affecting the rest of the table. <br />
An automated script is at /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partitioned (parent) table; rows are routed automatically to the matching child table. Don't copy data directly into a partition (child) table.<br />
</pre><br />
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10906How to do indexing, partition, and migration in Postgres 102018-08-03T21:39:38Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches, because a query only needs to scan the relevant subset of the data. A partition (child) table can also be modified without affecting the rest of the table. <br />
An automated script is at /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partitioned (parent) table; rows are routed automatically to the matching child table. Don't copy data directly into a partition (child) table.<br />
</pre><br />
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10905How to do indexing, partition, and migration in Postgres 102018-08-03T21:39:03Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches, because a query only needs to scan the relevant subset of the data. A partition (child) table can also be modified without affecting the rest of the table. <br />
An automated script is at /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre><br />
<br />
(3) Load data to the table. <br />
<pre><br />
copy smile_partition from '$i' with CSV HEADER Delimiter AS E'\t'; # Copy into the partitioned (parent) table; rows are routed automatically to the matching child table. Don't copy data directly into a partition (child) table.<br />
</pre><br />
<br />
(4) Check created tables.<br />
<br />
<pre><br />
\d<br />
<br />
Output:<br />
<br />
List of relations<br />
Schema | Name | Type | Owner <br />
--------+----------------------+-------+----------<br />
public | smile_partition | table | postgres<br />
public | smile_partition_aaaa | table | postgres<br />
public | smile_partition_aaab | table | postgres<br />
public | smile_partition_aaad | table | postgres<br />
public | smile_partition_aaae | table | postgres<br />
public | smile_partition_aaaf | table | postgres<br />
public | smile_partition_aaba | table | postgres<br />
...<br />
...<br />
public | smile_partition_kkgf | table | postgres<br />
public | smile_partition_kkia | table | postgres<br />
public | smile_partition_kkib | table | postgres<br />
public | smile_partition_kkie | table | postgres<br />
public | smile_partition_kkif | table | postgres<br />
(3601 rows)<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10904How to do indexing, partition, and migration in Postgres 102018-08-03T21:34:32Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches, because a query only needs to scan the relevant subset of the data. A partition (child) table can also be modified without affecting the rest of the table. <br />
An automated script is at /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
(2) Create partition table(child). <br />
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix'); # $tablename is like smile_partition_aaaa, smile_partition_ebca; '$prefix' is like 'EEBD', 'AAAA'<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10903How to do indexing, partition, and migration in Postgres 102018-08-03T21:30:41Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches, because a query only needs to scan the relevant subset of the data. A partition (child) table can also be modified without affecting the rest of the table. <br />
An automated script is at /var/lib/pgsql/script/create_partition_all.sh<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, a natural way to partition the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
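(2) Create a partition (child) table. Here $tablename and $prefix are placeholders for the child table name and the tranche name it holds.<br />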
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix');<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10902How to do indexing, partition, and migration in Postgres 102018-08-03T21:29:37Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
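When cp -R is run as root, the copied files are owned by root rather than postgres, which may be one reason the service later refuses to start. Copying with cp -a, or running chown -R postgres:postgres on the new folder afterwards, preserves or restores the expected ownership. A minimal sketch of a check that could be run before restarting the service follows. The function name check_data_dir is hypothetical, and stat -c assumes GNU coreutils (Linux).<br />

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR has mode 700 and nothing under it
# is owned by anyone other than OWNER (a user name or numeric uid).
check_data_dir() {
    dir=$1
    owner=$2
    # GNU stat: %a prints the octal permission bits of the directory.
    mode=$(stat -c '%a' "$dir") || return 1
    if [ "$mode" != "700" ]; then
        echo "mode of $dir is $mode, expected 700" >&2
        return 1
    fi
    # List every entry not owned by $owner; an empty list means all is well.
    foreign=$(find "$dir" ! -user "$owner" -print)
    if [ -n "$foreign" ]; then
        echo "entries not owned by $owner:" >&2
        echo "$foreign" >&2
        return 1
    fi
    return 0
}
```

For the layout used on this page, the call would be check_data_dir /ssd/disk1/psql_10_data postgres. A non-zero exit status means the ownership or mode still needs fixing.<br />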
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
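Before restarting, it can help to confirm that the edit took effect and that the setting is not still commented out. A small sketch, assuming the value is written in single quotes as in the snippet above. The function name conf_data_directory is hypothetical.<br />

```shell
#!/bin/sh
# Hypothetical helper: print the data_directory value set in a postgresql.conf.
# Usage: conf_data_directory /path/to/postgresql.conf
conf_data_directory() {
    # Match only uncommented data_directory lines, extract the single-quoted
    # value, and keep the last match (a later setting overrides an earlier one).
    sed -n "s/^[[:space:]]*data_directory[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" | tail -n 1
}
```

Running conf_data_directory /var/lib/pgsql/10/data/postgresql.conf should print /ssd/disk1/psql_10_data after the edit. Empty output suggests the line is still commented out or is written in a different form.<br />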
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches, because a query only needs to scan the relevant subset of the data. We can also modify a partition (child) table without affecting the whole table.<br />
<br />
(1) Create the partitioned (parent) table. Partition on tranche_name, which is a natural way to split the dataset.<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre><br />
<br />
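(2) Create a partition (child) table. Here $tablename and $prefix are placeholders for the child table name and the tranche name it holds.<br />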
<pre><br />
CREATE TABLE IF NOT EXISTS $tablename PARTITION OF smile_partition FOR VALUES IN ('$prefix');<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10901How to do indexing, partition, and migration in Postgres 102018-08-03T21:22:19Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre><br />
<br />
<br />
== '''2. Create Partition Tables''' ==<br />
<br />
In Postgres, a table can be partitioned by certain attributes. Partitioning speeds up searches, because a query only needs to scan the relevant subset of the data. We can also modify a partition (child) table without affecting the whole table.<br />
<br />
(1) Create the partitioned (parent) table<br />
<pre><br />
CREATE TABLE IF NOT EXISTS smile_partition(<br />
smiles varchar(256) NOT NULL, <br />
zinc_id varchar(128) NOT NULL, <br />
inchikey varchar(128), <br />
mwt REAL NOT NULL, <br />
logp REAL NOT NULL, <br />
reactive REAL NOT NULL, <br />
purchase INT NOT NULL, <br />
tranche_name varchar(32) NOT NULL, <br />
feature text) PARTITION BY LIST(tranche_name);<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10900How to do indexing, partition, and migration in Postgres 102018-08-03T21:04:25Z<p>Jizhou: /* 1. Migrate data directory */</p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Change data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10899How to do indexing, partition, and migration in Postgres 102018-08-03T21:03:45Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Migrate data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre><br />
<br />
(7) Restart psql and check data directory again.<br />
<pre><br />
postgres=# SHOW data_directory;<br />
<br />
Output<br />
data_directory <br />
-------------------------<br />
/ssd/disk1/psql_10_data<br />
(1 row)<br />
<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10898How to do indexing, partition, and migration in Postgres 102018-08-03T21:01:52Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Migrate data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
<br />
Job for postgresql-10.service failed because the control process exited with error code. See "systemctl status postgresql-10.service" and "journalctl -xe" for details.<br />
<br />
journalctl -xe # check the error message<br />
A ******.lock file is blocking access to the PostgreSQL service. Change the ownership or access permissions of this file to solve the problem.<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10897How to do indexing, partition, and migration in Postgres 102018-08-03T20:56:59Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Migrate data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting: you may encounter the following error message:<br />
● postgresql-10.service - PostgreSQL 10 database server<br />
Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)<br />
Active: failed (Result: exit-code) since Tue 2018-07-24 13:04:11 PDT; 38s ago<br />
Docs: https://www.postgresql.org/docs/10/static/<br />
Process: 23952 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)<br />
Process: 23946 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)<br />
Main PID: 23952 (code=exited, status=1/FAILURE)<br />
<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Starting PostgreSQL 10 database server...<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] HINT: Is...y.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og postmaster[23952]: 2018-07-24 13:04:11.944 PDT [23952] LOG: cou...se<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service: main process exited, code=ex...URE<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Failed to start PostgreSQL 10 database server.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: Unit postgresql-10.service entered failed state.<br />
Jul 24 13:04:11 yod.cluster.ucsf.bkslab.og systemd[1]: postgresql-10.service failed.<br />
Hint: Some lines were ellipsized, use -l to show in full.<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10896How to do indexing, partition, and migration in Postgres 102018-08-03T20:56:03Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Migrate data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre><br />
<br />
(6) Restart PostgreSQL service<br />
<pre><br />
sudo systemctl start postgresql-10<br />
<br />
Troubleshooting:<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10895How to do indexing, partition, and migration in Postgres 102018-08-03T20:47:53Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Migrate data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres Processes and terminate Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create new folder in destination directory<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre><br />
<br />
(4) Copy the current data and configuration files to the new folder<br />
<pre><br />
cp -R /var/lib/pgsql/10/data/* /ssd/disk1/psql_10_data/<br />
</pre><br />
<br />
(5) Point to the new data location<br />
<pre><br />
vim /var/lib/pgsql/10/data/postgresql.conf # Edit the configuration file in old folder.<br />
<br />
...<br />
data_directory = '/ssd/disk1/psql_10_data' # use data in another directory<br />
...<br />
<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10894How to do indexing, partition, and migration in Postgres 102018-08-03T20:40:14Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Migrate data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre><br />
<br />
(2) Quit all Postgres processes and stop the Postgres service.<br />
<pre><br />
\q # quit from psql<br />
sudo systemctl stop postgresql-10 # stop postgres<br />
sudo systemctl status postgresql-10 # check its status<br />
</pre><br />
<br />
(3) Create a new folder in the destination directory.<br />
<pre><br />
cd /ssd/disk1 <br />
mkdir psql_10_data # create a new folder<br />
chown postgres psql_10_data # Postgres requires exclusive ownership and access to the data directory. <br />
chmod 700 psql_10_data # Change read, write, and execute authority of this folder. No group or world access to this folder. Required by Postgres.<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10893How to do indexing, partition, and migration in Postgres 102018-08-03T20:22:48Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== '''1. Migrate data directory''' ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10892How to do indexing, partition, and migration in Postgres 102018-08-03T20:22:24Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
== 1. Migrate data directory ==<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10891How to do indexing, partition, and migration in Postgres 102018-08-03T20:21:24Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
''1. Migrate data directory''<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output:<br />
<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10890How to do indexing, partition, and migration in Postgres 102018-08-03T20:21:04Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
''1. Migrate data directory''<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre><br />
<br />
<pre><br />
Output<br />
data_directory <br />
-------------------------<br />
/var/lib/pgsql/10/data<br />
(1 row)<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_indexing,_partition,_and_migration_in_Postgres_10&diff=10889How to do indexing, partition, and migration in Postgres 102018-08-03T20:19:03Z<p>Jizhou: Created page with "This tutorial shows how to do data migration, partition, and indexing in Postgres 10. ''1. Migrate data directory'' The dataset is quite large, and it will take up about 220..."</p>
<hr />
<div>This tutorial shows how to do data migration, partition, and indexing in Postgres 10.<br />
<br />
''1. Migrate data directory''<br />
<br />
The dataset is quite large, and it will take up about 220GB of hard disk once loaded and indexed into Postgres. It is better to move the database storage to another large disk instead of the default (root) one. A Solid State Disk (SSD) is preferred for faster disk access.<br />
<br />
(1) Check current Postgres data directory.<br />
<pre><br />
# log into Postgres<br />
<br />
sudo -i <br />
su - postgres<br />
psql<br />
SHOW data_directory;<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=Category:Tutorials&diff=10888Category:Tutorials2018-08-03T19:57:46Z<p>Jizhou: </p>
<hr />
<div>Our concept of "tutorials" is that they explain step by step HOW to do something, but they do not dwell on the WHY, which we handle in [[:Category:Theory |theory]] pages. We also offer [[:Category:Manual | manuals]], which attempt to explain specific programs, databases or websites, without the HOW or the WHY.<br />
<br />
At the bottom of this page are pages that are tagged as "tutorials". <br />
<br />
Here is a list of nerdy topics that have not yet had the tutorial tag added. hint hint.<br />
<br />
* [[Using local Subversion Repository (SVN)]]<br />
* [[db2multipdb.py|db2multipdb.py How to decode .db files]]<br />
* [[Travel Depth|How to run Travel Depth analysis on the lab machines]]<br />
* [[pymol_background|How to make your PyMOL background transparent]]<br />
* [[Chembl2pdb|How to link the protein targets in ChEMBL to their PDB structures]]<br />
* [[Inspecting electron density maps]]<br />
* [[How to rsync remotely to the cluster]]<br />
* [[How to install and configure R Shiny]]<br />
* [[How to install and configure JupyterHub]]<br />
* [[How to do parallel search of smi files on the cluster]]<br />
* [[How to do indexing, partition, and migration in Postgres 10]]<br />
* [http://wiki.uoft.bkslab.org/index.php/Tools_for_protein_and_ligand_analysis Oliv's favorite tools for protein and ligand analysis]<br />
<br />
[[Category:Article type]]<br />
[[Category:Organization]]</div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10873How to do parallel search of smi files on the cluster2018-07-19T18:02:40Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu.<br />
Indexing and parallel computing are used to speed up searching. The performance of qsub depends on the workload of the whole cluster. In general, searching with qsub scales well.<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that calls qsub. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for that function. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# qsub-mr: the qsub command<br />
# -l 5: the number of input lines handled by each task (here 5)<br />
# -N test: the name given to the job (it appears as test-map in the qstat output)<br />
# input.txt: the list of input files with their directories<br />
# ./search_smi.sh: the search function run by each task<br />
# -q "...": parameter for search_smi.sh, the input query for searching<br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
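<br />
To illustrate what the -l 5 setting means, the sketch below cuts a list of input lines into pieces of five with the standard split utility. This is only an analogy for how the input is divided among tasks. How qsub-mr does this internally is not described here, and the file names are made up for the demo.<br />

```shell
work_dir=$(mktemp -d)

# Build a demo input list of 12 hypothetical file names.
for i in $(seq 1 12); do
    echo "file_$i.smi"
done > "$work_dir/input.txt"

# Cut the list into pieces of at most 5 lines: chunk_aa, chunk_ab, chunk_ac.
split -l 5 "$work_dir/input.txt" "$work_dir/chunk_"

# Count the pieces and the lines left over in the last one.
num_chunks=$(ls "$work_dir"/chunk_* | wc -l | tr -d ' ')
last_chunk_lines=$(wc -l < "$work_dir/chunk_ac" | tr -d ' ')
echo "chunks=$num_chunks last=$last_chunk_lines"

rm -rf "$work_dir"
```

With 12 input lines and 5 lines per piece, the list falls into three pieces, the last holding the 2 remaining lines.<br />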
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one per line. An example is shown below. You can generate it with ls *.smi > input.txt; to list full paths rather than bare file names, use ls $PWD/*.smi > input.txt.<br />
<pre><br />
/nfs/home/jizhou/ex7/2D/CD/CDAA.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAB.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAC.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAD.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAE.smi<br />
...<br />
</pre><br />
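<br />
As an alternative to ls, the list can be built with find, which prints full paths whenever it is given an absolute starting directory. The sketch below is a minimal illustration; the directory and its files are temporary stand-ins created for the demo.<br />

```shell
data_dir=$(mktemp -d)

# Hypothetical .smi files standing in for the real data set, plus one unrelated file.
touch "$data_dir/CDAA.smi" "$data_dir/CDAB.smi" "$data_dir/notes.txt"

# find prefixes each match with the starting directory, so an absolute
# starting point yields absolute paths; sort keeps the order stable.
find "$data_dir" -maxdepth 1 -name '*.smi' | sort > "$data_dir/input.txt"

num_inputs=$(wc -l < "$data_dir/input.txt" | tr -d ' ')
first_input=$(head -n 1 "$data_dir/input.txt")
echo "$num_inputs $first_input"

rm -rf "$data_dir"
```

Only the two .smi files end up in the list; notes.txt and input.txt itself do not match the pattern.<br />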
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the searching function used by qsub. Its core is mol2img_trial, located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/", which generates an index for each smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory named outputs is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, such as how to start or stop a job, please refer to [http://web.mit.edu/longjobs/www/status.html qstat].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre><br />
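<br />
If there are many tasks, counting the running ones can be easier than reading the table. The sketch below filters on the state column with awk. The sample text is taken from the qstat example above and shortened to its first five columns, so no scheduler is needed to try it.<br />

```shell
# Header and two task lines from the qstat example, reduced to the leading columns.
qstat_sample='job-ID prior name user state
6511305 1.25000 test-map jizhou r
6511305 0.75000 test-map jizhou r'

# Column 5 is the state; "r" means the task is running.
running=$(printf '%s\n' "$qstat_sample" | awk '$5 == "r"' | wc -l | tr -d ' ')
echo "$running"
```

Against the live scheduler, the printf would be replaced by a call to qstat.<br />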
<br />
<br />
'''merge.sh'''<br />
<br />
When all jobs are completed, run merge.sh to check the outputs. Sample outputs are shown below.<br />
<pre><br />
CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6<br />
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6<br />
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6<br />
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0<br />
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0<br />
...<br />
</pre><br />
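<br />
The merged lines can be ranked by the number after the "|", which appears to be a similarity score (an assumption based on the exact match scoring 100.0). The sketch below sorts three of the sample lines above, highest value first.<br />

```shell
# Sample lines copied from the output above: SMILES, ZINC ID, and the value after "|".
results='CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0'

# Use "|" as the field separator and sort the second field numerically, largest first.
sorted=$(printf '%s\n' "$results" | sort -t'|' -k2,2nr)
best=$(printf '%s\n' "$sorted" | head -n 1)
echo "$best"
```

The same sort command can be applied to the output of merge.sh by piping it in place of the printf.<br />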
<br />
<br />
'''Clean up'''<br />
<br />
To clean up, run /nfs/soft/tools/utils/qsub-slice/qsub-mr --clean. The outputs directory and its files will be removed.<br />
<pre><br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr --clean<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10872How to do parallel search of smi files on the cluster2018-07-19T18:02:36Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu.<br />
Indexing and parallel computing are used to speed up searching. The performance of qsub depends on the workload of the whole cluster. In general, searching with qsub scales well.<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that calls qsub. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for that function. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# qsub-mr: the qsub command<br />
# -l 5: the number of input lines handled by each task (here 5)<br />
# -N test: the name given to the job (it appears as test-map in the qstat output)<br />
# input.txt: the list of input files with their directories<br />
# ./search_smi.sh: the search function run by each task<br />
# -q "...": parameter for search_smi.sh, the input query for searching<br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one per line. An example is shown below. You can generate it with ls *.smi > input.txt; to list full paths rather than bare file names, use ls $PWD/*.smi > input.txt.<br />
<pre><br />
/nfs/home/jizhou/ex7/2D/CD/CDAA.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAB.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAC.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAD.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAE.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the searching function used by qsub. Its core is mol2img_trial, located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/", which generates an index for each smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory named outputs is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, such as how to start or stop a job, please refer to [http://web.mit.edu/longjobs/www/status.html qstat].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre><br />
<br />
<br />
'''merge.sh'''<br />
<br />
When all jobs are completed, run merge.sh to check the outputs. Sample outputs are shown below.<br />
<pre><br />
CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6<br />
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6<br />
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6<br />
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0<br />
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0<br />
...<br />
</pre><br />
<br />
<br />
'''Clean up'''<br />
<br />
To clean up, run /nfs/soft/tools/utils/qsub-slice/qsub-mr --clean. The outputs directory and its files will be removed.<br />
<pre><br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr --clean<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10871How to do parallel search of smi files on the cluster2018-07-19T18:02:29Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu.<br />
Indexing and parallel computing are used to speed up searching. The performance of qsub depends on the workload of the whole cluster. In general, searching with qsub scales well.<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that calls qsub. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for that function. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# qsub-mr: the qsub command<br />
# -l 5: the number of input lines handled by each task (here 5)<br />
# -N test: the name given to the job (it appears as test-map in the qstat output)<br />
# input.txt: the list of input files with their directories<br />
# ./search_smi.sh: the search function run by each task<br />
# -q "...": parameter for search_smi.sh, the input query for searching<br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one per line. An example is shown below. You can generate it with ls *.smi > input.txt; to list full paths rather than bare file names, use ls $PWD/*.smi > input.txt.<br />
<pre><br />
/nfs/home/jizhou/ex7/2D/CD/CDAA.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAB.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAC.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAD.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAE.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the searching function used by qsub. Its core is mol2img_trial, located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/", which generates an index for each smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory named outputs is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, such as how to start or stop a job, please refer to [http://web.mit.edu/longjobs/www/status.html qstat].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre><br />
<br />
<br />
'''merge.sh'''<br />
<br />
When all jobs are completed, run merge.sh to check the outputs. Sample outputs are shown below.<br />
<pre><br />
CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6<br />
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6<br />
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6<br />
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0<br />
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0<br />
...<br />
</pre><br />
<br />
<br />
'''Clean up'''<br />
<br />
To clean up, run /nfs/soft/tools/utils/qsub-slice/qsub-mr --clean. The outputs directory and its files will be removed.<br />
<pre><br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr --clean<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10870How to do parallel search of smi files on the cluster2018-07-19T18:02:21Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu.<br />
Indexing and parallel computing are used to speed up searching. The performance of qsub depends on the workload of the whole cluster. In general, searching with qsub scales well.<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that calls qsub. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for that function. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# qsub-mr: the qsub command<br />
# -l 5: the number of input lines handled by each task (here 5)<br />
# -N test: the name given to the job (it appears as test-map in the qstat output)<br />
# input.txt: the list of input files with their directories<br />
# ./search_smi.sh: the search function run by each task<br />
# -q "...": parameter for search_smi.sh, the input query for searching<br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one per line. An example is shown below. You can generate it with ls *.smi > input.txt; to list full paths rather than bare file names, use ls $PWD/*.smi > input.txt.<br />
<pre><br />
/nfs/home/jizhou/ex7/2D/CD/CDAA.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAB.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAC.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAD.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAE.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the searching function used by qsub. Its core is mol2img_trial, located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/", which generates an index for each smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory named outputs is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, such as how to start or stop a job, please refer to [http://web.mit.edu/longjobs/www/status.html qstat].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre><br />
<br />
<br />
'''merge.sh'''<br />
<br />
When all jobs are completed, run merge.sh to check the outputs. Sample outputs are shown below.<br />
<pre><br />
CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6<br />
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6<br />
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6<br />
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0<br />
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0<br />
...<br />
</pre><br />
<br />
<br />
'''Clean up'''<br />
<br />
To clean up, run /nfs/soft/tools/utils/qsub-slice/qsub-mr --clean. The outputs directory and its files will be removed.<br />
<pre><br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr --clean<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10869How to do parallel search of smi files on the cluster2018-07-19T18:01:57Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu.<br />
Indexing and parallel computing are used to speed up searching. The performance of qsub depends on the workload of the whole cluster. In general, searching with qsub scales well.<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that calls qsub. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for that function. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# qsub-mr: the qsub command<br />
# -l 5: the number of input lines handled by each task (here 5)<br />
# -N test: the name given to the job (it appears as test-map in the qstat output)<br />
# input.txt: the list of input files with their directories<br />
# ./search_smi.sh: the search function run by each task<br />
# -q "...": parameter for search_smi.sh, the input query for searching<br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one per line. An example is shown below. You can generate it with ls *.smi > input.txt; to list full paths rather than bare file names, use ls $PWD/*.smi > input.txt.<br />
<pre><br />
/nfs/home/jizhou/ex7/2D/CD/CDAA.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAB.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAC.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAD.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAE.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the searching function used by qsub. Its core is mol2img_trial, located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/", which generates an index for each smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory named outputs is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, such as how to start or stop a job, please refer to [http://web.mit.edu/longjobs/www/status.html qstat].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre><br />
<br />
<br />
'''merge.sh'''<br />
<br />
When all jobs are completed, run merge.sh to check the outputs. Sample outputs are shown below.<br />
<pre><br />
CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6<br />
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6<br />
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6<br />
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0<br />
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0<br />
...<br />
</pre><br />
<br />
<br />
'''Clean up'''<br />
<br />
To clean up, run .../qsub-mr --clean. The outputs directory and its files will be removed.<br />
<pre><br />
.../qsub-mr --clean<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10868How to do parallel search of smi files on the cluster2018-07-19T18:00:55Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu.<br />
Indexing and parallel computing are used to speed up searching. The performance of qsub depends on the workload of the whole cluster. In general, searching with qsub scales well.<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that calls qsub. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for that function. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# qsub-mr: the qsub command<br />
# -l 5: the number of input lines handled by each task (here 5)<br />
# -N test: the name given to the job (it appears as test-map in the qstat output)<br />
# input.txt: the list of input files with their directories<br />
# ./search_smi.sh: the search function run by each task<br />
# -q "...": parameter for search_smi.sh, the input query for searching<br />
.../qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one per line. An example is shown below. You can generate it with ls *.smi > input.txt; to list full paths rather than bare file names, use ls $PWD/*.smi > input.txt.<br />
<pre><br />
.../ex7/2D/CD/CDAA.smi<br />
.../ex7/2D/CD/CDAB.smi<br />
.../ex7/2D/CD/CDAC.smi<br />
.../ex7/2D/CD/CDAD.smi<br />
.../ex7/2D/CD/CDAE.smi<br />
.../ex7/2D/CD/CDAF.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the searching function used by qsub. Its core is mol2img_trial, located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/", which generates an index for each smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory named outputs is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, such as how to start or stop a job, please refer to [http://web.mit.edu/longjobs/www/status.html qstat].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre><br />
<br />
<br />
'''merge.sh'''<br />
<br />
When all jobs are completed, run merge.sh to check the outputs. Sample outputs are shown below.<br />
<pre><br />
CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6<br />
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6<br />
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6<br />
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0<br />
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0<br />
...<br />
</pre><br />
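The number after the | in each line appears to be a similarity score, since the query itself scores 100.0. That is an inference from the sample, not something this page states. Under that assumption, and assuming no other | occurs in a line, the results can be ranked with sort. The names rank_by_score and merged.txt are hypothetical.<br />

```shell
#!/bin/bash
# Sketch: rank "SMILES ID|score" lines by score, highest first.
# Assumes the text after the "|" is numeric, as in the sample above.
# rank_by_score is a hypothetical helper name.
rank_by_score() {
    LC_ALL=C sort -t'|' -k2,2nr
}

# Example, with merged.txt as a hypothetical file holding the merged lines:
#   rank_by_score < merged.txt | head
```

LC_ALL=C keeps the decimal point interpretation independent of the locale set on the login node.<br />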
<br />
<br />
'''Clean up'''<br />
<br />
To clean up, run .../qsub-mr --clean. The outputs directory and its files will be removed.<br />
<pre><br />
.../qsub-mr --clean<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10867How to do parallel search of smi files on the cluster2018-07-19T17:57:50Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that submits the job. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for the function script. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# .../qsub-mr        The qsub command<br />
# -l 5               The number of lines to be handled by each task, here 5<br />
# -N test            The job name (qstat lists it in the name column)<br />
# input.txt          The list of input files, one path per line<br />
# ./search_smi.sh    The search script to be run<br />
# -q "..."           Parameter for search_smi.sh, the input query for searching<br />
.../qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
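The -l 5 option means that each task receives five lines of input.txt. How qsub-mr slices the file internally is not shown on this page. Purely as an illustration of the idea, the standard split utility produces the same kind of five-line slices. The names slice_list and slice_ are hypothetical.<br />

```shell
#!/bin/bash
# Illustration only: cut a file list into slices of N lines each,
# the way "-l 5" assigns 5 lines per task. This is not qsub-mr's own code.
# slice_list is a hypothetical helper name.
slice_list() {
    local list="$1" lines="$2" prefix="$3"
    split -l "$lines" "$list" "$prefix"
}

# Example: slice_list input.txt 5 slice_   # creates slice_aa, slice_ab, ...
```

With 12 input lines and 5 lines per slice, this yields three slices, the last one holding the remaining 2 lines.<br />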
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one path per line. An example of input.txt is shown below. You can use ls *.smi > input.txt to generate this file.<br />
<pre><br />
.../ex7/2D/CD/CDAA.smi<br />
.../ex7/2D/CD/CDAB.smi<br />
.../ex7/2D/CD/CDAC.smi<br />
.../ex7/2D/CD/CDAD.smi<br />
.../ex7/2D/CD/CDAE.smi<br />
.../ex7/2D/CD/CDAF.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the search script run by qsub. Its core function is mol2img_trial, which is located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/". mol2img_trial generates an index for the smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory, outputs, is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, please refer to [http://web.mit.edu/longjobs/www/status.html qstat].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre><br />
<br />
<br />
'''merge.sh'''<br />
<br />
When all jobs are completed, run merge.sh to check the outputs. Sample output is shown below.<br />
<pre><br />
CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6<br />
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6<br />
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6<br />
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0<br />
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0<br />
...<br />
</pre><br />
<br />
<br />
'''Clean up'''<br />
<br />
To clean up, run .../qsub-mr --clean. The outputs directory and its files will be removed.<br />
<pre><br />
.../qsub-mr --clean<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10866How to do parallel search of smi files on the cluster2018-07-19T17:55:08Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that submits the job. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for the function script. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# .../qsub-mr        The qsub command<br />
# -l 5               The number of lines to be handled by each task, here 5<br />
# -N test            The job name (qstat lists it in the name column)<br />
# input.txt          The list of input files, one path per line<br />
# ./search_smi.sh    The search script to be run<br />
# -q "..."           Parameter for search_smi.sh, the input query for searching<br />
.../qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one path per line. An example of input.txt is shown below. You can use ls *.smi > input.txt to generate this file.<br />
<pre><br />
.../ex7/2D/CD/CDAA.smi<br />
.../ex7/2D/CD/CDAB.smi<br />
.../ex7/2D/CD/CDAC.smi<br />
.../ex7/2D/CD/CDAD.smi<br />
.../ex7/2D/CD/CDAE.smi<br />
.../ex7/2D/CD/CDAF.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the search script run by qsub. Its core function is mol2img_trial, which is located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/". mol2img_trial generates an index for the smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory, outputs, is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, please refer to [http://web.mit.edu/longjobs/www/status.html qstat].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre><br />
<br />
<br />
'''merge.sh'''<br />
<br />
When all jobs are completed, run merge.sh to check the outputs. Sample output is shown below.<br />
<pre><br />
CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6<br />
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6<br />
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6<br />
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0<br />
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0<br />
...<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10865How to do parallel search of smi files on the cluster2018-07-19T17:54:55Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that submits the job. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for the function script. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# .../qsub-mr        The qsub command<br />
# -l 5               The number of lines to be handled by each task, here 5<br />
# -N test            The job name (qstat lists it in the name column)<br />
# input.txt          The list of input files, one path per line<br />
# ./search_smi.sh    The search script to be run<br />
# -q "..."           Parameter for search_smi.sh, the input query for searching<br />
.../qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one path per line. An example of input.txt is shown below. You can use ls *.smi > input.txt to generate this file.<br />
<pre><br />
.../ex7/2D/CD/CDAA.smi<br />
.../ex7/2D/CD/CDAB.smi<br />
.../ex7/2D/CD/CDAC.smi<br />
.../ex7/2D/CD/CDAD.smi<br />
.../ex7/2D/CD/CDAE.smi<br />
.../ex7/2D/CD/CDAF.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the search script run by qsub. Its core function is mol2img_trial, which is located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/". mol2img_trial generates an index for the smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory, outputs, is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, please refer to [http://web.mit.edu/longjobs/www/status.html qstat].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre><br />
<br />
<br />
'''merge.sh'''<br />
<br />
When all jobs are completed, run merge.sh to check the outputs. Sample output is shown below.<br />
<pre><br />
CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6<br />
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6<br />
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6<br />
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0<br />
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0<br />
...<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10864How to do parallel search of smi files on the cluster2018-07-19T17:54:38Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that submits the job. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for the function script. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# .../qsub-mr        The qsub command<br />
# -l 5               The number of lines to be handled by each task, here 5<br />
# -N test            The job name (qstat lists it in the name column)<br />
# input.txt          The list of input files, one path per line<br />
# ./search_smi.sh    The search script to be run<br />
# -q "..."           Parameter for search_smi.sh, the input query for searching<br />
.../qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one path per line. An example of input.txt is shown below. You can use ls *.smi > input.txt to generate this file.<br />
<pre><br />
.../ex7/2D/CD/CDAA.smi<br />
.../ex7/2D/CD/CDAB.smi<br />
.../ex7/2D/CD/CDAC.smi<br />
.../ex7/2D/CD/CDAD.smi<br />
.../ex7/2D/CD/CDAE.smi<br />
.../ex7/2D/CD/CDAF.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the search script run by qsub. Its core function is mol2img_trial, which is located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/". mol2img_trial generates an index for the smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory, outputs, is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, please refer to [http://web.mit.edu/longjobs/www/status.html qstat].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre><br />
<br />
<br />
'''merge.sh'''<br />
<br />
When all jobs are completed, run merge.sh to check the outputs. Sample output is shown below.<br />
<pre><br />
CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6<br />
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6<br />
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6<br />
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0<br />
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0<br />
...<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10863How to do parallel search of smi files on the cluster2018-07-19T17:46:26Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that submits the job. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for the function script. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# qsub-mr            The qsub command<br />
# -l 5               The number of lines to be handled by each task, here 5<br />
# -N test            The job name (qstat lists it in the name column)<br />
# input.txt          The list of input files, one path per line<br />
# ./search_smi.sh    The search script to be run<br />
# -q "..."           Parameter for search_smi.sh, the input query for searching<br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one path per line. An example of input.txt is shown below. You can use ls *.smi > input.txt to generate this file.<br />
<pre><br />
/nfs/home/jizhou/ex7/2D/CD/CDAA.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAB.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAC.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAD.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAE.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAF.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the search script run by qsub. Its core function is mol2img_trial, which is located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/". mol2img_trial generates an index for the smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory, outputs, is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, please refer to [http://web.mit.edu/longjobs/www/status.html qstat].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10862How to do parallel search of smi files on the cluster2018-07-19T17:45:59Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that submits the job. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for the function script. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# qsub-mr            The qsub command<br />
# -l 5               The number of lines to be handled by each task, here 5<br />
# -N test            The job name (qstat lists it in the name column)<br />
# input.txt          The list of input files, one path per line<br />
# ./search_smi.sh    The search script to be run<br />
# -q "..."           Parameter for search_smi.sh, the input query for searching<br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one path per line. An example of input.txt is shown below. You can use ls *.smi > input.txt to generate this file.<br />
<pre><br />
/nfs/home/jizhou/ex7/2D/CD/CDAA.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAB.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAC.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAD.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAE.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAF.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the search script run by qsub. Its core function is mol2img_trial, which is located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/". mol2img_trial generates an index for the smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory, outputs, is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, please refer to [http://web.mit.edu/longjobs/www/status.html qsub tutorial].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10861How to do parallel search of smi files on the cluster2018-07-19T17:45:28Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that submits the job. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for the function script. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# qsub-mr            The qsub command<br />
# -l 5               The number of lines to be handled by each task, here 5<br />
# -N test            The job name (qstat lists it in the name column)<br />
# input.txt          The list of input files, one path per line<br />
# ./search_smi.sh    The search script to be run<br />
# -q "..."           Parameter for search_smi.sh, the input query for searching<br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one path per line. An example of input.txt is shown below. You can use ls *.smi > input.txt to generate this file.<br />
<pre><br />
/nfs/home/jizhou/ex7/2D/CD/CDAA.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAB.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAC.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAD.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAE.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAF.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the search script run by qsub. Its core function is mol2img_trial, which is located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/". mol2img_trial generates an index for the smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory, outputs, is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, please refer to [http://web.mit.edu/longjobs/www/status.html].<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre></div>Jizhouhttp://wiki.docking.org/index.php?title=How_to_do_parallel_search_of_smi_files_on_the_cluster&diff=10860How to do parallel search of smi files on the cluster2018-07-19T17:44:19Z<p>Jizhou: </p>
<hr />
<div>This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel @gimel.compbio.ucsf.edu<br />
<br />
'''Create a folder with the following files and scripts'''<br />
<pre><br />
SUBMIT.sh<br />
input.txt<br />
search_smi.sh<br />
merge.sh<br />
</pre><br />
<br />
'''SUBMIT.sh'''<br />
<br />
SUBMIT.sh contains the bash code that submits the job. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for the function script. An example is shown below.<br />
<pre><br />
#!/bin/bash<br />
<br />
# qsub-mr            The qsub command<br />
# -l 5               The number of lines to be handled by each task, here 5<br />
# -N test            The job name (qstat lists it in the name column)<br />
# input.txt          The list of input files, one path per line<br />
# ./search_smi.sh    The search script to be run<br />
# -q "..."           Parameter for search_smi.sh, the input query for searching<br />
/nfs/soft/tools/utils/qsub-slice/qsub-mr \<br />
-l 5 \<br />
-N test \<br />
input.txt \<br />
./search_smi.sh \<br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''input.txt'''<br />
<br />
input.txt lists the input files, one path per line. An example of input.txt is shown below. You can use ls *.smi > input.txt to generate this file.<br />
<pre><br />
/nfs/home/jizhou/ex7/2D/CD/CDAA.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAB.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAC.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAD.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAE.smi<br />
/nfs/home/jizhou/ex7/2D/CD/CDAF.smi<br />
...<br />
</pre><br />
<br />
<br />
'''search_smi.sh'''<br />
<br />
search_smi.sh is the search script run by qsub. Its core function is mol2img_trial, which is located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/". mol2img_trial generates an index for the smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.<br />
<pre><br />
-q "CS(=O)(=O)CCNCc1ccccc1"<br />
</pre><br />
<br />
<br />
'''run SUBMIT.sh'''<br />
<br />
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory, outputs, is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job.<br />
<br />
<pre><br />
qstat # check the status of jobs, example is shown below.<br />
<br />
-bash-4.1$ qstat<br />
job-ID prior name user state submit/start at queue slots ja-task-ID <br />
-----------------------------------------------------------------------------------------------------------------<br />
6511305 1.25000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl 1 1<br />
6511305 0.75000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl 1 2<br />
6511305 0.58333 test-map jizhou r 07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks 1 3<br />
6511305 0.50000 test-map jizhou r 07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl 1 4<br />
</pre></div>Jizhou