November 3, 2020

Python is a very powerful language for processing large amounts of data. There are many times when a code change in a lower environment requires existing documents to be updated, which means running a data migration. We had previously used Node.js for data migrations, but recently I have converted almost all of my migration work to Python. Not only does it seem to manage large amounts of data in memory better, it is also a quick and easy way to migrate or export large volumes of data.
One of the biggest hiccups I've had while writing migrations is using the Couchbase client. There isn't a great deal of documentation at the moment, and much of what I have found does not work with the most recent version. Much of what worked prior to our Couchbase upgrade broke, and we needed to get everything running again. The first hurdle was getting authorization working. Here is a small code snippet of what we used to get it up and running.
from datetime import timedelta

from couchbase.cluster import Cluster, ClusterOptions, ClusterTimeoutOptions, QueryOptions
from couchbase.auth import PasswordAuthenticator  # moved out of couchbase.cluster in SDK 3.x

couchbaseConnectionString = 'couchbase://' + yourCouchbaseIPAddress

# add optional timeout
ct = ClusterTimeoutOptions(query_timeout=timedelta(seconds=300))
cluster = Cluster(couchbaseConnectionString,
                  ClusterOptions(PasswordAuthenticator('username', 'password'),
                                 timeout_options=ct))

cb = cluster.bucket('cdh')
cb.n1ql_timeout = 3600
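Once the connection is up, a migration usually boils down to reading each document, transforming it, and writing it back. The transform itself can be a plain Python function, which keeps it easy to test without a live cluster. Here is a minimal sketch; the field names (oldField, newField, schemaVersion) are hypothetical, not from our actual migrations:

```python
def migrate_doc(doc):
    """Example transform: rename a field and stamp a schema version.

    The field names here are hypothetical placeholders -- a real
    migration would apply whatever change the code update requires.
    """
    migrated = dict(doc)  # copy so the original dict is untouched
    if 'oldField' in migrated:
        migrated['newField'] = migrated.pop('oldField')
    migrated['schemaVersion'] = 2
    return migrated

# With a live connection, each migrated document would then be written
# back with something like:
#   cb.default_collection().upsert(key, migrate_doc(doc))
```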
Another hurdle I had to get over was getting the query's result count. I needed it so I could stop processing once there was no more data. Below is how I accomplished this.
n1ql = "your n1ql query"
# in SDK 3.x the query runs against the cluster rather than the bucket
row_iter = cluster.query(n1ql, QueryOptions(timeout=timedelta(seconds=300)))

# get the result count from the query metrics
queryCount = row_iter.metrics['resultCount']
if queryCount > 0:
    for row in row_iter:
        # process each row here
        pass
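That result-count check can drive a batched migration loop: keep querying with LIMIT/OFFSET and stop as soon as a query comes back empty. Here is a minimal sketch of that stopping logic, with the actual query abstracted behind a hypothetical run_query callable (in practice it would wrap the cluster.query() call above):

```python
def process_in_batches(run_query, batch_size=1000):
    """Run a LIMIT/OFFSET-style query repeatedly until no rows remain.

    `run_query` is any callable taking (limit, offset) and returning a
    list of rows; it is kept abstract here so the stopping logic is
    clear without needing a live Couchbase connection.
    """
    offset = 0
    processed = 0
    while True:
        rows = run_query(batch_size, offset)
        if len(rows) == 0:  # a result count of 0 means no more data
            break
        for row in rows:
            processed += 1  # replace with the real per-row migration work
        offset += batch_size
    return processed
```

With a stub standing in for the query you can see it stop on the empty batch: for 2,500 rows and a batch size of 1,000 it runs four queries, the last of which returns nothing.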
Python is very powerful, and although there has been some learning along the way, I feel it is the best tool for the job when it comes to large data operations. I hope this is helpful to anyone working with Python and Couchbase and gets you where you need to be with your scripts.