PyMongo Monday – Episode 3 – Read

20180927_173519

Previously we covered:
* Episode 1: Setting Up Your MongoDB Environment
* Episode 2: CRUD – Create

In this episode (episode 3) we are are going to cover the Read part of CRUD. MongoDB provides a query interface through the find function.We are going to demonstrate Read by doing find queries on a collection
hosted in MongoDB Atlas. The MongoDB
connection string is:

mongodb+srv://demo:demo@demodata-rgl39.mongodb.net/test?retryWrites=true

This is a cluster running a database called demo with a single collection called zipcodes. Every ZIP code in the US is in this database. To connect to this cluster we are going to use the Python shell.

$ cd ep003
$ pipenv shell
Launching subshell in virtual environment…
JD10Gen:ep003 jdrumgoole$  . /Users/jdrumgoole/.local/share/virtualenvs/ep003-blzuFbED/bin/activate
(ep003-blzuFbED) JD10Gen:ep003 jdrumgoole$ python
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 03:03:55)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from pymongo import MongoClient
>>> client = MongoClient(host="mongodb+srv://demo:demo@demodata-rgl39.mongodb.net/test?retryWrites=true")
>>> db = client["demo"]
>>> zipcodes=db["zipcodes"]
>>> zipcodes.find_one()
{'_id': '01069', 'city': 'PALMER', 'loc': [-72.328785, 42.176233], 'pop': 9778, 'state': 'MA'}
>>>

The find_one query will get the first record in the collection. You can see
the structure of the fields in the returned document. The _id is the ZIP code.
The city is the city name. The loc is the GPS coordinates of each ZIP code. The pop is the population size and the state is the two letter state code.
We are connecting with the default user demo with the password demo. This
user only has read-only access to this database and collection.

So what if we want to select all the ZIP codes for a particular city?

Querying in MongoDB consists of constructing a partial JSON document that matches the fields you want to select on. So to get all the ZIP codes in the city of PALMER we use the following query

>>> zipcodes.find({'city': 'PALMER'})

>>>

Note we are using find() rather than find_one() as we want to return
all the matching documents. In this case find() will return a
cursor.

To print the cursor contents just keep calling .next() on the cursor
as follows:

>>> cursor=zipcodes.find({'city': 'PALMER'})
>>> cursor.next()
{'_id': '01069', 'city': 'PALMER', 'loc': [-72.328785, 42.176233], 'pop': 9778, 'state': 'MA'}
>>> cursor.next()
{'_id': '37365', 'city': 'PALMER', 'loc': [-85.564272, 35.374062], 'pop': 1685, 'state': 'TN'}
>>> cursor.next()
{'_id': '50571', 'city': 'PALMER', 'loc': [-94.543155, 42.641871], 'pop': 1119, 'state': 'IA'}
>>> cursor.next()
{'_id': '66962', 'city': 'PALMER', 'loc': [-97.112214, 39.619165], 'pop': 276, 'state': 'KS'}
>>> cursor.next()
{'_id': '68864', 'city': 'PALMER', 'loc': [-98.241146, 41.178757], 'pop': 1142, 'state': 'NE'}
>>> cursor.next()
{'_id': '75152', 'city': 'PALMER', 'loc': [-96.679429, 32.438714], 'pop': 2605, 'state': 'TX'}
>>> cursor.next()
Traceback (most recent call last):
  File "", line 1, in 
  File "/Users/jdrumgoole/.local/share/virtualenvs/ep003-blzuFbED/lib/python3.6/site-packages/pymongo/cursor.py", line 1197, in next
    raise StopIteration
StopIteration

As you can see cursors follow the Python iterator protocol and will raise a StopIteration exception when the cursor is exhausted.

However, calling .next() continously is a bit of a drag. Instead you can
import the pymongo_shell package and call the print_cursor() function. It will print out twenty records at a time.

>>> from pymongo_shell import print_cursor
>>> print_cursor(zipcodes.find({'city': 'PALMER'}))
{'_id': '01069', 'city': 'PALMER', 'loc': [-72.328785, 42.176233], 'pop': 9778, 'state': 'MA'}
{'_id': '37365', 'city': 'PALMER', 'loc': [-85.564272, 35.374062], 'pop': 1685, 'state': 'TN'}
{'_id': '50571', 'city': 'PALMER', 'loc': [-94.543155, 42.641871], 'pop': 1119, 'state': 'IA'}
{'_id': '66962', 'city': 'PALMER', 'loc': [-97.112214, 39.619165], 'pop': 276, 'state': 'KS'}
{'_id': '68864', 'city': 'PALMER', 'loc': [-98.241146, 41.178757], 'pop': 1142, 'state': 'NE'}
{'_id': '75152', 'city': 'PALMER', 'loc': [-96.679429, 32.438714], 'pop': 2605, 'state': 'TX'}
>>>

If we don’t need all the fields in the doc we can use
projection to remove some fields. This is a second doc argument to the find() function. This doc can specify the fields to return explicitly.

>>> print_cursor(zipcodes.find({'city': 'PALMER'}, {'city':1,'pop':1}))
{'_id': '01069', 'city': 'PALMER', 'pop': 9778}
{'_id': '37365', 'city': 'PALMER', 'pop': 1685}
{'_id': '50571', 'city': 'PALMER', 'pop': 1119}
{'_id': '66962', 'city': 'PALMER', 'pop': 276}
{'_id': '68864', 'city': 'PALMER', 'pop': 1142}
{'_id': '75152', 'city': 'PALMER', 'pop': 2605}

To include multiple fields in a query just add them to query doc. Each field is
treated as a boolean and to select the documents that will be returned.

>>> print_cursor(zipcodes.find({'city': 'PALMER', 'state': 'MA'}, {'city':1,'pop':1}))
{'_id': '01069', 'city': 'PALMER', 'pop': 9778}
>>>

To pick documents with one field or the other we can use the $or operator.

>>> print_cursor(zipcodes.find({ '$or' : [ {'city': 'PALMER' }, {'state': 'MA'}]}))
{'_id': '01069', 'city': 'PALMER', 'loc': [-72.328785, 42.176233], 'pop': 9778, 'state': 'MA'}
{'_id': '01002', 'city': 'CUSHMAN', 'loc': [-72.51565, 42.377017], 'pop': 36963, 'state': 'MA'}
{'_id': '01012', 'city': 'CHESTERFIELD', 'loc': [-72.833309, 42.38167], 'pop': 177, 'state': 'MA'}
{'_id': '01073', 'city': 'SOUTHAMPTON', 'loc': [-72.719381, 42.224697], 'pop': 4478, 'state': 'MA'}
{'_id': '01096', 'city': 'WILLIAMSBURG', 'loc': [-72.777989, 42.408522], 'pop': 2295, 'state': 'MA'}
{'_id': '01262', 'city': 'STOCKBRIDGE', 'loc': [-73.322263, 42.30104], 'pop': 2200, 'state': 'MA'}
{'_id': '01240', 'city': 'LENOX', 'loc': [-73.271322, 42.364241], 'pop': 5001, 'state': 'MA'}
{'_id': '01370', 'city': 'SHELBURNE FALLS', 'loc': [-72.739059, 42.602203], 'pop': 4525, 'state': 'MA'}
{'_id': '01340', 'city': 'COLRAIN', 'loc': [-72.726508, 42.67905], 'pop': 2050, 'state': 'MA'}
{'_id': '01462', 'city': 'LUNENBURG', 'loc': [-71.726642, 42.58843], 'pop': 9117, 'state': 'MA'}
{'_id': '01473', 'city': 'WESTMINSTER', 'loc': [-71.909599, 42.548319], 'pop': 6191, 'state': 'MA'}
{'_id': '01510', 'city': 'CLINTON', 'loc': [-71.682847, 42.418147], 'pop': 13269, 'state': 'MA'}
{'_id': '01569', 'city': 'UXBRIDGE', 'loc': [-71.632869, 42.074426], 'pop': 10364, 'state': 'MA'}
{'_id': '01775', 'city': 'STOW', 'loc': [-71.515019, 42.430785], 'pop': 5328, 'state': 'MA'}
{'_id': '01835', 'city': 'BRADFORD', 'loc': [-71.08549, 42.758597], 'pop': 12078, 'state': 'MA'}
{'_id': '01845', 'city': 'NORTH ANDOVER', 'loc': [-71.109004, 42.682583], 'pop': 22792, 'state': 'MA'}
{'_id': '01851', 'city': 'LOWELL', 'loc': [-71.332882, 42.631548], 'pop': 28154, 'state': 'MA'}
{'_id': '01867', 'city': 'READING', 'loc': [-71.109021, 42.527986], 'pop': 22539, 'state': 'MA'}
{'_id': '01906', 'city': 'SAUGUS', 'loc': [-71.011093, 42.463344], 'pop': 25487, 'state': 'MA'}
{'_id': '01929', 'city': 'ESSEX', 'loc': [-70.782794, 42.628629], 'pop': 3260, 'state': 'MA'}
Hit Return to continue

We can do range selections by using the $lt and $gtoperators.

>>> print_cursor(zipcodes.find({'pop' : { '$lt':8, '$gt':5}}))
{'_id': '05901', 'city': 'AVERILL', 'loc': [-71.700268, 44.992304], 'pop': 7, 'state': 'VT'}
{'_id': '12874', 'city': 'SILVER BAY', 'loc': [-73.507062, 43.697804], 'pop': 7, 'state': 'NY'}
{'_id': '32830', 'city': 'LAKE BUENA VISTA', 'loc': [-81.519034, 28.369378], 'pop': 6, 'state': 'FL'}
{'_id': '59058', 'city': 'MOSBY', 'loc': [-107.789149, 46.900453], 'pop': 7, 'state': 'MT'}
{'_id': '59242', 'city': 'HOMESTEAD', 'loc': [-104.591805, 48.429616], 'pop': 7, 'state': 'MT'}
{'_id': '71630', 'city': 'ARKANSAS CITY', 'loc': [-91.232529, 33.614328], 'pop': 7, 'state': 'AR'}
{'_id': '82224', 'city': 'LOST SPRINGS', 'loc': [-104.920901, 42.729835], 'pop': 6, 'state': 'WY'}
{'_id': '88412', 'city': 'BUEYEROS', 'loc': [-103.666894, 36.013541], 'pop': 7, 'state': 'NM'}
{'_id': '95552', 'city': 'MAD RIVER', 'loc': [-123.413994, 40.352352], 'pop': 6, 'state': 'CA'}
{'_id': '99653', 'city': 'PORT ALSWORTH', 'loc': [-154.433803, 60.636416], 'pop': 7, 'state': 'AK'}
>>>

Again sets of $lt and $gt are combined as a boolean and. if you need different logic you can use the boolean operators.

Conclusion

Today we have seen how to query documents using a query template, how to reduce the output using projections and how to create more complex queries using boolean and $lt and $gt operators.

Next time we will talk about the Update portion of CRUD.

MongoDB has a very rich and full featured query language including support
for querying using full-text, geo-spatial coordinates and nested queries.
Give the query language a spin with the Python shell using the tools we
outlined above. The complete zip codes data set is publicly available for read
queries at the MongoDB URI:

mongodb+srv://demo:demo@demodata-rgl39.mongodb.net/test?retryWrites=true”

Try MongoDB Atlas via the free-tier
today. A free MongoDB cluster for your own personal use forever!

PyMongo Monday: Episode 1: Setting Up Your PyMongo Environment

20170404_090123
Front Square, Trinity College, Dublin

Welcome to PyMongo Monday. This is the first in a series of regular blog posts that will introduce developers to programming MongoDB using the Python programming language. It’s called PyMongo Monday because PyMongo is the name of the client library (in MongoDB speak we refer to it as a “driver”) we used to interact with the MongoDB Server. Monday because we aim to release each new episode on Monday.

To get started we need to install the toolchain that a typical MongoDB Python developer would expect to use.

Installing m

First up is m. Hard to find online unless your search for MongoDB m, m is a tool to manage and use multiple installations of the MongoDB Server in parallel. It is an invaluable tool if you want to try out the latest and greatest beta version but still continue mainline development on our current stable release.

The easiest way to install m is with npm the Node.js package manager (which it turns out is not just for Node.js).

$ npm install -g m
Password:******
/usr/local/bin/m -> /usr/local/lib/node_modules/m/bin/m
+ m@1.4.1
updated 1 package in 2.361s
$

If you can’t or don’t want to use npm you can download and install directly from the github repo. See the README there for details.

For today we will use m to install the current stable production version (4.0.2 at the time of writing).

We run the stable command to achieve this.

$ m stable
MongoDB version 4.0.2 is not installed.
Installation may take a while. Would you like to proceed? [y/n] <b>y</b>
... installing binary

######################################################################## 100.0%
/Users/jdrumgoole
... removing source
... installation complete
$

If you need to use the path directly in another program you can get that with m bin.

$ m bin 4.0.0
/usr/local/m/versions/4.0.1/bin
$

To run the corresponding binary do m use stable

$ m use stable
2018-08-28T11:41:48.157+0100 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2018-08-28T11:41:48.171+0100 I CONTROL [initandlisten] MongoDB starting : pid=38524 port=27017 dbpath=/data/db 64-bit host=JD10Gen.local
2018-08-28T11:41:48.171+0100 I CONTROL [initandlisten] db version v4.0.2
2018-08-28T11:41:48.171+0100 I CONTROL [initandlisten] git version: fc1573ba18aee42f97a3bb13b67af7d837826b47
<b><i>&lt other server output &gt</i></b>
<b>...</b>
2018-06-13T15:52:43.648+0100 I NETWORK [initandlisten] waiting for connections on port 27017

Now that we have a server running we can confirm that it works by connecting via the
mongo shell.

$ mongo
MongoDB shell version v4.0.0
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.0
Server has startup warnings:
2018-07-06T10:56:50.973+0100 I CONTROL [initandlisten]
2018-07-06T10:56:50.973+0100 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-07-06T10:56:50.973+0100 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2018-07-06T10:56:50.973+0100 I CONTROL [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2018-07-06T10:56:50.973+0100 I CONTROL [initandlisten]
2018-07-06T10:56:50.973+0100 I CONTROL [initandlisten] ** WARNING: This server is bound to localhost.
2018-07-06T10:56:50.973+0100 I CONTROL [initandlisten] ** Remote systems will be unable to connect to this server.
2018-07-06T10:56:50.973+0100 I CONTROL [initandlisten] ** Start the server with --bind_ip &lt address&gt to specify which IP
2018-07-06T10:56:50.973+0100 I CONTROL [initandlisten] ** addresses it should serve responses from, or with --bind_ip_all to
2018-07-06T10:56:50.973+0100 I CONTROL [initandlisten] ** bind to all interfaces. If this behavior is desired, start the
2018-07-06T10:56:50.973+0100 I CONTROL [initandlisten] ** server with --bind_ip 127.0.0.1 to disable this warning.
2018-07-06T10:56:50.973+0100 I CONTROL [initandlisten]

---
Enable MongoDB's free cloud-based monitoring service to collect and display
metrics about your deployment (disk utilization, CPU, operation statistics,
etc).

The monitoring data will be available on a MongoDB website with a unique URL created for you. Anyone you share the URL with will also be able to view this page. MongoDB may use this information to make product improvements and to suggest MongoDB products and deployment options to you.

To enable free monitoring, run the following command: db.enableFreeMonitoring()
---

>

These warnings are standard. They flag that this database has no access controls set up by default andthat it is only listening to connections coming from the machine it is running on (localhost). We will learn how to set up access control and listen on a broader range of ports in later episodes.

Installing the PyMongo Driver

But this series is not about the MongoDB Shell, which uses JavaScript as its coin of the realm, it’s about Python. How do we connect to the database with Python?

First, we need to install the MongoDB Python Driver, PyMongo. In MongoDB parlance a driver is a language-specific client library used to allow developers to interact with the server in the idiom of their own programming language.

For Python that means the driver is installed using pip. In node.js the driver is installed using npm and in Java you can use maven.

$ pip3 install pymongo
Collecting pymongo
Downloading https://files.pythonhosted.org/packages/a1/e0/51df08036e04c1ddc985a2dceb008f2f21fc1d6de711bb6cee85785c1d78/pymongo-3.7.1-cp27-cp27m-macosx_10_13_intel.whl (333kB)
100% |████████████████████████████████| 337kB 4.1MB/s
Installing collected packages: pymongo
Successfully installed pymongo-3.7.1
$

We recommend you use a virtual environment to isolate your PyMongo Monday code. This is not required but is very convenient for isolating different development streams.

Now we can connect to the database:

$ python
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 03:03:55)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymongo
>>> client = pymongo.MongoClient(host="mongodb://localhost:8000")
>>> result = client.admin.command("isMaster")
>>> import pprint
>>> pprint.pprint(result)
{'ismaster': True,
'localTime': datetime.datetime(2018, 6, 13, 21, 55, 2, 272000),
'logicalSessionTimeoutMinutes': 30,
'maxBsonObjectSize': 16777216,
'maxMessageSizeBytes': 48000000,
'maxWireVersion': 6,
'maxWriteBatchSize': 100000,
'minWireVersion': 0,
'ok': 1.0,
'readOnly': False}
>>>

First we import the PyMongo library (line 5). The we create a local client object (line 6) that holds the connection pool and other status for this server. We generally don’t want more than one MongoClient object per program as it provides its own connection pool.

Now we are ready to issue a command to the server. In this case its the standard MongoDB server information command which is called rather anachronistically isMaster (line 7). This is a hangover from the very early versions of MongoDB. It appears in pre 1.0 versions of MongoDB ()which is over ten years old at this stage).

The isMaster command returns a dict which details a bunch of server information. In order to format this in a more readable way import the pprint library.

#Conclusion
That’s the end of episode one. We have installed MonogDB, installed the Python client library (aka driver),started a mongod server and established a connection between the client and server.

Next week we will introduce CRUD operations on MongoDB starting with Create.

For direct feedback please pose your questions on twitter/jdrumgoole that way everyone can see the answers.

The best way to try out MongoDB is via MongoDB Atlas our Database as a Service. You can deploy a free cluster without giving us a credit card.