This short article/how-to will show you how to solve a real issue. Our PostgreSQL logs were being stripped of trailing C-style comments containing a GDPR-relevant ID. We weren’t sure where this data was being stripped, but there were only a couple of options: In the Rails app, PgBouncer, or it was being stripped by PostgreSQL itself before logging.
Our task was to look at the traffic between the Rails app and PgBouncer on a remote machine without a graphical environment. The idea was to build the command using tshark until we had our data filtered out. Here’s how we did it.
Have the Rails app running in a bare-bones Debian-based Docker container (no X Window System). The app leverages a PostgreSQL database and the connections to it are managed by PgBouncer, which also resides in the container, listening on it’s default port 6432.
Side note about the output examples
The examples you’ll see are snippets from capturing real network traffic, with sensitive data changed or redacted, to remain GDPR-compliant ;). While you should achieve similar results in most steps, the actual numbers (such as byte lengths, IP addresses, TCP source ports etc…) will almost surely differ on your machine.
Look at the traffic between the Rails app and PgBouncer (which should both be running in a Docker container). To be able to process the output with other tools and easily run it in multiple remote environments at once, use a command line-only tool with a scrapeable output. Don’t redeploy for enhanced logging and don’t affect the system in any way.
Looking at the traffic
The only tool you’ll need to install is
tshark, which should also install
dumpcap. You will, of course, also need access to your container of choice.
apt-get install tshark
Step 1 - Identify the Interface
When you run
tshark without any arguments, it tries to listen to all available interfaces. As both the Puma server running our Rails app and PgBouncer are running locally in the container, the local
lo interface should suffice. Please note the name of the interface largely depends - it can be
lo, or something else. To see a list of available interfaces, you can run
dumpcap -D (dumpcap should be installed for you as a part of installing
tshark -i lo
This limits the capture to packets on the local interface (which, in our case, was still a LOT).
Step 2 - Constrain the Capture
To limit what you actually capture, write a filter specifying which packets you’re interested in.
To see the capabilities of capture filters in wireshark, please consult the Capture Filter Manual.
As PgBouncer listens on port 6432 by default, I’ll use that for this example and constrain the capture by the destination port.
tshark -i lo -f 'tcp dst port 6432'
Step 3a - Analyze the TCP Payload
Chances are, you’re still seeing a lot of packets you don’t really care about (e.g. TCP SYN/ACKs).
We can limit the output further, and tell
tshark to only show packets where the data is of a certain format -
PGSQL, in our case.
tshark -i lo -f 'tcp dst port 6432' -O PGSQL
tshark which output we’re interested in. You can see all of the available protocols with
tshark -G protocols.
You should see continuous output similar to this:
Frame 143: 764 bytes on wire (6112 bits), 764 bytes captured (6112 bits) on interface 0 Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00) Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1 Transmission Control Protocol, Src Port: 36540, Dst Port: 6432, Seq: 8345, Ack: 14949, Len: 698 Data (698 bytes)
Step 3b - Telling tshark how to Interpret the Data
This is where things get weird. Obviously,
tshark can tell the difference between packets with PGSQL data, and the other ones. However, the actual output (
Data (X bytes)) is definitely underwhelming. We can do better - we can tell
tshark to decode the data according to some protocol spec. More specifically, we tell
tshark to interpret any data from/to TCP port 6432 as PGSQL data using the option
-d option has a syntax of it’s own, and I’ll refer you to tshark documentation for the gory details.
tshark -i lo -f 'tcp dst port 6432' -O PGSQL -d 'tcp.port==6432,pgsql'
Running the above will now output:
Frame 477: 863 bytes on wire (6904 bits), 863 bytes captured (6904 bits) on interface 0 Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00) Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1 Transmission Control Protocol, Src Port: 54548, Dst Port: 6432, Seq: 2020, Ack: 2174, Len: 797 PostgreSQL PostgreSQL PostgreSQL PostgreSQL PostgreSQL
– which is so NOT what we wanted.
Step 3c - Telling tshark how to Present the Data
You can tell
tshark how to format the data using the
-T option, but you want to output fields that can be filtered. In your case, you’ll want to use
-T fields, to be able to pick specific fields to view. Understand, though, that you need to explicitly specify the fields in this case, using the option
-e. To see the possible fields you can filter for, use
tshark -G fields and
pgsql. This command will give you a long, detailed description of the recognized fields and their respective formats. The one we want is
pgsql.query, so we need to add
-e pgsql.query to the command line.
The final command:
tshark -i lo -f 'tcp dst port 6432' -O PGSQL -d 'tcp.port==6432,pgsql' -T fields -e pgsql.query
Running the above will now output the exact SQL queries being sent from our app to PgBouncer:
SELECT "devices".* FROM "devices" WHERE "devices"."hw_code" = '46260748-d85e-4d77-ab61-9256e4876122' ORDER BY "devices"."id" ASC LIMIT 1 /* 10f35748-137b-4aae-bf5f-9108be4ddf9c */ BEGIN /* 10f35748-137b-4aae-bf5f-9108be4ddf9c */ INSERT INTO "devices" ("hw_code", "created_at", "updated_at") VALUES ('5e225af4-8d0f-420a-b533-32b47b1f0ea8', '2018-09-10 08:21:30.200392', '2018-09-10 08:21:3 0.200392') RETURNING "id" /* 10f35748-137b-4aae-bf5f-9108be4ddf9c */ COMMIT /* 10f35748-137b-4aae-bf5f-9108be4ddf9c */ SELECT "clients".* FROM "clients" WHERE "clients"."device_id" = 'e982b218-9b52-478c-a253-66833975effe' AND "clients"."session_identifier" = 'be91b493-b41a-4dc f-adc4-c458f5910065' AND "clients"."master_id" IS NULL ORDER BY "clients"."id" ASC LIMIT 1 /* 10f35748-137b-4aae-bf5f-9108be4ddf9c */ BEGIN /* 10f35748-137b-4aae-bf5f-9108be4ddf9c */ INSERT INTO "clients" ("device_id", "created_at", "updated_at", "session_identifier") VALUES ('5b0019c8-1328-45d8-a17d-137f07ef8839', '2018-09-10 08:21:30.2074 85', '2018-09-10 08:21:30.207485', '0e683b6e-67ec-4316-8714-7fe3edfacb19') RETURNING "id" /* 10f35748-137b-4aae-bf5f-9108be4ddf9c */ COMMIT /* 10f35748-137b-4aae-bf5f-9108be4ddf9c */ SELECT "assignments".* FROM "assignments" WHERE "assignments"."device_id" = 'c0250684-24f0-4878-8f28-35158a0fc310' /* 10f35748-137b-4aae-bf5f-9108be4ddf9c */
In the above output, you can see that the C-style comments are present on the way from our client to PgBouncer. In our real issue, they were not - thus, narrowing down the culprit to our client.
Bonus Round - Back to the GUI!
Random proprietary OS user: “But I looove my GUIs! Why can’t I just use Wireshark?”
Well, it turns out you can. Just have
tshark (which provides
dumpcap) installed on your container, and then run this locally:
wireshark -k -i <(docker exec <CONTAINER_NAME> dumpcap -i lo -P -w - -f 'tcp dst port 6432')
The above runs
dumpcap on the container you specify, having it write to standard output. This output is then piped to Wireshark on your local machine, and from there it’s just like a local capture - you can use all the cool features of Wireshark just like if you’d been capturing local traffic.