2016-06-29

Verify that a programme is communicating through a proxy only

I had to verify that a programme on a remote server communicates through a proxy only, while lots of other services were running (and communicating over the network) on the same server. I could have watched the proxy's (squid) logs, set up the firewall to log access to and from certain hosts, or used iftop, but in my case these had various downsides. For example, iftop is meant more for tracking the amount of traffic, and I had to check that not even the smallest packet would bypass my HTTP proxy (although you can pass the filters mentioned below to iftop as well, see its -f option). I chose tcpdump, and this post is here to save the exact command I used:

tcpdump -i any "tcp and host not proxy.example.com and host not my-workstation.example.com and not ( dst localhost and src localhost ) and not ( dst $( hostname ) and src $( hostname ) )"
  • tcp says that I'm interested in TCP traffic only
  • host not proxy.example.com instructs tcpdump to ignore (not log) any traffic to/from my proxy server
  • host not my-workstation.example.com asks tcpdump to ignore traffic to/from my workstation, as I'm connected via ssh from there (this could be tightened to ignore only ssh traffic, i.e. port 22, but this is good enough for me)
  • not ( dst localhost and src localhost ) ignores traffic going from localhost to localhost (some other services on the system are talking to each other and I'm not interested in that)
  • not ( dst $( hostname ) and src $( hostname ) ) does the same as above, because some services use the server's external IP for their internal communication and, again, I do not need to know about that

This way tcpdump logs only the communication to/from the parts of the external world I'm interested in.
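If the list of hosts to ignore changes often, the filter expression can be assembled by a small script instead of being typed by hand. The following is only a minimal sketch in Python 3: the function name build_filter and its parameters are made up for this illustration, and the ssh_only_hosts option shows one possible way to do the "ignore only port 22" tightening mentioned above.

```python
import socket


def build_filter(ignored_hosts, local_names, ssh_only_hosts=()):
    """Build a tcpdump filter expression that hides expected traffic."""
    parts = ["tcp"]
    # Hosts whose traffic is expected in full (the proxy, for example).
    for host in ignored_hosts:
        parts.append("host not %s" % host)
    # Hosts for which only SSH (port 22) traffic should be hidden.
    for host in ssh_only_hosts:
        parts.append("not ( host %s and port 22 )" % host)
    # Traffic the machine sends to itself under each of its names.
    for name in local_names:
        parts.append("not ( dst %s and src %s )" % (name, name))
    return " and ".join(parts)


if __name__ == "__main__":
    # Reproduces the expression used in this post for the current machine.
    print(build_filter(
        ignored_hosts=["proxy.example.com", "my-workstation.example.com"],
        local_names=["localhost", socket.gethostname()],
    ))
```

The printed string is meant to be passed to tcpdump -i any as a single quoted argument, the same way as the hand-written filter.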

2016-06-02

Difference between Spacewalk's API and almost direct SQL performance

Imagine you want to get a list of hosts registered to your Spacewalk, ideally with the groups they belong to, and you want to do it repeatedly, so performance matters. Let's measure it.

I have Spacewalk 2.4 on a virtual system with 2 CPUs and 4 GB of RAM (Virtual, really? Not ideal for performance measurement, I know.) and I have created 1000 system profiles on it. There are two ways to get the data out of the server: the command-line spacewalk-report inventory utility (it has to run on the system running Spacewalk and queries the database directly) or the system API (it can be run from anywhere, but the data has to travel from the DB through Spacewalk's Java stack and into XML, which is then transferred to you over the network). An API script to measure this can look like the following (note that it does not output the obtained data):

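#!/usr/bin/env python
# Python 2 script: uses xmlrpclib and the print statement.

import xmlrpclib
import time

server = xmlrpclib.Server('http://<fqdn>/rpc/api')
key = server.auth.login('<user>', '<pass>')
for i in range(100):
  before = time.time()
  # One call returns all system profiles visible to the user...
  systems = server.system.listUserSystems(key)
  # ...then two more calls per system fetch its network info and groups.
  for s in systems:
    detail = server.system.getNetwork(key, s['id'])
    groups = server.system.listGroups(key, s['id'])
  after = time.time()
  # Number of systems, start time, end time, duration in seconds.
  print "%s %s %s %s" % (len(systems), before, after, after-before)
server.auth.logout(key)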
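The spacewalk-report side can be timed in a similar loop. This is only a sketch of how such a measurement might look, not necessarily the exact way the numbers in this post were taken. It is written for Python 3 (unlike the Python 2 script above), and the helper name time_command is made up for this illustration.

```python
import subprocess
import time


def time_command(command, repetitions):
    """Run the command repeatedly; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repetitions):
        before = time.perf_counter()
        # Throw the report away; only the elapsed time matters here.
        subprocess.run(command, stdout=subprocess.DEVNULL, check=True)
        after = time.perf_counter()
        durations.append(after - before)
    return durations


# Example call, to be run on the Spacewalk server itself:
#   durations = time_command(["spacewalk-report", "inventory"], 100)
#   print(sum(durations) / len(durations))
```

Because check=True is set, a failing run raises an exception instead of silently contributing a misleadingly short duration.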

Here are my results (averages from 100 repetitions performed directly after a spacewalk-service restart):

method | average duration | note
spacewalk-report inventory | 1.4 seconds | Needs to run directly on the Spacewalk server
API with system.listUserSystems() only | 0.9 seconds | Provides system ID and profile name only (the profile name is not necessarily the hostname)
API with system.listUserSystems() and system.getNetwork() | 23.8 seconds | Gives you IP and hostname
API with system.listUserSystems() and system.getDetails() | 27.5 seconds | Gives plenty of info, including hostname, but not groups
API with system.listUserSystems(), system.getNetwork() and system.listGroups() | 52.4 seconds | Finally, this one gathers hostname and system groups

So it depends on what you want to achieve and how often you want to run the script. Also, in the API script case, you have to keep the login (or logins, when you need to run it for multiple organizations) somewhere. Fortunately, you can use a read-only API user for this.
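The API script prints one line per repetition (number of systems, start time, end time, duration), so an average like those in the table can be computed from its saved output. A minimal Python 3 sketch follows. The function name average_duration is made up for this illustration, and the sample lines are invented values, not real measurements.

```python
def average_duration(lines):
    """Average the last column (duration in seconds) of the script's output."""
    durations = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unrelated lines
        durations.append(float(fields[3]))
    if not durations:
        raise ValueError("no measurement lines found")
    return sum(durations) / len(durations)


# Invented sample in the script's output format, for illustration only.
sample = [
    "1000 10.0 62.0 52.0",
    "1000 62.0 115.0 53.0",
]
print(average_duration(sample))
```

With real data, the list of lines would come from a file holding the redirected output of the script.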