WARNING: THIS IS NOT A POINT-AND-CLICK SETUP

The instructions below are only a sketch; you will have to customize the
HIVE scripts to operate reliably on your own systems and to suit your
needs. The use and distribution of HIVE is subject to the terms stated in
the LICENSE.txt file.

*********************************
* Honeypot setup
*********************************

In our original setup we used VirtualBox to deploy the honeypots. You are
free to use whatever you prefer (VMware, Xen, physical machines), but you
will have to rewrite the management scripts. If you need a copy of the
VirtualBox version we used (a pre-1.6.0 svn version), please ask, but we
strongly suggest you first try the latest version available from Sun;
there is no reason why it should not work.

We used a 512 MB VM for the Honeywall and several 128 MB machines for the
honeypots. You will have to customize startnet.sh and resetvm.sh for your
specific setup; to create the virtual network you need Linux bridging
support enabled in the kernel (it usually is) and the bridge-utils
userspace tools. To use our submit.py script to automatically send
samples to the gateway, Python is required as well.

*********************************
* Gateway setup
*********************************

We implemented the gateway as a simple PHP script that receives the
samples and forwards them to the DBMS. Two scripts are included:
upload.php saves the samples on disk (if the gateway runs on a different
system, you will have to copy them to the database machine, e.g. using
sftp); upload2.php saves the samples directly into BLOB fields in the
DBMS. Please consider these reference implementations: we do not suggest
using them unmodified in production.

*********************************
* Database setup
*********************************

You need a working PostgreSQL 8.3 cluster to proceed. On a Debian-based
distribution, the required packages are:

  postgresql-common postgresql-8.3
  postgresql-client-common postgresql-client-8.3
  postgresql-plpython-8.3

You are also going to need some Python packages used by the various
scripts and procedures (and probably a few more):

  python-psycopg2 python-pyclamd

On the ClamAV scanning server (which can be the same machine or a
separate one) you need:

  clamav-base clamav-daemon clamav-freshclam

and you have to enable the ClamAV daemon.

HIVE uses a database to store its data; here is an example (which you
will have to adapt to your own setup) of how to create and populate it:

  su - postgres
  createuser -DPE malware
  createdb -O malware -E UTF8 malware 'HIVE malware database'
  createlang -d malware plpgsql
  createlang -d malware plpythonu
  psql -U malware -d malware < hive-schema.sql

Basically, we create a new user role and a new database (both named
'malware'), add the PL/pgSQL and PL/PythonU procedural languages to the
new database, and finally import our SQL schema.

Tip: since you are going to store all the collected data and samples on
this system, make sure you have plenty of disk space before going into
production; setting up an automated backup system (either using pg_dump
or replicating the cluster with Slony) is also strongly recommended.

You now have to customize some settings for your setup. Start with the
'settings' table and edit its rows accordingly; they should be
self-explanatory. Unfortunately, only a few scripts use this table: most
still rely on settings hard-coded in the source, which you will have to
edit by hand.
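As a rough illustration only (the actual column names depend on the
schema in hive-schema.sql; 'name', 'value' and the setting shown here are
placeholders), editing a row in the 'settings' table could look like
this:

  psql -U malware -d malware \
       -c "UPDATE settings SET value = 'new-value' WHERE name = 'some_setting';"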
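For the automated backup recommended above, a minimal sketch of a nightly
pg_dump job in /etc/cron.d syntax might look like the following (the
schedule and the destination path are placeholders, and you will still
want to rotate and verify the dumps yourself):

  # /etc/cron.d/hive-backup: nightly compressed dump of the malware database
  0 3 * * * postgres pg_dump -Fc malware > /var/backups/hive/malware-$(date +\%F).dump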
At the very least, you should check the database functions
submit_to_anubis, submit_to_cwsandbox, update_cncs_geo and set_geo. The
first two require the Python script httputils.py in your search path; the
other two require the GeoLiteCity.dat file, which can be obtained from
http://www.maxmind.com/app/geolitecity.

Note: before going into production, we suggest you contact the external
analysis providers you plan to use and let them know about the potential
sudden load. By default, the system will feed them every non-trivial
acquired sample; if you run a large infrastructure, you may want to
throttle your submissions (or consider bringing an analysis service
in-house).

You should then set up cron to run imap2postgres.py and process_report.py
periodically (of course, you have to set the correct parameters, such as
the reports email address); a crontab sketch is given at the end of this
file. Some handy utilities to interact with BLOB objects are also
provided (getsample.py, {get,set}report.py, import_{samples,report}.py);
they come with no warranty.

*********************************
* Monitor setup
*********************************

We provide two basic monitoring utilities: httpmole.py for HTTP botnets
and a modified version of Jan Goebel's infiltrator for IRC botnets. Both
tools are written in Python and require python-psycopg2 for database
interaction; httpmole.py additionally requires python-magic (used to
screen PE executables).

On every run, httpmole.py pings every registered C&C and records its
reply in the http_replies database table; it is designed to be run
periodically (e.g. from cron).

infiltrator is an interactive program: we suggest running it inside
screen(1), as sketched at the end of this file. A typical session looks
like this:

  Infiltrator Console v0.1
  ? - for help
  >> db open
  connection successful
  >> db autostart
  (outputs list of connected cncs)

The program will now start connecting to every C&C registered in the
database, logging their output in the irc_commands table. The
database-related commands are:

  db autostart
  db listconfigurations
  db open
  db close
  db loadcnc
  db listcncs
  db loadconfiguration

Our modifications to the stock infiltrator are provided as a unified diff
in infiltrator-hive.diff.
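Since infiltrator keeps long-lived IRC connections open, the screen(1)
session suggested above might look like this (the script name and
invocation are assumptions; adjust them to how you installed
infiltrator):

  screen -S infiltrator    # start a named screen session
  python infiltrator.py    # launch infiltrator inside it

Detach with C-a d and reattach later with 'screen -r infiltrator'; the
C&C connections keep running while the session is detached.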
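Finally, for the periodic jobs mentioned in the database and monitor
sections, a crontab sketch along these lines may help (install paths,
intervals and the user are placeholders; check each script for the
parameters it actually expects, such as the reports email address for
process_report.py):

  # /etc/cron.d/hive: periodic HIVE jobs -- paths and intervals are examples only
  */15 * * * * hive /usr/local/hive/imap2postgres.py
  */15 * * * * hive /usr/local/hive/process_report.py
  */10 * * * * hive /usr/local/hive/httpmole.py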