Recently, I have been wanting to start another programming project in Python. I also wanted to play around with Cisco’s NetFlow implementation, so I built my own flow collector in Python for educational purposes. For now it is limited to version 5, but I’m planning to add support for v9 and IPFIX, which are a bit more complicated.
Code is available on GitHub if anyone is interested.
A NetFlow v5 packet is made up of two components: a header and one or more flow records.
The header is 24 bytes long and contains metadata about the exporting process and the encapsulated flow records, including the number of flows in the packet.
Each record is a 48-byte field that holds the flow information. Flow information is generated by the exporter and describes the traffic type and its statistics.
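To make the two layouts concrete, here is a minimal sketch that writes them as struct format strings, following the field order of the NetFlow v5 export format as documented by Cisco. The names HEADER_FORMAT and RECORD_FORMAT are my own for illustration and are not taken from the collector's code, and the sketch assumes Python 3.

```python
import struct

# '!' selects network byte order with no padding between fields.
# Header: version, count, sys_uptime, unix_secs, unix_nsecs,
#         flow_sequence, engine_type, engine_id, sampling_interval
HEADER_FORMAT = '!HHIIIIBBH'

# Record: srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets,
#         first, last, srcport, dstport, pad1, tcp_flags, prot, tos,
#         src_as, dst_as, src_mask, dst_mask, pad2
# The pad bytes are written as 'x' so struct skips them.
RECORD_FORMAT = '!4s4s4sHHIIIIHHxBBBHHBB2x'

HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

print(HEADER_SIZE, RECORD_SIZE)  # 24 48
```

Checking the sizes with struct.calcsize() is a cheap way to catch a mistyped format string before any real packet is parsed.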
Reading NetFlow Streams
Data arrives from the socket as packed binary data (a C-style byte string), so Python's struct module is used to convert the bytes into Python data types, depending on the size of each field. To make this simpler, I have created a class with a method that unpacks the buffer based on the size of the field. This method is then called from the main code whenever needed.
import struct

class Unpacker:
    'Class to unpack net streams into python integers'

    def unpackbufferint(self, buff, pointer, size):
        # struct.unpack() always returns a tuple, so [0] picks out the integer
        if size == 1:
            return struct.unpack('!B', buff[pointer:pointer+size])[0]
        elif size == 2:
            return struct.unpack('!H', buff[pointer:pointer+size])[0]
        elif size == 4:
            return struct.unpack('!I', buff[pointer:pointer+size])[0]
        else:
            print('Invalid integer size: %i' % size)
The unpackbufferint() method takes three main parameters: the buffer to be unpacked, a pointer to the appropriate byte, and the size of the field.
As they say, a picture is worth a thousand words, so the diagram below explains the logic behind this method. Data ranging from pointer to pointer+size will be unpacked within the given buffer.
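The slicing logic can also be tried out on a few hand-made bytes. The following is a small sketch, assuming Python 3; read_uint and FORMATS are hypothetical names used only here, and unlike the method above it raises an exception for an unsupported size rather than printing a message.

```python
import struct

# Map a field size in bytes to the matching unsigned struct code
FORMATS = {1: '!B', 2: '!H', 4: '!I'}

def read_uint(buff, pointer, size):
    """Unpack buff[pointer:pointer+size] as a big-endian unsigned integer."""
    if size not in FORMATS:
        raise ValueError('Invalid integer size: %i' % size)
    # struct.unpack() always returns a tuple, so take its only element
    return struct.unpack(FORMATS[size], buff[pointer:pointer + size])[0]

sample = bytes([0x00, 0x05, 0x00, 0x02, 0xff])
print(read_uint(sample, 0, 2))  # bytes 0-1 -> 5
print(read_uint(sample, 2, 2))  # bytes 2-3 -> 2
print(read_uint(sample, 4, 1))  # byte 4    -> 255
```

Only the bytes between pointer and pointer+size are touched, so the same buffer can be read field by field without copying it.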
Reading the Header and Iterating Through the Records
The first step in reading the flow records is to figure out how many records are encapsulated within the payload. This information is stored in the header, in the 2-byte count field at byte offset 2, and is extracted using the unpackbufferint() method that we previously defined in our class.
totalrecord = unpck.unpackbufferint(packetbuf,2,2)
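Before trusting the record count, it may be worth checking that the datagram really is a version 5 export and that it is long enough to hold that many records. The sketch below is one possible way to do that and is not part of the collector itself; check_v5_packet is a hypothetical helper, and Python 3 is assumed.

```python
import struct

HEADER_SIZE = 24   # bytes, NetFlow v5 header
RECORD_SIZE = 48   # bytes, NetFlow v5 flow record

def check_v5_packet(packetbuf):
    """Return the record count, or raise ValueError if the packet looks wrong."""
    if len(packetbuf) < HEADER_SIZE:
        raise ValueError('packet shorter than a v5 header')
    # version sits at bytes 0-1, the record count at bytes 2-3
    version, count = struct.unpack_from('!HH', packetbuf, 0)
    if version != 5:
        raise ValueError('unsupported NetFlow version: %i' % version)
    expected = HEADER_SIZE + count * RECORD_SIZE
    if len(packetbuf) < expected:
        raise ValueError('packet truncated: expected %i bytes, got %i'
                         % (expected, len(packetbuf)))
    return count

# Synthetic packet: version 5, two records, everything else zeroed
packet = struct.pack('!HH', 5, 2) + bytes(20) + bytes(2 * RECORD_SIZE)
print(check_v5_packet(packet))  # 2
```

A check like this keeps the record loop from reading past the end of a short or malformed datagram.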
Once we know the number of records, the code iterates through each record, unpacks the buffer, and stores the values in the appropriate Python variables.
for recordcounter in range(0,totalrecord):
A recordpointer variable is defined, which always points at the beginning of the current record. It is calculated using the following equation, where the header size is 24 and the record size is 48.
recordpointer = 24 + (recordcounter*48)
Following this equation, the first record is located at byte 24, right after the header. The second record is located at byte 72, right after the first record, and so on.
Flow information is extracted using the unpack method that we previously defined in our class. Once we know exactly where each field is located, it’s a matter of passing the right parameters and storing the data in the appropriate Python variables. I’m using these variables to print the content of the NetFlow packet, but you can use them for anything else (e.g. uploading them to a database for further analysis).
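The offset arithmetic is easy to verify on its own. Here is a tiny sketch, assuming Python 3, that lists the starting byte of each record; record_offsets is a hypothetical helper name and not something from the collector.

```python
HEADER_SIZE = 24   # bytes, NetFlow v5 header
RECORD_SIZE = 48   # bytes, NetFlow v5 flow record

def record_offsets(totalrecord):
    """Yield the starting byte of each record in a v5 packet."""
    for recordcounter in range(totalrecord):
        yield HEADER_SIZE + recordcounter * RECORD_SIZE

print(list(record_offsets(3)))  # [24, 72, 120]
```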
dPkts = unpck.unpackbufferint(packetbuf, recordpointer+16, 4)
print("Total packets: %i" % dPkts)
dOctets = unpck.unpackbufferint(packetbuf, recordpointer+20, 4)
print("Total Bytes: %i" % dOctets)
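The same approach extends to the remaining fields. As a rough sketch, assuming Python 3 and the standard v5 record layout, the snippet below decodes a few more of them from a synthetic record. parse_record and the dictionary keys are names I made up for illustration, and the sample values are invented.

```python
import socket
import struct

def parse_record(packetbuf, recordpointer):
    """Decode a handful of fields from the record starting at recordpointer."""
    # All offsets below are relative to the start of the record
    srcaddr, dstaddr = struct.unpack_from('!4s4s', packetbuf, recordpointer)
    dpkts, doctets = struct.unpack_from('!II', packetbuf, recordpointer + 16)
    srcport, dstport = struct.unpack_from('!HH', packetbuf, recordpointer + 32)
    (prot,) = struct.unpack_from('!B', packetbuf, recordpointer + 38)
    return {
        'srcaddr': socket.inet_ntoa(srcaddr),  # 4 raw bytes -> dotted quad
        'dstaddr': socket.inet_ntoa(dstaddr),
        'dPkts': dpkts,
        'dOctets': doctets,
        'srcport': srcport,
        'dstport': dstport,
        'prot': prot,
    }

# Synthetic packet: zeroed 24-byte header followed by one hand-built record
record = struct.pack(
    '!4s4s4sHHIIIIHHxBBBHHBB2x',
    socket.inet_aton('10.0.0.1'), socket.inet_aton('10.0.0.2'),
    socket.inet_aton('0.0.0.0'),   # nexthop
    1, 2,                          # input, output interface indexes
    10, 1500,                      # dPkts, dOctets
    0, 0,                          # first, last
    12345, 80,                     # srcport, dstport
    0x18, 6, 0,                    # tcp_flags, prot (6 = TCP), tos
    0, 0,                          # src_as, dst_as
    24, 24,                        # src_mask, dst_mask
)
packet = bytes(24) + record
flow = parse_record(packet, 24)
print(flow)
```

Returning a dictionary rather than printing each value makes it easier to hand the flow to something else later, such as a database insert.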
For more information on the code, check out my GitHub page.