MaxMind GeoIP: Import MaxMind City CSVs into SQL Server 2005
Inspired by kitty0's GeoIP Demo, I created two scripts which automatically import the free MaxMind GeoIP database (which resolves IP addresses to city locations) into SQL Server using VBS and BCP. The VBS file can be set as a Scheduled Task for MaxMind's free once-a-month updates. I also created a SQL Server UDF to access the data. Resolving the location of an IP is then as easy as running the following query:
SELECT dbo.IPtoLocation('208.65.15.0') as ipLocation
UDFs, like stored procedures, are pre-compiled and single results, after the intital compilation/first run, are returned in as little as 0.xx milliseconds. I've seen this script process 3000 hostnames a second on a dual proc Macbook (running Windows 2003/SQL 2005) with no load! A huge thanks goes out to Lee Holmes for helping with that. I was so focused on the indexes that I didn't pay attention to the actual data. Once Lee pointed this out, I was able to create a query which query a max of 3000 distinct recs/second as opposed to 6 recs/sec!
| This used to be a place with some links but all of it is so outdated, nearly 20 years old, that they've disappeared and have been removed
The SQL file creates a database called maxmindGeoIP. It then creates 3 tables, 3 sample UDFs and 3 stored procedures. The database has been indexed as needed and queries have been optimized a ton.
! - Run this prior to running importMaxmindGeoIP.vbs - !
This script
- Creates a database named maxmindGeoIP. IF YOU ALREADY HAVE A DB NAMED MAXMINDGEOIP, THIS SCRIPT WILL DROP IT.
- Creates a 'bcp prep' stored procedure which creates the necessary tables - This stored proc is run each time the importMaxmindGeoIP.vbs file is run
- Creates another bcp related stored procedure which creates the necessary indexes, defrags the indexes and shrinks the database. The database is usually about the same size as the CSV (~55MB-65MB as of Jan07) - This stored proc is run each time the importMaxmindGeoIP.vbs file is run
- Creates a few sample stored procedures and user-defined functions 5. Allows guest accounts to connect to and read the db.
- The bottom of this file contains example usage such as -Select '208.65.15.0' as IP,maxmindGeoIP.dbo.IPtoLocation('204.65.15.0') as ipLocation
- Remember that SQL UDFs and Stored Procedures are pre-compiled so the results will take a little longer the first time the function or procedure is run.
On second thought, please just download the rest from the link above (the SQL file or the zip). It's quite long.
The VBS checks the MaxMind site for some zips, downloads them to %TEMP%, unzips to %TEMP%, creates the necessary BCP format files, imports the data and deletes all unnecessary files. It also calls two procedures which prep the database for an import and then cleans up the data and indexes it. This import takes 2-10 minutes to run, depending on a whole lot of variables. Please read the comments section to see the requirements (such as BCP and the login/pass for BCP).
Once the tables are populated, you can do just about anything you'd like with the information.
If you'd like this capability for your SQL Server database, download importMaxmindGeoIP.zip, read the README.txt and have fun. Just as an FYI, if you keep track of the IPs that hit your website, here's some sample code
1SELECT ipaddr,maxmindgeoip.dbo.IPtoLocation(ipaddr) FROM hitcounter WHERE dateadded between DATEADD(hh,-24,GETDATE()) and GETDATE() GROUP BY ipaddr
The GROUP BY basically makes the ipaddr DISTINCT so this shows you where your visitors have come from for the past 24 hours. The demo resolves the locations for my last 25 visitors in real time. I also used the script below to query 3000 records a second.
1SELECT dbo.iptolocation('24.0.175.222') as ipLocation -- ensure the function runs at least once to compile
2GO
3SELECT DISTINCT TOP 3000 ipaddr INTO ipaddrs FROM hitcounter -- populate and index a table with 3000 distinct IPs. of course, you need a table with at least 3000 distinct IPs.
4GO
5CREATE CLUSTERED INDEX IDX\_ipaddrs ON dbo.ipaddrs (ipaddr ASC)
6GO
7DECLARE @startTime datetime SET @startTime = getDate()
8SELECT dbo.iptolocation(ipaddr) from ipaddrs PRINT 'Your query took '+CAST(DATEDIFF(ms,@startTime,GETDATE()) AS varchar)+' milliseconds to run.'
I can consistently get 1063 and 1050 millisecond runtimes :-D
UPDATE: The original VBS file was set to download the first zip it saw in "https://www.maxmind.com/download/geoip/database/GeoLiteCity_CSV/". I didn't expect Maxmind to keep it archives there but now the updated script looks for the last zip (and presumably the most updated), not the first.