MaxMind GeoIP: Import MaxMind City CSVs into SQL Server 2005

Filed under: Networking, Quick Code, SQL Server — Written by Chrissy on Monday, January 29th, 2007 @ 5:45 pm

Inspired by kitty0's GeoIP Demo, I created two scripts which automatically import the free MaxMind GeoIP database (which resolves IP addresses to city locations) into SQL Server using VBS and BCP. The VBS file can be set as a Scheduled Task for MaxMind's free once-a-month updates. I also created a SQL Server UDF to access the data. Resolving the location of an IP is then as easy as running the following query:

SELECT dbo.IPtoLocation('208.65.15.0') as ipLocation (demo)

UDFs, like stored procedures, are pre-compiled and single results, after the intital compilation/first run, are returned in as little as 0.xx milliseconds. I've seen this script process 3000 hostnames a second on a dual proc Macbook (running Windows 2003/SQL 2005) with no load! A huge thanks goes out to Lee Holmes for helping with that. I was so focused on the indexes that I didn't pay attention to the actual data. Once Lee pointed this out, I was able to create a query which query a max of 3000 distinct recs/second as opposed to 6 recs/sec!

Without further ado, you can download the zip of importMaxmindGeoIP files here. This zip includes the following files:
1. importMaxmindGeoIP.sql
2. importMaxmindGeoIP.vbs
3. GNU unzip.exe (and it's license)
4. README.txt

The SQL file creates a database called maxmindGeoIP. It then creates 3 tables, 3 sample UDFs and 3 stored procedures. The database has been indexed as needed and queries have been optimized a ton.

importMaxMindGeoIP.sql

/*
Created by Chrissy LeMaire (clemaire@gmail.com)
Website: http://netnerds.net/
 
! - Run this prior to running importMaxmindGeoIP.vbs - !
 
This script
1. Creates a database named maxmindGeoIP. IF YOU ALREADY HAVE A DB NAMED MAXMINDGEOIP, THIS SCRIPT WILL DROP IT.
2. Creates a 'bcp prep' stored procedure which creates the necessary tables
    - This stored proc is run each time the importMaxmindGeoIP.vbs file is run
3. Creates another bcp related stored procedure which creates the necessary indexes, defrags the indexes and
   shrinks the database. The database is usually about the same size as the CSV (~55MB-65MB as of Jan07)
    - This stored proc is run each time the importMaxmindGeoIP.vbs file is run
4. Creates a few sample stored procedures and user-defined functions
5. Allows guest accounts to connect to and read the db.
6. The bottom of this file contains example usage such as
    -Select '208.65.15.0' as IP,maxmindGeoIP.dbo.IPtoLocation('204.65.15.0') as ipLocation
7. Remember that SQL UDFs and Stored Procedures are pre-compiled so the results will take a little longer the
   first time the function or procedure is run.
 
NO WARRANTIES, USE THIS AT YOUR OWN RISK, etc.
*/
 
On second thought, please just download the rest from the link above (the SQL file or the zip). It's quite long.

The VBS checks the MaxMind site for some zips, downloads them to %TEMP%, unzips to %TEMP%, creates the necessary BCP format files, imports the data and deletes all unnecessary files. It also calls two procedures which prep the database for an import and then cleans up the data and indexes it. This import takes 2-10 minutes to run, depending on a whole lot of variables. Please read the comments section to see the requirements (such as BCP and the login/pass for BCP).

importMaxmindGeoIP.vbs

'****************************************************************************
' Created by Chrissy LeMaire (clemaire@gmail.com)
' Website: http://netnerds.net/
'
' This script downloads the GeoLite CSV files from Maxmind.com
' and uses bcp to transform it into SQL Server data. Run this once
' on the second day of each month.
'
' ! - Run importMaxmindGeoIP.sql prior to running this script - !
' ! - Follow the instructions in README.txt or this won't work - !
'
' 1. You must download "unzip.exe" either from my server at
'     http://blog.netnerds.net/code/importMaxmindGeoIP.zip
'     http://sourceforge.net/projects/gnuwin32 and save it in your
'     PATH (ie. C:\windows)
'
' 2. You must execute the SQL statements found in the zip
'     http://blog.netnerds.net/code/importMaxmindGeoIP.zip prior to running this
'     script
'
' 3. The SQL Script creates a new database called MaxMindGeoIP
'     as well as some stored procs and UDFs. If you change anything
'     in the SQL script, make sure you update the BCP statements
'     below
'
' 4. Run this script on a machine with BCP and a trusted connection to the
'     SQL Server. If you don't have a trusted connection (a domain), change
'     the BCP parameters below (remove -T and add -u,-p).
'
' 5. Run this script at least once from the console. You may be prompted to
'     answer yes to the question "Do you wish to run this program?"
'     for both the vbs file and unzip.exe.  Uncheck the box
'     "Always ask before opening this file" and click "Run"
'
' I have only tested this on SQL Server 2005 and Windows Server 2003
'
' "What it does"
' 1. Downloads the GeoLite CSVs from Maxmind.com to the %TEMP% directory
' 2. Unzips the files into the same directory
' 3. Creates the necessary BCP format files
' 4. Executes bcp to bulk import the file
' 5. Executes a stored procedure which adds indexes to the tables and shrinks the DB.
' 6. Deletes all downloaded/extracted/created files once it's done with them
'
' NO WARRANTIES, USE THIS AT YOUR OWN RISK, etc.
'*****************************************************************************
 
'Meh, please just download this from the link above.

Once the tables are populated, you can do just about anything you'd like with the information. If you'd like to see it in action, click here for a demo on your IP address. The data is capable of returning longitude and latitude so you can also plug this in with Google Maps data (as seen here).

If you'd like this capability for your SQL Server database, download importMaxmindGeoIP.zip, read the README.txt and have fun. Just as an FYI, if you keep track of the IPs that hit your website, here's some sample code

SELECT ipaddr,maxmindgeoip.dbo.IPtoLocation(ipaddr)
FROM hitcounter WHERE dateadded between DATEADD(hh,-24,GETDATE()) and GETDATE()
GROUP BY ipaddr

The GROUP BY basically makes the ipaddr DISTINCT so this shows you where your visitors have come from for the past 24 hours. See the demo which resolves the locations for my last 25 visitors in real time. I also used the script below to query 3000 records a second.

SELECT dbo.iptolocation('24.0.175.222') as ipLocation -- ensure the function runs at least once to compile
GO
SELECT DISTINCT TOP 3000 ipaddr INTO ipaddrs FROM hitcounter -- populate and index a table with 3000 distinct IPs. of course, you need a table with at least 3000 distinct IPs.
GO
CREATE CLUSTERED INDEX IDX_ipaddrs ON dbo.ipaddrs (ipaddr ASC)
GO
DECLARE @startTime datetime
SET @startTime = getDate()
SELECT dbo.iptolocation(ipaddr) from ipaddrs
PRINT 'Your query took '+CAST(DATEDIFF(ms,@startTime,GETDATE()) AS varchar)+' milliseconds to run.'

I can consistently get 1063 and 1050 millisecond runtimes :-D
UPDATE: The original VBS file was set to download the first zip it saw in "http://www.maxmind.com/download/geoip/database/GeoLiteCity_CSV/". I didn't expect Maxmind to keep it archives there but now the updated script looks for the last zip (and presumably the most updated), not the first.

17 Comments   -
  • Comment by jrp | February 19, 2007 @ 11:17 pm

    Great stuff. Just what I was looking for.

    A couple of questions. What do I do to usp_IPtoLocation to make it non-US/CA-centric (I get less detail if I include non-US/CA ip addresses).

    The vbs script has a couple of issues:

    - it does not download the latest database, but the previous one (ie, Jan rather than Feb)

    - it does not run at all on Server 2003 R2 as permission is denied at the point where the unzip is run (presumably as security protection). How do I enable it, please?

    Thanks again.

  • Comment by Drew Freyling | February 20, 2007 @ 11:27 pm

    Nice work. Very handy script to have. Just a few slight modifications for SQL 2000 and I'm good to go!

  • Comment by Chrissy | February 21, 2007 @ 7:46 pm

    Hey JRP,
    Thanks for letting me know. It seems that they are keeping an archive of their zips. I'll modify the script to take the last zip listed.

    I've run this script on R2.

    Run this script at least once from the console. You should be prompted to answer yes to the question "Do you wish to run this program?" for both the vbs file and unzip.exe. Uncheck the box. "Always ask before opening this file" and click "Run".

    The script itself isn't US-CA centric, the database is. Sometimes it does return a "state" for outer countries but the state is something like "B2." I'm sure that's a reference to something..but it's a something that I currently don't have ;)

  • Comment by Chrissy | February 21, 2007 @ 7:49 pm

    Glad I could help, Drew! What mods did you have to make, btw?

  • Comment by Drew Freyling | February 21, 2007 @ 11:43 pm

    INDEXDEFRAG works differently in 2k because it wouldn't accept the two arguments that you used in the clean up stored proc. I also had to change the select from sys.objects for the prep stored proc to use sysobjects instead.

  • Comment by Drew Freyling | February 22, 2007 @ 4:58 am

    The following should create a proc to just retrieve the country if you don't want the city and region:

    CREATE PROC [dbo].[usp_IPtoCountry] (@strIP varchar(15))

    AS
    BEGIN
    DECLARE @intIPNum bigint
    DECLARE @strLocation nvarchar(255)
    DECLARE @strCountry nvarchar(55)
    IF (LEN(@strIP)-LEN(REPLACE(@strIP,'.','')))/LEN('.') = 3 -- if there 3 periods
    SET @intIPNum = (16777216 * CAST(PARSENAME(@strIP,4) as bigint) + 65536 * PARSENAME(@strIP,3) + 256 * PARSENAME(@strIP,2) + PARSENAME(@strIP,1))

    SET @strLocation = (SELECT TOP 1 country FROM geoipCountry WHERE @intIPNum >= startIpNum ORDER BY startIpNum DESC)

    IF @strLocation IS NULL
    BEGIN
    SET @strCountry = (SELECT TOP 1 country FROM GeoIPCountry WHERE @intIPNum >= startIpNum ORDER BY startIpNum DESC)
    If @strCountry is NULL
    SET @strLocation = 'Unknown or Unassigned'
    ELSE
    SET @strLocation = @strCountry
    END

    SELECT @strLocation
    END

    GO

  • Comment by Chrissy | February 22, 2007 @ 5:24 am

    Oops..is that what he wanted? That query is gonna fly...

  • Comment by Drew Freyling | February 22, 2007 @ 11:26 pm

    Whoops, thats why I'm not a DBA :)
    Here is the corrected version:

    CREATE PROC [dbo].[usp_IPtoCountry] (@strIP varchar(15))

    AS
    BEGIN
    DECLARE @intIPNum bigint
    DECLARE @strCountry nvarchar(55)
    IF (LEN(@strIP)-LEN(REPLACE(@strIP,'.','')))/LEN('.') = 3 -- if there 3 periods
    SET @intIPNum = (16777216 * CAST(PARSENAME(@strIP,4) as bigint) + 65536 * PARSENAME(@strIP,3) + 256 * PARSENAME(@strIP,2) + PARSENAME(@strIP,1))

    SET @strCountry = (SELECT TOP 1 country FROM [dbo].[GeoIPCountry] WHERE @intIPNum >= startIpNum ORDER BY startIpNum DESC)

    If @strCountry is NULL
    SET @strCountry = 'Unknown or Unassigned'

    SELECT @strCountry
    END

    GO

  • Comment by Chrissy | February 22, 2007 @ 11:37 pm

    haha. Thanks for the update. I just removed wp_texturize from the comments, ran your script and it worked great.

    Thanks!

  • Comment by jrp | February 23, 2007 @ 11:04 pm

    I still get "permission denied" problems running the vbs script. Suggestions for debugging?

  • Comment by jrp | February 24, 2007 @ 1:33 pm

    The answer to the permissions problem seems to be to right click on unzip.exe and click Unblock

  • Comment by Drew Freyling | February 28, 2007 @ 2:29 am

    Will setting the transaction isolation level to read uncommitted( thereby removing the locking overhead) make the IPtoLocation proc perform any quicker?

  • Comment by Chrissy | March 1, 2007 @ 10:58 pm

    Hey Drew,
    That seemed very possible so I tried it out by adding the following to the store procedure:

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    Strangely, it actually slowed down the query. The average resolution for 255 IPs without the explicit transaction level was 189ms on average and with the transaction level set, it was 204 ms.

  • Comment by Drew Freyling | March 2, 2007 @ 7:07 am

    I'm guessing the overhead of setting that everytime the proc is run is greater than the performance gain from not row/table locking on each read.

  • Comment by James Hooker | April 25, 2007 @ 1:47 am

    Fantastic, thank you very much! This little package does exactly what it says on the tin! :)

  • Comment by Emil Zegers | November 7, 2007 @ 3:38 am

    I've used your scripts to import the MaxMind data in MSDE (Microsoft SQL Server Desktop Engine 8.00.760).

    To get this done, the following adjustments to the scripts had to be made:

    SQL script:

    - Changed 'sys.databases' and 'sys.objects' to 'sysdatabases' and 'sysobjects'

    - Changed 'WHERE object_id = OBJECT_ID' to 'WHERE id = OBJECT_ID'

    - Added index name to INDEXDEFRAG:

    DBCC INDEXDEFRAG (maxmindgeoip,GeoIPCountry,IDX_countryStarIPtNum)
    DBCC INDEXDEFRAG (maxmindgeoip,GeoIPCountry,IDX_CountryCode)
    DBCC INDEXDEFRAG (maxmindgeoip,GeoIPCityBlocks,IDX_blocksStartIpNum)
    DBCC INDEXDEFRAG (maxmindgeoip,geoIPCityLocation,IDX_LocationID)

    - Changed 'GRANT CONNECT TO [guest]' to 'EXECUTE sp_grantdbaccess [guest]'

    - Added percentage and truncate parameter to SHRINKDATABASE

    DBCC SHRINKDATABASE (maxmindGeoIP,10,NOTRUNCATE)

    Thanks for your scripts!

    Regards,

    Emil

  • Comment by Tim | January 9, 2008 @ 6:48 am

    You can use another technique proposed by IP2Location. It is using format file to define CSV format for command line import.

    http://www.ip2location.com/faqs-ip-country.aspx#9

Leave your comment