MaxMind GeoIP: Import MaxMind City CSVs into SQL Server 2005

Inspired by kitty0’s GeoIP Demo, I created two scripts which automatically import the free MaxMind GeoIP database (which resolves IP addresses to city locations) into SQL Server using VBS and BCP. The VBS file can be set as a Scheduled Task for MaxMind’s free once-a-month updates. I also created a SQL Server UDF to access the data. Resolving the location of an IP is then as easy as running the following query:

SELECT dbo.IPtoLocation(‘208.65.15.0’) as ipLocation (demo)

UDFs, like stored procedures, are pre-compiled and single results, after the intital compilation/first run, are returned in as little as 0.xx milliseconds. I’ve seen this script process 3000 hostnames a second on a dual proc Macbook (running Windows 2003/SQL 2005) with no load! A huge thanks goes out to Lee Holmes for helping with that. I was so focused on the indexes that I didn’t pay attention to the actual data. Once Lee pointed this out, I was able to create a query which query a max of 3000 distinct recs/second as opposed to 6 recs/sec!

Without further ado, you can download the zip of importMaxmindGeoIP files here. This zip includes the following files:
1. importMaxmindGeoIP.sql
2. importMaxmindGeoIP.vbs
3. GNU unzip.exe (and it’s license)
4. README.txt

The SQL file creates a database called maxmindGeoIP. It then creates 3 tables, 3 sample UDFs and 3 stored procedures. The database has been indexed as needed and queries have been optimized a ton.

importMaxMindGeoIP.sql

/*
Created by Chrissy LeMaire ([email protected])
Website: http://netnerds.net/

! - Run this prior to running importMaxmindGeoIP.vbs - !

This script
1. Creates a database named maxmindGeoIP. IF YOU ALREADY HAVE A DB NAMED MAXMINDGEOIP, THIS SCRIPT WILL DROP IT.
2. Creates a 'bcp prep' stored procedure which creates the necessary tables
    - This stored proc is run each time the importMaxmindGeoIP.vbs file is run
3. Creates another bcp related stored procedure which creates the necessary indexes, defrags the indexes and
   shrinks the database. The database is usually about the same size as the CSV (~55MB-65MB as of Jan07)
    - This stored proc is run each time the importMaxmindGeoIP.vbs file is run
4. Creates a few sample stored procedures and user-defined functions
5. Allows guest accounts to connect to and read the db.
6. The bottom of this file contains example usage such as
    -Select '208.65.15.0' as IP,maxmindGeoIP.dbo.IPtoLocation('204.65.15.0') as ipLocation
7. Remember that SQL UDFs and Stored Procedures are pre-compiled so the results will take a little longer the
   first time the function or procedure is run.

NO WARRANTIES, USE THIS AT YOUR OWN RISK, etc.
*/

On second thought, please just download the rest from the link above (the SQL file or the zip). It's quite long.

The VBS checks the MaxMind site for some zips, downloads them to %TEMP%, unzips to %TEMP%, creates the necessary BCP format files, imports the data and deletes all unnecessary files. It also calls two procedures which prep the database for an import and then cleans up the data and indexes it. This import takes 2-10 minutes to run, depending on a whole lot of variables. Please read the comments section to see the requirements (such as BCP and the login/pass for BCP).

importMaxmindGeoIP.vbs

'****************************************************************************
' Created by Chrissy LeMaire ([email protected])
' Website: http://netnerds.net/
'
' This script downloads the GeoLite CSV files from Maxmind.com
' and uses bcp to transform it into SQL Server data. Run this once
' on the second day of each month.
'
' ! - Run importMaxmindGeoIP.sql prior to running this script - !
' ! - Follow the instructions in README.txt or this won't work - !
'
' 1. You must download "unzip.exe" either from my server at
'     http://blog.netnerds.net/code/importMaxmindGeoIP.zip
'     http://sourceforge.net/projects/gnuwin32 and save it in your
'     PATH (ie. C:\windows)
'
' 2. You must execute the SQL statements found in the zip
'     http://blog.netnerds.net/code/importMaxmindGeoIP.zip prior to running this
'     script
'
' 3. The SQL Script creates a new database called MaxMindGeoIP
'     as well as some stored procs and UDFs. If you change anything
'     in the SQL script, make sure you update the BCP statements
'     below
'
' 4. Run this script on a machine with BCP and a trusted connection to the
'     SQL Server. If you don't have a trusted connection (a domain), change
'     the BCP parameters below (remove -T and add -u,-p).
'
' 5. Run this script at least once from the console. You may be prompted to
'     answer yes to the question "Do you wish to run this program?"
'     for both the vbs file and unzip.exe.  Uncheck the box
'     "Always ask before opening this file" and click "Run"
'
' I have only tested this on SQL Server 2005 and Windows Server 2003
'
' "What it does"
' 1. Downloads the GeoLite CSVs from Maxmind.com to the %TEMP% directory
' 2. Unzips the files into the same directory
' 3. Creates the necessary BCP format files
' 4. Executes bcp to bulk import the file
' 5. Executes a stored procedure which adds indexes to the tables and shrinks the DB.
' 6. Deletes all downloaded/extracted/created files once it's done with them
'
' NO WARRANTIES, USE THIS AT YOUR OWN RISK, etc.
'*****************************************************************************

'Meh, please just download this from the link above.

Once the tables are populated, you can do just about anything you’d like with the information. If you’d like to see it in action, click here for a demo on your IP address. The data is capable of returning longitude and latitude so you can also plug this in with Google Maps data (as seen here).

If you’d like this capability for your SQL Server database, download importMaxmindGeoIP.zip, read the README.txt and have fun. Just as an FYI, if you keep track of the IPs that hit your website, here’s some sample code

SELECT ipaddr,maxmindgeoip.dbo.IPtoLocation(ipaddr)
FROM hitcounter WHERE dateadded between DATEADD(hh,-24,GETDATE()) and GETDATE()
GROUP BY ipaddr

The GROUP BY basically makes the ipaddr DISTINCT so this shows you where your visitors have come from for the past 24 hours. See the demo which resolves the locations for my last 25 visitors in real time. I also used the script below to query 3000 records a second.

SELECT dbo.iptolocation('24.0.175.222') as ipLocation -- ensure the function runs at least once to compile
GO
SELECT DISTINCT TOP 3000 ipaddr INTO ipaddrs FROM hitcounter -- populate and index a table with 3000 distinct IPs. of course, you need a table with at least 3000 distinct IPs.
GO
CREATE CLUSTERED INDEX IDX_ipaddrs ON dbo.ipaddrs (ipaddr ASC)
GO
DECLARE @startTime datetime
SET @startTime = getDate()
SELECT dbo.iptolocation(ipaddr) from ipaddrs
PRINT 'Your query took '+CAST(DATEDIFF(ms,@startTime,GETDATE()) AS varchar)+' milliseconds to run.'

I can consistently get 1063 and 1050 millisecond runtimes :-D

UPDATE: The original VBS file was set to download the first zip it saw in “http://www.maxmind.com/download/geoip/database/GeoLiteCity_CSV/”. I didn’t expect Maxmind to keep it archives there but now the updated script looks for the last zip (and presumably the most updated), not the first.

Chrissy is a PowerShell MVP who has worked in IT for nearly 20 years, and currently serves as a Sr. Database Engineer in Belgium. Always an avid scripter, she attended the Monad session at Microsoft’s Professional Developers Conference in Los Angeles back in 2005 and has worked and played with PowerShell ever since. Chrissy is currently pursuing an MS in Systems Engineering at Regis University and helps maintain RealCajunRecipes.com in her spare time. She holds a number of certifications, including those relating to SQL Server, SuSE Linux, SharePoint and network security. She recently became co-lead of the SQL PASS PowerShell Virtual Chapter. You can follow her on Twitter at @cl.

Posted in Networking, SQL Server
28 comments on “MaxMind GeoIP: Import MaxMind City CSVs into SQL Server 2005
  1. jrp says:

    Great stuff. Just what I was looking for.

    A couple of questions. What do I do to usp_IPtoLocation to make it non-US/CA-centric (I get less detail if I include non-US/CA ip addresses).

    The vbs script has a couple of issues:

    – it does not download the latest database, but the previous one (ie, Jan rather than Feb)

    – it does not run at all on Server 2003 R2 as permission is denied at the point where the unzip is run (presumably as security protection). How do I enable it, please?

    Thanks again.

  2. Nice work. Very handy script to have. Just a few slight modifications for SQL 2000 and I’m good to go!

  3. Chrissy says:

    Hey JRP,
    Thanks for letting me know. It seems that they are keeping an archive of their zips. I’ll modify the script to take the last zip listed.

    I’ve run this script on R2.

    Run this script at least once from the console. You should be prompted to answer yes to the question “Do you wish to run this program?” for both the vbs file and unzip.exe. Uncheck the box. “Always ask before opening this file” and click “Run”.

    The script itself isn’t US-CA centric, the database is. Sometimes it does return a “state” for outer countries but the state is something like “B2.” I’m sure that’s a reference to something..but it’s a something that I currently don’t have ;)

  4. Chrissy says:

    Glad I could help, Drew! What mods did you have to make, btw?

  5. INDEXDEFRAG works differently in 2k because it wouldn’t accept the two arguments that you used in the clean up stored proc. I also had to change the select from sys.objects for the prep stored proc to use sysobjects instead.

  6. The following should create a proc to just retrieve the country if you don’t want the city and region:

    CREATE PROC [dbo].[usp_IPtoCountry] (@strIP varchar(15))

    AS
    BEGIN
    DECLARE @intIPNum bigint
    DECLARE @strLocation nvarchar(255)
    DECLARE @strCountry nvarchar(55)
    IF (LEN(@strIP)-LEN(REPLACE(@strIP,’.’,”)))/LEN(‘.’) = 3 — if there 3 periods
    SET @intIPNum = (16777216 * CAST(PARSENAME(@strIP,4) as bigint) + 65536 * PARSENAME(@strIP,3) + 256 * PARSENAME(@strIP,2) + PARSENAME(@strIP,1))

    SET @strLocation = (SELECT TOP 1 country FROM geoipCountry WHERE @intIPNum >= startIpNum ORDER BY startIpNum DESC)

    IF @strLocation IS NULL
    BEGIN
    SET @strCountry = (SELECT TOP 1 country FROM GeoIPCountry WHERE @intIPNum >= startIpNum ORDER BY startIpNum DESC)
    If @strCountry is NULL
    SET @strLocation = ‘Unknown or Unassigned’
    ELSE
    SET @strLocation = @strCountry
    END

    SELECT @strLocation
    END

    GO

  7. Chrissy says:

    Oops..is that what he wanted? That query is gonna fly…

  8. Whoops, thats why I’m not a DBA :)

    Here is the corrected version:

    CREATE PROC [dbo].[usp_IPtoCountry] (@strIP varchar(15))

    AS
    BEGIN
    DECLARE @intIPNum bigint
    DECLARE @strCountry nvarchar(55)
    IF (LEN(@strIP)-LEN(REPLACE(@strIP,’.’,”)))/LEN(‘.’) = 3 — if there 3 periods
    SET @intIPNum = (16777216 * CAST(PARSENAME(@strIP,4) as bigint) + 65536 * PARSENAME(@strIP,3) + 256 * PARSENAME(@strIP,2) + PARSENAME(@strIP,1))

    SET @strCountry = (SELECT TOP 1 country FROM [dbo].[GeoIPCountry] WHERE @intIPNum >= startIpNum ORDER BY startIpNum DESC)

    If @strCountry is NULL
    SET @strCountry = ‘Unknown or Unassigned’

    SELECT @strCountry
    END

    GO

  9. Chrissy says:

    haha. Thanks for the update. I just removed wp_texturize from the comments, ran your script and it worked great.

    Thanks!

  10. jrp says:

    I still get “permission denied” problems running the vbs script. Suggestions for debugging?

  11. jrp says:

    The answer to the permissions problem seems to be to right click on unzip.exe and click Unblock

  12. Will setting the transaction isolation level to read uncommitted( thereby removing the locking overhead) make the IPtoLocation proc perform any quicker?

  13. Chrissy says:

    Hey Drew,
    That seemed very possible so I tried it out by adding the following to the store procedure:

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    Strangely, it actually slowed down the query. The average resolution for 255 IPs without the explicit transaction level was 189ms on average and with the transaction level set, it was 204 ms.

  14. I’m guessing the overhead of setting that everytime the proc is run is greater than the performance gain from not row/table locking on each read.

  15. James Hooker says:

    Fantastic, thank you very much! This little package does exactly what it says on the tin! :)

  16. Emil Zegers says:

    I’ve used your scripts to import the MaxMind data in MSDE (Microsoft SQL Server Desktop Engine 8.00.760).

    To get this done, the following adjustments to the scripts had to be made:

    SQL script:

    – Changed ‘sys.databases’ and ‘sys.objects’ to ‘sysdatabases’ and ‘sysobjects’

    – Changed ‘WHERE object_id = OBJECT_ID’ to ‘WHERE id = OBJECT_ID’

    – Added index name to INDEXDEFRAG:

    DBCC INDEXDEFRAG (maxmindgeoip,GeoIPCountry,IDX_countryStarIPtNum)
    DBCC INDEXDEFRAG (maxmindgeoip,GeoIPCountry,IDX_CountryCode)
    DBCC INDEXDEFRAG (maxmindgeoip,GeoIPCityBlocks,IDX_blocksStartIpNum)
    DBCC INDEXDEFRAG (maxmindgeoip,geoIPCityLocation,IDX_LocationID)

    – Changed ‘GRANT CONNECT TO [guest]’ to ‘EXECUTE sp_grantdbaccess [guest]’

    – Added percentage and truncate parameter to SHRINKDATABASE

    DBCC SHRINKDATABASE (maxmindGeoIP,10,NOTRUNCATE)

    Thanks for your scripts!

    Regards,

    Emil

  17. Tim says:

    You can use another technique proposed by IP2Location. It is using format file to define CSV format for command line import.

    http://www.ip2location.com/faqs-ip-country.aspx#9

  18. Charles says:

    Your lookup logic incorrectly assumes that endIpNum at row X is always one less than startIpNum at row X+1. That is, your code is assuming contiguous IP ranges like this:

    startIpNum endIpNum locId
    67276848 (4.2.144.48) 67277023 (4.2.144.223) 223
    67277024 (4.2.144.224) 67277031 (4.2.144.231) 994
    67277032 (4.2.144.232) 67277039 (4.2.144.239) 223
    67277040 (4.2.144.240) 67277047 (4.2.144.247) 33539
    67277048 (4.2.144.248) 67277055 (4.2.144.255) 4027
    67277056 (4.2.145.0) 67277215 (4.2.145.159) 15886

    But this is not true for all rows. Here is some data directly from the database ordered by startIpNum:

    startIpNum endIpNum locId
    368674304 (21.249.134.0) 385875967 (22.255.255.255) 223
    386665696 (23.12.12.224) 386665727 (23.12.12.255) 58390
    386666240 (23.12.15.0) 386666367 (23.12.15.127) 58393
    393849728 (23.121.171.128) 393849735 (23.121.171.135) 32371
    402653184 (24.0.0.0) 402653695 (24.0.1.255) 118769
    402653696 (24.0.2.0) 402653951 (24.0.2.255) 57771

    Your logic returns a locId of 58393 for all the IP address between 23.12.15.128 and 23.121.171.127, but in actuality those addresses have no entry in the database. That’s 7,183,361 IP addresses that get resolved to Great Britain when in fact the database doesn’t say that.

    Please correct me if I’m wrong.

    Charles

  19. Charles says:

    As solution to the problem I wrote about above, you’re look logic can be done like this:

    SELECT TOP 1
    case when @intIPNum > [Last] then null else locid end
    FROM GeoIPCityBlocks WHERE @intIPNum >= startIpNum ORDER BY startIpNum DESC

    This query will return NULL in the case @intIPNum is not found between any ranges within the database.

    Charles

  20. Jade says:

    Hey there, thanks for the importgeoip script.

    I’ve followed the procedures and instructions on your site and within the readme file.
    When running the vbs file from a command prompt i get the following error :
    C:\automation\importMaxmindGeoIP.vbs(63, 9) msxml3.dll: Access is denied.

    I’ve googled the above error with no positive findings, and have run the script on two different windows 2003 servers, both with the same issue.

    Do you have any suggestions?

  21. Noe says:

    Jade, I was getting the same “importMaxmindGeoIP.vbs(63, 9) msxml3.dll: Access is denied” error message. The solution offered at the following page made the message go away:

    http://blogs.msdn.com/jpsanders/archive/2008/01/15/msxml3-dll-access-is-denied-msxml4-dll-access-is-denied.aspx

    In short, you must add http://www.maxmind.com to your list of trusted sites and enable access to data sources across domains. Windows’ “TrustedInstaller” user has full rights to the msxml3.dll file, and adding maxmind.com to your trusted sites seems to give this user access to the files in maxmind.com through msxml3.dll. I figure that that is what is happening, but it is just a guess.

  22. jeff says:

    I was getting the error Noe had as well, but now when I run the VB code it comes back without any error in 2 seconds with no data as the result. Any thoughts?

  23. Jay says:

    Hi Chrissy,
    Not sure you are montoring this blog, but I am not able to use the VBS. As Jeff mentioned the VBS run in a second and I don’t see any data. COuld you please check??

  24. Jay says:

    Hi Chrissy,
    Not sure you are montoring this blog, but I am not able to use the VBS. As Jeff mentioned the VBS run in a second and I don’t see any data. COuld you please check??

  25. Bekim says:

    Hey guys,
    First off thank you for this script, I have been in a never ending search for a free solution to geolocation and this has come in very handy. Like jeff and Jay above me, I was having the problem of the script failing very quickly with no error message. With a little debugging I was able to discover the problem. The line:

    “intStartPosition = instr(strParseThis,”HREF=””GeoLiteCity_”)+6″

    was erroring due to the face that the HTML source of http://www.maxmind.com/download/geoip/database/Ge… doesn’t seem to be the same as when you created the script. Simply, the HREF is now href :P. Changing the line to

    “intStartPosition = instr(strParseThis,”href=””GeoLiteCity_”)+6″

    will fix the problem. To other people who may be having problems with this script, I did have to add the site http://www.maxmind.com as a trusted site and turn down my security settings as mentioned above. I also had to change the connection string to my SQL Server.

    Hope this helps someone!

  26. Use IMPORT
    EXEC MaxMind_Import
    @GeoLiteCity_Blocks = ‘C:\inetpub\wwwroot\OmegaLove\OmegaLove.Web\VbScripts\GeoLiteCity_20090601\GeoLiteCity_Blocks.csv’,
    @GeoLiteCity_Location= ‘C:\inetpub\wwwroot\OmegaLove\OmegaLove.Web\VbScripts\GeoLiteCity_20090601\GeoLiteCity_Location.csv’

    Msg 4866, Level 16, State 8, Line 1
    The bulk load failed. The column is too long in the data file for row 1, column 4. Verify that the field terminator and row terminator are specified correctly.
    Msg 7301, Level 16, State 2, Line 1
    Cannot obtain the required interface (“IID_IColumnsInfo”) from OLE DB provider “BULK” for linked server “(null)”

    please email me if u gave a solution [email protected]

  27. Doc says:

    Hi Chrissy,

    I know this thread is old, but it is possible for you to make the sql and import scripts available for download please. I am trying to implement a solution with the geoip database and just wanted to know if you could make it available for download. Thank you.

  28. Humill Bca says:

    can you please upload me the files ? it seems the download links are broken now.. would be of great help

2 Pings/Trackbacks for "MaxMind GeoIP: Import MaxMind City CSVs into SQL Server 2005"
  1. […] database. I figured the database could use some updating so I ran the ImportMaxMindGeoIP.vbs script I posted awhile back, and much like in real life, I received an “Access Denied” […]

  2. […] MaxMind GeoIP: Import MaxMind City CSVs into SQL Server 2005. Share this:Like this:LikeBe the first to like this post. This entry was posted in Uncategorized. Bookmark the permalink. ← My Stats […]

Leave a Reply

Your email address will not be published. Required fields are marked *

*