[SOLVED] Degraded Performance on HP DL580 G7 on BIOS v. 7/01/2013

After a December 2013 patching downtime, I noticed an abrupt decrease of almost 50% in the performance of 2 SQL Servers.  Both servers were running on HP DL580 G7s.  After many hours of testing and data collection, we finally updated the BIOS firmware from 7/01/2013 to 10/31/2013.

The issue immediately resolved itself.

Throughout the troubleshooting process, I envisioned having the most amazing blog post resulting from the (hopefully) inevitable resolution.  But it turned out to be somewhat anticlimactic.

I have to say that our attempts to engage HP on possible hardware issues were most unhelpful.  Especially now that we know what the issue was, it should have been fairly easy for HP to realize they had a faulty BIOS release for this server model.  Instead, we wasted two months trying to figure out what the issue could be.

I hope that this post can be of use to anyone else out there hitting this issue.

Posted in SQL Server, Windows

SQL Troubleshooting: Searching the Plan Cache for Missing Indexes and Execution Statistics

I’ve always found basic administration of SQL Server to be fairly straightforward, but for a long while, I felt very uneasy about troubleshooting SQL Server performance issues.  Any DBA knows that the most productive periods of learning arise when there are problems.  For the past few months, I’ve been trying to track down a problem that I believe is hardware-related.  Instead of keeping all my eggs in one basket, I’ve also been combing through DMV data to optimize the current database code.

One of the first things many people will look for is missing indexes.  There are some great DMVs that will show you what indexes are suggested, but I wanted to know which queries those indexes were meant to help.  I didn’t want to create an index on a table if the query it’s assisting runs infrequently or doesn’t have a direct benefit to a user.  I decided to search the internet for suggestions on how to comb through the Plan Cache for query plans reporting a missing index.

I immediately found a script floating around to do just what I wanted.  I’m not sure who the original author of this code is, but I thank them for helping me on my quest.  The script was able to run through the plan cache and report to me the query text and the tables needing the index.  However, I also wanted to know what stored procedures were responsible.  Having the direct piece of code is helpful to know where to focus, but if it is being called by a stored procedure, it’s not helpful to know which one.  Luckily, adding the required data was accomplished fairly easily by including more columns that were available in the DMVs.

It is important to set your query window to the context of the database you want to look at, so that the Object ID and name resolutions will work properly.  Any matching queries that are ad-hoc or not tied back to a stored procedure will just show NULL for these values.  I’m also inserting the results into a table in MSDB instead of using a temp table like the original code.  Depending on the size of your plan cache, this may take quite a while to run; my plan cache is 20GB and takes 30 minutes.  I run it at night, but I have not observed any ill effects on the database.  In this example, I’ve limited the query plans to those with an execution count > 10, which removes the vast majority of the plans that are frequently recompiled.  You will likely want to see everything at least once, so comment out the WHERE execution_count > 10 as needed.
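The version I ended up with looks roughly like this. Treat it as a sketch: the destination table and its column list are my own additions to the script that circulates online, and the full script also shreds the MissingIndexes XML into table/column details.

```sql
-- Search the plan cache for plans reporting a missing index and save
-- the query text, owning stored procedure (if any), and the plan XML.
-- Run in the context of the target database so OBJECT_NAME resolves.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT  qs.plan_handle,
        OBJECT_NAME(qp.objectid, qp.dbid) AS ProcedureName,
        st.text AS SQL_Text,
        qp.query_plan,      -- the MissingIndexes element holds the suggested index
        qs.execution_count
INTO    msdb.dbo.Missing_Indexes
FROM    sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE   qp.query_plan.exist('//MissingIndexes') = 1
  AND   qs.execution_count > 10;  -- comment out to see everything once
```

SELECT ... INTO creates the table on the first run; switch to INSERT for subsequent runs.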

I’m using a second query to join the missing index information back to the query execution information, instead of trying to handle it all at once.  This just allows me to keep it a bit more modular and avoid making the code even more complicated than it is.

At this point, you can query the Missing_Indexes table as you wish, but you will need the following query to join it to the stats DMV.
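Something like the following sketch is what I mean, assuming the Missing_Indexes table kept the plan_handle and procedure name from the first query. Note that the elapsed-time columns in the DMV are reported in microseconds, hence the conversions:

```sql
-- Join the saved missing-index rows back to current execution stats.
-- dm_exec_query_stats times are in microseconds; 100 ms = 100000.
SELECT DISTINCT
       mi.ProcedureName,
       qs.execution_count,
       qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,
       qs.max_elapsed_time / 1000 AS max_elapsed_ms
FROM   msdb.dbo.Missing_Indexes mi
JOIN   sys.dm_exec_query_stats qs
       ON qs.plan_handle = mi.plan_handle
WHERE  qs.total_elapsed_time / qs.execution_count > 100000  -- avg > 100 ms
  AND  qs.max_elapsed_time > 100000;                        -- max > 100 ms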

There are obviously a number of ways you can run this second query, depending on your needs.  I found it useful to remove the SQL_Text column and use a DISTINCT since there are likely to be a number of results.  In my example, I limited the queries to those with an average execution time of > 100ms and also a max execution time of > 100ms.  If dealing with a large amount of information, I would rather focus on the queries that take longer to run, but even fast-running queries can be tuned if they execute a large number of times.

So by now you have a nice little list of stored procedures, and maybe some ad-hoc queries, that might need some investigation.  Feel free to expand your filtering to exclude the ones with a lower impact or if you feel that the execution counts are not high enough to worry about.  With this new technique, and the use of these helpful DMVs, I am much better suited to finding performance issues quickly.

As an example, a few days ago, a user emailed me with the infamous “hey the system is slow sometimes” email.  Instead of letting them know that their information is less than helpful, I decided to just query the execution stats for the slowest running procedures.  I quickly found the culprit — a stored procedure that averaged 9 seconds to run on a particular screen of the application.  I was then able to see that it had a missing index, which I suggested to the developers.
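The "slowest running procedures" query is a quick one against the procedure stats DMV; something like this sketch (the TOP value is arbitrary):

```sql
-- Top stored procedures by average elapsed time.
-- Times in dm_exec_procedure_stats are microseconds.
SELECT TOP 20
       OBJECT_NAME(ps.object_id, ps.database_id) AS ProcedureName,
       ps.execution_count,
       ps.total_elapsed_time / ps.execution_count / 1000 AS avg_elapsed_ms
FROM   sys.dm_exec_procedure_stats ps
ORDER BY ps.total_elapsed_time / ps.execution_count DESC;
```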


Posted in SQL Server

SQL Server System Processes Blocking User Processes when Auto Update Stats Async is Enabled

Recently, we decided to buy a few licenses of Confio Ignite (Now SolarWinds Database Performance Monitor) to monitor some of our high-profile SQL Servers.  I’m definitely not posting a sales pitch, but for me, it’s provided insight into SQL Server that is much more difficult to obtain through traditional methods.  One of the best features of the product is the ability to break activity down by time and to see historical performance information.  Among many metrics, the application displays blocking in the database.

I noticed nearly immediately that the most common blocking was caused by system processes, not user processes.  Blocking by system processes is a bit harder to nail down; in my experience, the blocked process report in SQL Profiler doesn’t show this type of issue.  Extended Events have been on my to-do list for a while, but I haven’t gotten comfortable with them quite yet.

The SPID of the system process listed as the head blocker would typically say TASK MANAGER when pulled up in SSMS.  The wait type was LCK_M_SCH_M.  This is a Schema lock.  This occurs when changes are being made to a table, such as altering an index or other DDL.

So I started to think about what the system could be doing.  Finally, I got the idea to look into Auto Update Stats, and that’s when I noticed that my system had Auto Update Stats Async enabled.  A little research on how this works showed that the system essentially outsources the Auto Update Stats task to a system process.  My original understanding was that this would happen entirely in the background, but in reality the query that triggers the update is allowed to run, while once the Auto Update event starts, it can and will block subsequent queries.  Not even a SELECT with NOLOCK can break through a Schema lock.
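You can see which databases have the asynchronous option enabled with a quick query:

```sql
-- Databases where statistics updates are handed off to a system process
SELECT name, is_auto_update_stats_on, is_auto_update_stats_async_on
FROM   sys.databases
WHERE  is_auto_update_stats_async_on = 1;
```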

I ran a SQL Profiler trace for only the Auto Update Stats events and grabbed a list of object IDs that were being updated.  These events were firing fairly consistently, with some of them lasting for up to 10 seconds or more in duration.  A simple query will show you the name of the table.
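For example (run it in the database that owns the table; the ID below is a placeholder for one captured by the trace):

```sql
-- Resolve an object ID from the Auto Update Stats trace to a table name
SELECT OBJECT_NAME(245575913) AS TableName;  -- replace with your object ID
```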

Once I had the list of tables affected by Auto Update Stats events, I checked the row counts of the tables.  I noticed that they all had a few hundred rows or less.

These tables are used for current “Work in Progress” by the application and the contents of the tables are changing rapidly, never exceeding a few hundred rows.  Once the SQL Server determines that the data has changed by 20% or more, it will fire the Auto Update Stats the next time the data is accessed.  In my system, this meant blocking every few minutes for up to 10 seconds.

Since the data in the tables was so limited, I decided to set the tables to NORECOMPUTE.  Setting NORECOMPUTE on individual statistics will prevent them from automatically updating.  To enable NORECOMPUTE, run an update statistics command against each table experiencing the issue:
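A sketch of the command, using a placeholder table name:

```sql
-- Rebuild the table's statistics and flag them NORECOMPUTE so the
-- Auto Update Stats events stop firing for this table.
UPDATE STATISTICS dbo.WorkInProgress WITH FULLSCAN, NORECOMPUTE;
```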

Immediately after updating the statistics with NORECOMPUTE, all of the system blocking was alleviated.  I’ve been running blocking-free for weeks now with no adverse effects.  If you change your mind and want to remove NORECOMPUTE, simply update the statistics again without the NORECOMPUTE option specified.  This also means that if you have automated weekly maintenance in place that updates the statistics on these problem tables, the NORECOMPUTE option will be removed automatically.  To get around this, I simply added another step after my Update Statistics task that re-updates my problem table statistics with the NORECOMPUTE option.

Posted in SQL Server

[Solved]: New-WebServiceProxy / GetListItems Excessive Lookup Issue

Recently, while trying to use New-WebServiceProxy to automate some SharePoint form entries, I ran into a lookup issue.

First, I got this rather generic error: Exception calling "GetListItems" with "7" argument(s): "Exception of type 'Microsoft.SharePoint.SoapServer.SoapServerException' was thrown."

So I dug deeper with a try/catch.
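Something like this sketch, assuming $service came from New-WebServiceProxy and the query variables were built earlier:

```powershell
# Wrap the call in try/catch to surface the SOAP detail that the
# generic exception message hides.
try {
    $service.GetListItems($listName, $viewName, $query, $viewFields,
        $rowLimit, $queryOptions, $webID)
}
catch {
    # The useful message is buried in the exception detail
    $_.Exception.Detail.InnerText
}
```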

The resulting error stated: The query cannot be completed because the number of lookup columns it contains exceeds the lookup column threshold enforced by the administrator.

Upon searching Google, many people suggested increasing the threshold in Central Administration, but there were two issues with this:

  1. I’m not a Farm Admin (fortunately)

  2. Marc D. Anderson pointed out some very good reasons for not increasing the threshold. The SharePoint farm I’m working with is already slow, so I wouldn’t even want to ask the Admin to up the limit.

Some folks suggested using a different view, other than the default, but even minimal views didn’t work. Ultimately, I limited the ViewFields to just the ones I needed.
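Building the ViewFields element looks something like this; the field names are examples, not the full list I used:

```powershell
# Limit ViewFields to only the columns the script actually needs,
# keeping the number of lookup columns under the threshold.
$xmlDoc = New-Object System.Xml.XmlDocument
$viewFields = $xmlDoc.CreateElement("ViewFields")
$viewFields.InnerXml = "<FieldRef Name='Title' />" +
                       "<FieldRef Name='AssignedTo' />" +
                       "<FieldRef Name='Status' />"
```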

This is just a snippet of the fields I used, and fortunately, the total number of fields I needed didn’t have more than 8 lookups. If your script needs to modify more than 8 lookups, then it’s likely that you’ll need to up the lookup limit or divide the Add/Update into two calls.

Posted in PowerShell, SharePoint

Use Windows Authentication to Connect to SQL Server in OpenSuSE 12.3 x64

One of the things I love most about SuSE is how well it integrates with Active Directory. Joining a domain is easy when using yast. First I ensured that the Linux server’s DNS points to my DC, then went to yast -> Network Services -> Windows Domain Membership.

Note that if “Windows Domain Membership” does not appear in yast, you will have to install yast’s samba client module by executing zypper in yast2-samba-client

Once I successfully joined my domain, I downloaded Microsoft ODBC Driver 11 for SQL Server – SUSE and ran the installation as directed. Something I really love about this ODBC driver for Linux is that it also comes with sqlcmd and bcp. SQL Server data import/export from Linux? Whaaat!

Now, when I initially attempted to run isql, I received the following error:

[01000][unixODBC][Driver Manager] Can't open lib '/opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2260.0' : file not found

After confirming that the file existed, I ran ldd to check for required libraries that may be missing.

suse:~ # ldd /usr/lib/libmsodbcsql-11.0.so.2260.0
        linux-vdso.so.1 (0x00007fff20978000)
        libcrypto.so.0.9.8 => not found
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fcf8fb44000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fcf8f93c000)
        libssl.so.0.9.8 => not found
        libuuid.so.1 => /usr/lib64/libuuid.so.1 (0x00007fcf8f736000)
        libodbcinst.so.1 => /usr/lib64/libodbcinst.so.1 (0x00007fcf8f525000)
        libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x00007fcf8f251000)
        libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x00007fcf8f011000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fcf8ed0b000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fcf8ea0d000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fcf8e7f6000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcf8e5da000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fcf8e22d000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fcf9011f000)
        libltdl.so.7 => /usr/lib64/libltdl.so.7 (0x00007fcf8e022000)
        libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x00007fcf8ddf9000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007fcf8dbf5000)
        libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x00007fcf8d9ea000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fcf8d7e6000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fcf8d5cf000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fcf8d3ae000)

Looks like I don’t have the right versions of libcrypto and libssl installed. Let’s fix that:
suse:~ # zypper in libopenssl0_9_8

Now that we’ve got all the required libraries, we need to modify /etc/odbc.ini and /etc/odbcinst.ini. Here, I’ll create a DSN entry called mydsn and use it to connect to the test database on sqlservera.base.local.

suse:~ # cat /etc/odbc.ini
[mydsn]
Driver = ODBC Driver 11 for SQL Server
Database = test
Server = sqlservera.base.local
Trusted_Connection = yes

And the other:

suse:~ # cat /etc/odbcinst.ini
[ODBC Driver 11 for SQL Server]
Description=Microsoft ODBC Driver 11 for SQL Server
Driver=/opt/microsoft/msodbcsql/lib64/libmsodbcsql-11.0.so.2260.0

Now I log in using a Windows Domain account that has privileges to the SQL Server and the test database.

login as: base\chrissy
Using keyboard-interactive authentication.
Have a lot of fun...
BASE\chrissy@suse:~> isql -v mydsn
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL>

It works! I’m genuinely surprised that it was so easy. Now to attempt to connect to my AlwaysOn Availability Group. I changed the server from sqlservera.base.local to sqlserver.base.local and attempted to connect:

BASE\chrissy@suse:~> isql -v mydsn
[S1000][unixODBC][Microsoft][ODBC Driver 11 for SQL Server]Cannot generate SSPI context
[S1000][unixODBC][Microsoft][ODBC Driver 11 for SQL Server]SSPI Provider: Server not found in Kerberos database
[ISQL]ERROR: Could not SQLConnect

Well, damn. Maybe I’m missing something? I browsed ODBC Driver on Linux Support for High Availability, Disaster Recovery; Welcome to the Microsoft ODBC Driver 11 for SQL Server on Linux; and the fabulous guide Securing Access to SQL Server from Linux with Kerberos. Using the information I found, I first took Microsoft’s advice and connected with sqlcmd using the -E option, which explicitly directs sqlcmd to “Use trusted connection, integrated authentication.”

BASE\sqlserver@suse:~> sqlcmd -D -S sqlserver -E
Sqlcmd: Error: Connection failure. ODBC Driver for SQL Server is not installed correctly. To correct this, run ODBC Driver for SQL Server Setup.

Fail. Next, I try sqlcmd with the -M option to connect using the MultiSubnetFailover option.

sqlcmd: 'M': Unknown Option. Enter '-?' for help.

I then try adding MultiSubnetFailover to the odbc.ini options, and still nothing. Maybe it’s my Kerberos config? I register a whole bunch of SPNs using setspn, then confirm everything is fine using Microsoft Kerberos Configuration Manager for SQL Server, and still nothing. At this point, I’m thinking it’s a bug.

Posted in Active Directory, Linux, Security, SQL Server

Mounting NAS Volumes in Mac OS X the Right Way

I’ve been using Mac OS X for a while now, but upgraded to a 13-inch Macbook Pro Retina about a year ago.  I love having access to the command line and I just RDP when I need to do anything on a Windows machine at work.  I often find myself needing to access my NAS and networking is one of the only features of OS X to which I have had trouble adapting.  On my Ubuntu laptop, it is effortless to mount a network drive and access it like a local disk or path, but that is not the case on my Mac.  Time and time again, I have to re-establish my connections to the NAS, which is time consuming.

Other articles instruct users to mount the volumes and add those volumes to Login Items for automatic mounting.  I wanted a solution more closely aligned to editing fstab.  After a bit of trial and error and searching the web, I came across this article.

My original idea was to mount my NAS to a folder in /Volumes.  This was problematic due to weird permission issues and automatic folder removal after a reboot.  I finally realized that for better integration into my normal workflow, I would be far better off mounting the NAS to my home directory in /Users/<account>/.  I also found that using AFP (Apple Filing Protocol) gave me better performance than CIFS/SMB.  I am using AFP in my example, but if you want to use another protocol, the steps are very similar.

My NAS is a Seagate GoFlex Home, so I created /Users/brandon/GoFlex as myself.  Then I followed the steps outlined by bmorganthaler on the Apple Discussions website.

From a terminal window:

$ sudo touch /etc/auto_afp
$ sudo chmod 600 /etc/auto_afp

Open the auto_afp file with vi to add the necessary lines.

$ sudo vi /etc/auto_afp

If you are unfamiliar with vi, once it launches, type i to get into insert mode and enter the following, edited to your needs: (local_mountpoint)     -fstype=afp     afp://(username):(password)@(hostname)/(share). For example:

/Users/brandon/GoFlex     -fstype=afp    afp://brandon:[email protected]/brandon

Hit esc to get out of insert mode, type :wq to save and exit. Now we have to configure the system to call the new auto_afp configuration we created.

$ sudo vi /etc/auto_master

Add the following line using the vi techniques outlined above.

/-                      auto_afp        -nosuid

Save & quit, then restart the automount process to mount the new path(s).

$ sudo automount -vc

Verify that your path is mounted and accessible, and reboot just to make sure it mounts as expected.

That’s pretty much it!  You now have the mounts you always dreamed of having.  I was able to add some of my media folders to my Favorites bar for quick access, something I was not able to do when I had the mount point located in /Volumes.  I was lucky that my NAS supported AFP, but if you have to use SMB, just replace the AFP names with SMB and I think you should be set — although I haven’t tested it.

Posted in OS X & iDevices

Setup a Fully Functional Lync 2013 Lab Using only One Public IP Address

Note: This article assumes you already know how to set up Lync 2013 and just need to know the intricacies of changing the default ports.

Ever since I was exposed to Lync in 2011, I’ve been a huge proponent of using chat to communicate at work. Years ago, I recall corporate chat being a hard sell, but now it seems generally accepted; many customers who don’t use chat are open to it, they just haven’t had the resources to set it up yet. I’ve set up Google Apps w/Talk for small businesses in the past, but I’ve been curious about implementing Microsoft Lync on-premise, so I set up a lab last week to better understand it.

Because my lab is limited on RAM, I attempted to use just one server to accomplish everything I wanted — chat, “PC-to-PC” calls, and video conferencing. I was able to get chat working fine for internal and external users, but video conferencing and calls failed for users who were not on the same internal network.

After learning about the Edge server’s role in providing media services, I gave in and added an Edge server to my topology. I also added a Reverse Proxy because holy smokes, setting up a Reverse Proxy using IIS ARR was way easier than I thought it’d be (and I already had a web server that could do this.)

Here’s the thing about the Edge server: both the Front End and the Edge server use port 443 for very different things. Autodiscovery, which is provided by the Front End/Reverse Proxy, requires HTTPS (you can use HTTP, but I didn’t want to), and Autodiscovery was a requirement I set for myself, so I was left having to modify the A/V port for the Edge server. In the setup below, you can see that I changed it to port 442. Modifying the port in the Topology was easy enough, but then I had to figure out exactly what DNS records needed to change.

The DNS change seems obvious now, but I started this with little knowledge about AutoDiscovery and service records. Because of this, I ended up with a ton of unnecessary/incorrect DNS entries.

Here’s how my lab is configured:

Internal SIP domain: ad.local
External SIP domain: acme.com
External IP address
DMZ Subnet: 10.0.10.x
Internal Subnet: 10.0.0.x

Role Hostname Internal IP External IP
Lync Front End lyncfe.ad.local None
Lync Edge lyncedge
Reverse proxy (optional) rp

Port Forwarding
Protocol Port Function IP
TCP 443 Autodiscovery (HTTPS)
TCP 444* Web Conferencing
TCP 5061** MTLS
Both 50000-59999 RTP

* I’m not entirely sure this needs to be exposed, but I opened mine anyway.
** In order to support the widest variety of Lync clients, 5061 appears to
be required for authentication even if you don’t do federation.

External DNS A & CNAME Records
Type Hostname Points to
A lync.acme.com
CNAME sip.acme.com lync.acme.com
CNAME meet.acme.com lync.acme.com
CNAME lyncdiscover.acme.com lync.acme.com

External DNS SRV Record
Service Protocol Port Target
_sip _tls 442 lync.acme.com

Some DNS providers offer an SRV Host field. This is useful if your users authenticate as user@x.acme.com instead of user@acme.com; in this case, “x” would be your host. Also, I’ve seen different values for Priority and Weight. I left mine at 100 and 1, though I’ve seen 0 and 0. I don’t think it matters much unless you’re doing load balancing.

Also, after examining client logs, I noticed that some external logins from desktop clients failed when sip.acme.com didn’t exist, so I added it to DNS. Microsoft documentation mentions A records in a whole lot of places, but I found CNAMEs work just as well.

A few things to note:

I didn’t hook Lync into Exchange and I didn’t even attempt Enterprise Voice because I don’t have the equipment (or desire.) Authentication was accomplished by adding username@acme.com to each AD account’s email property, then using username@acme.com as the sign-in name and ad\username for User Name. Of course, my domain’s Root CA cert had to be installed on all of the client computers, including mobile devices.

I did test to see if Office Web Apps was required for Lync-to-Lync calls (it wasn’t) and blew up one of my server’s IIS configs in the process. OWA doesn’t like to share and deletes all of your IIS Sites during configuration, so don’t install it on a server that does anything else related to IIS.

Also, if you have just one internal subnet and you’re comfortable with the security repercussions, you don’t need a reverse proxy, just a router that can forward port 443 to port 4443. Your Edge server will need two IPs (no way around that), so you can give it two on the same subnet and make sure your router NATs to the “External” IP of the Edge server as listed in the Lync Topology.

Posted in Lync

Running Lync 2010 with a SQL Server 2012 Backend

Recently I was tasked with migrating all SQL 2008 R1 and prior databases to SQL 2008 R2 and SQL 2012.  One of my qualifying applications was Lync 2010.  We were under the impression that SQL 2012 would be fine for the Lync 2010 installation.  After migrating the database and updating the connections, Lync would not fully initialize.

After running a quick trace, I discovered that some of the Lync processes were calling sp_dboption.  SQL Server 2012 no longer includes this stored procedure.  So to remedy the situation, I scripted sp_dboption from the system procedures on SQL 2008 R1 and manually created it on SQL 2012 as a user stored procedure.  After granting execute permissions to the user, Lync 2010 started right up.
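If you want to reproduce this, the outline looks something like the following. The grantee name is a placeholder for whatever account Lync connects with:

```sql
-- On the SQL 2008 instance: pull the procedure's source so you can
-- recreate it verbatim as a user procedure on SQL 2012.
EXEC sp_helptext 'sys.sp_dboption';

-- On the SQL 2012 instance: after creating dbo.sp_dboption from that
-- script, grant execute to the Lync service account.
GRANT EXECUTE ON dbo.sp_dboption TO [LyncServiceAccount];
```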

We’ve been running on SQL Server 2012 for months without further issue.

Posted in Lync, SQL Server

PowerShell v3: Get Volume Information for Clustered SQL Server Storage

Unfortunately, there are no native PowerShell Failover Cluster cmdlets which will give you the volume information for disks in a cluster. I’ve seen some bloggers use Get-ClusterSharedVolume but a Cluster Shared Volume or CSV is different from volumes used within a SQL cluster.

The script below, which was written with the help of this post on StackOverflow, gets the volume information for disks within specific SQL Server instances on all the clusters in a domain. This does not work with Availability Groups since they don’t use shared storage.

I am writing this code for another project which stores the values in a database, which is why I’m using variables and Write-Host in the example. The output will look like this (well, hopefully yours will have a more advanced storage structure; these particular clustered instances were created as a proof of concept):
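The core of the script looks something like this sketch. The MSCluster WMI class and association names are my reconstruction of the approach from that StackOverflow post, so treat them as assumptions rather than a verified implementation:

```powershell
# Sketch: for each cluster in the domain, find SQL Server resources and
# report volume info for the physical disks in the same cluster group.
Import-Module FailoverClusters

foreach ($cluster in Get-Cluster -Domain $env:USERDNSDOMAIN) {
    $sqlResources = Get-ClusterResource -Cluster $cluster |
        Where-Object { $_.ResourceType -eq "SQL Server" }

    foreach ($resource in $sqlResources) {
        $disks = Get-ClusterResource -Cluster $cluster |
            Where-Object { $_.ResourceType -eq "Physical Disk" -and
                           $_.OwnerGroup -eq $resource.OwnerGroup }

        foreach ($disk in $disks) {
            # Walk resource -> disk -> partition via the cluster WMI provider
            $wmiDisks = Get-WmiObject -ComputerName $cluster.Name -Namespace root\MSCluster `
                -Query "ASSOCIATORS OF {MSCluster_Resource.Name='$($disk.Name)'} WHERE ResultClass=MSCluster_Disk"

            foreach ($wmiDisk in $wmiDisks) {
                $partitions = Get-WmiObject -ComputerName $cluster.Name -Namespace root\MSCluster `
                    -Query "ASSOCIATORS OF {MSCluster_Disk.Id='$($wmiDisk.Id)'} WHERE ResultClass=MSCluster_DiskPartition"

                foreach ($partition in $partitions) {
                    Write-Host "Cluster: $($cluster.Name)"
                    Write-Host "SQL Resource: $($resource.Name)"
                    Write-Host "Cluster disk: $($disk.Name)"
                    Write-Host "Disk state: $($disk.State)"
                    Write-Host "Disk Drive: $($partition.Path)"
                    Write-Host "Disk Label: $($partition.VolumeLabel)"
                    Write-Host "Disk Size: $($partition.TotalSize)"
                    Write-Host "Disk Free: $($partition.FreeSpace)`n"
                }
            }
        }
    }
}
```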

PS C:\Users\administrator> .\Get-ClusteredDiskInfo.ps1
Cluster: CLUSTER2
SQL Resource: SQL Server
Cluster disk: Cluster Disk 1
Disk state: Online
Disk Drive: Y:
Disk Label: iscsivol
Disk Size: 102269
Disk Free: 102111

Cluster: CLUSTER2
SQL Resource: SQL Server (SQL2)
Cluster disk: Cluster Disk 3
Disk state: Online
Disk Drive: F:
Disk Label: iscsivol2
Disk Size: 20476
Disk Free: 20358

Posted in PowerShell, SQL Server

PowerShell & SharePoint: Using New-WebServiceProxy to Post New Completed Tasks Automatically

Recently, I was asked to help automate a friend’s super boring daily duty of logging repetitive activities to a SharePoint Task list. Even though I’m no longer regularly working with SharePoint (yay!), I accepted the challenge.

Below is the script I created, which can be scheduled (and mixed up, even using $a = Get-Random -minimum 15 -maximum 1250; Start-Sleep -s $a within the function) to fill in the form and post new completed Tasks automatically.
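The heart of it is an UpdateListItems call with a Cmd="New" batch. This sketch uses placeholder values for the site URL, list name, and fields; the bullet points that follow explain the quirks I hit with each field type:

```powershell
# Sketch: post a new completed task via the SharePoint Lists web service.
# Site URL, list name, and field values below are placeholders.
$uri = "https://sharepoint.acme.com/sites/team/_vti_bin/Lists.asmx?WSDL"
$service = New-WebServiceProxy -Uri $uri -UseDefaultCredential
$listName = "Tasks"

# Element and attribute names are case-sensitive.
$batch = [xml]@"
<Batch OnError="Continue">
  <Method ID="1" Cmd="New">
    <Field Name="Title">Daily log entry</Field>
    <Field Name="AssignedTo">32</Field>
    <Field Name="Status">Completed</Field>
    <Field Name="StartDate">2013-12-01</Field>
    <Field Name="DueDate">2013-12-01</Field>
  </Method>
</Batch>
"@

$result = $service.UpdateListItems($listName, $batch.DocumentElement)
```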

Here are a few interesting things while creating this script:

  • Invoke-WebRequest is not the appropriate cmdlet for this task. SharePoint’s form field naming scheme was complex and ultimately, Invoke-WebRequest was unable to successfully submit the task. It seemed like it worked, but the new items never appeared.
  • New-WebServiceProxy works with SharePoint 2007, 2010 and 2013 (as well as PowerShell 2.0 and 3.0), but beware XML’s case sensitivity.
  • Lists.UpdateListItems is used to add new tasks, not just update tasks
  • Fields with spaces or renamed columns could present issues. Make sure you use the FieldInternalName for the column name. This can be found in the source code of the New page.
  • A people/group picker field only requires the user id of the user in SharePoint. You can get this ID by deciphering the user’s URL or just use ResolvePrincipals as seen in this script. The format should look like this: <Field Name=’AssignedTo’>32</Field>
  • Dates require this format: YYYY-MM-DD
  • The default Description field is actually called the Body field and must be referenced as such.
  • Lookup fields work, but the code below doesn’t automatically resolve the lookup fields. I just grabbed the ID (by looking at the URL of the item) and manually input it.

Special thanks goes out to the PowerShell team for the jump start.

Posted in PowerShell, SharePoint