Monday 27 July 2009

SPFun

Or what I learned about Service Packs.

A SQL Maintenance Plan job had been running fine on my SQL 2005 RTM machine for months: until I joined the company, and made the mistake of having a look at it and doing some tidying. Next thing I know?

"Unable to start execution of step 1 (reason: line(1): Syntax error). The step failed."

"But, but, but, but!" I cried "I've just stopped it faffing about with Adventureworks, and removed that database from the server. I haven't touched the syntax directly, I dealt with this in SQL Management Studio. The syntax, surely, should be fine?"

I shrugged, recreated the job... and the same thing happened the next time it ran.

Hmmm.

It would seem that this particular error happens when SP2 is not yet installed on one's SQL box, and one edits a job via a newer version of SSMS. If one plays about with the job on the box, the problem is solved. Or, if one does the sensible thing, and installs the service packs.
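If you want to check which service pack a box is actually on before SSMS catches you out, SERVERPROPERTY will tell you:

```sql
-- 'RTM' means no service pack at all; 'SP2' is what we were after here.
SELECT SERVERPROPERTY('ProductVersion') AS [version],
       SERVERPROPERTY('ProductLevel')   AS [service_pack];
```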

I foresee some out of hours work for us here.

Friday 17 July 2009

SQL Integrity Checks

The job failed with "Integrity Checks Job for DB Maintenance Plan 'DB Maintenance Plan1' sqlmaint.exe failed. [SQLSTATE 42000] (Error 22029). The step failed." That's the job history. The history against the Maintenance Plan (I only learned about this more detailed history last week, and I think it's fab) said: "[Microsoft][ODBC SQL Server Driver][SQL Server]Repair statement not processed. Database needs to be in single user mode."

There are all sorts of ways of dealing with this.

Firstly, you can untick the box that says "Attempt to repair any minor problems" within the database maintenance plan. This is the same as changing the end of the SQL statement in the scheduled job to read "-CkDB" instead of "-CkDBRepair". If the job fails, that will flag up that there are errors within the database: run DBCC CHECKDB at some point when you get in, and investigate from there. Databases need to be in single user mode to repair themselves. Sometimes this is not possible (for example, on SQL boxes that support websites with 24/7 availability) and you need scheduled downtime.
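When you do get your downtime, the repair itself can be done by hand. A minimal sketch, assuming a database called MyDatabase (swap in your own name):

```sql
-- Put the database into single user mode, kicking out (and rolling back)
-- anyone still connected, run the repair, then open it up again.
ALTER DATABASE [MyDatabase] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB ('MyDatabase', REPAIR_REBUILD);
ALTER DATABASE [MyDatabase] SET MULTI_USER;
```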

Secondly, you can ensure that users are not logged in. In one case, I shifted the job to run at 23:45 instead of at 19:30 - there were a couple of extremely enthusiastic users who were still logged in at 19:30.

How did I find out who they were? With a script. On all servers, I find that an admindb is very helpful, so I created a new table:


USE [admindb]
GO
/****** Object: Table [dbo].[tbl_sweetsql_usersloggedin] Script Date: 07/17/2009 10:15:11 ****** (C) Sweet SQL Lass ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[tbl_sweetsql_usersloggedin](
[loginame] [varchar](128) NOT NULL,
[logintime] [datetime] NULL,
[currenttime] [datetime] NULL,
[hostname] [varchar](128) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO

and then ran the following as a scheduled job:

USE [master]
GO
/****** Script Date: 07/17/2009 10:15:11 ****** (C) Sweet SQL Lass ******/
INSERT INTO
admindb..tbl_sweetsql_usersloggedin
SELECT loginame,
login_time,
getdate(),
hostname
FROM
sysprocesses
WHERE dbid = 7

(use

SELECT name, dbid
FROM sysdatabases

to find out what the dbid of your database is).

The above code works on SQL 2000. For SQL 2005/8, use the sys.sysprocesses and sys.databases views to get the same information. This is published under a Creative Commons licence. (c) Sweet SQL Lass.
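For the curious, the SQL 2005 version of the scheduled insert looks much the same. A sketch, with the admindb table assumed from above and the database name (MyDatabase) a placeholder:

```sql
USE [master]
GO
-- Same idea as the SQL 2000 job above, via the 2005 compatibility view.
-- DB_ID() saves hard-coding the dbid.
INSERT INTO admindb.dbo.tbl_sweetsql_usersloggedin
    (loginame, logintime, currenttime, hostname)
SELECT loginame, login_time, GETDATE(), hostname
FROM sys.sysprocesses
WHERE dbid = DB_ID('MyDatabase');
```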

If the users are really lazy, and just didn't log out, and you want to repair things in the middle of the night, then you may need some SQL code to throw them out of the database. There is plenty out there (and I have a version a colleague produced, and am seeking permission to publish, because it is rather nifty). Personally, I wouldn't recommend this. At the very least, warn your users that they will get logged out.
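This is not my colleague's nifty version, just a minimal sketch of the idea, with the database name (MyDatabase) an assumption: loop round sysprocesses and KILL each connection to the database. Everything they were doing gets rolled back, hence the warning above.

```sql
-- Kill every session in one database (except our own).
DECLARE @spid int, @sql varchar(20)
DECLARE spids CURSOR FOR
    SELECT spid FROM master..sysprocesses
    WHERE dbid = DB_ID('MyDatabase') AND spid <> @@SPID
OPEN spids
FETCH NEXT FROM spids INTO @spid
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = 'KILL ' + CAST(@spid AS varchar(10))
    EXEC (@sql)
    FETCH NEXT FROM spids INTO @spid
END
CLOSE spids
DEALLOCATE spids
```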

Heck. Microsoft doesn't recommend automatically repairing your database as part of an Integrity Check job. So who am I to go against that idea?

Finally, don't forget - log the information for your Maintenance Jobs to a Report File. It makes life much easier if you've done something really daft, like drop a database from the server without bothering to remove it from the Maintenance Job first.

Tuesday 14 July 2009

Fun and Games with SCOM

Trying to find out what the recommended database size for Operations Manager 2007 should be is a headache....

Picture this. The boss mentions that you might just want to check the SCOM server, because, based on the situation on the MOM server, it might just be misconfigured. Misconfigured to the tune of 41,240 MSSQLSERVER errors in the Application Event Log? Owing to lack of space in the OperationsManagerDB? Yes. At this point, all you can do is laugh. Lots. It was quite the sight.

A trawl round the internet suggests this spreadsheet may help work out how big this database needs to be for Operations Manager 2007. Like SharePoint, and MOM, the database is not supposed to be set to autogrow, and it needs elbow room (about 40% free space on the data file - see the comments) for indexing and suchlike. An excellent article can be found here about estimating database sizes, and there is also this article and an explanation of how to turn it into a spreadsheet here.

It helps if you know how many agents you're monitoring. We think 40 per server - and we get a result in the spreadsheet that's about the same size as the current data file, which has zero free space. So, I doubled the size of the data file. This gave me 49% free space. Not bad for a guess! I begin to wonder if SQL Enterprise Edition has different stats from SQL Standard Edition for this as well - we're running SQL Standard Edition, and it always seems to want More Space than Enterprise Edition.

Miraculously, once the database has grown, all the errors stop; all of a sudden we have indexes and, most importantly, we can monitor again.

Playing with the spreadsheet suggests that the Data Warehouse data file needs to be about 20 times the size of the Operations Manager data file. But it's OK: you can set this to autogrow... There will only be a problem when we're out of disk space on the server - I've got 120GB to play with for the datafiles for all the databases on the server.
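For what it's worth, the autogrow settings themselves are only an ALTER DATABASE away. A sketch, where the database and logical file names are assumptions (check yours with SELECT name FROM sys.database_files):

```sql
-- Grow in sensible fixed chunks rather than percentages, and cap it so it
-- can't eat the whole 120GB drive. Both names here are placeholders.
ALTER DATABASE [OperationsManagerDW]
MODIFY FILE (NAME = N'DataWarehouse_Data', FILEGROWTH = 512MB, MAXSIZE = 60GB);
```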

Tempdb also needs to be about 20% of the size of the two other databases combined. Careful, though: if you set it to autogrow, this mucks up Operations Manager's ability to monitor it out of the box. If you don't set it to autogrow, then you get lots of emails about it... because it's one of the system databases... Shame. Like Thomas says: this is the sort of thing we might just like to know!

It begs the question: where did the email that told someone the Operations Manager database had run out of space actually go? That I must find out!