Thursday, November 05, 2009

SnTT: tracking down space hogs in Notes

You all have them. People who keep every stinking e-mail they've received for the past 10 years. They can't possibly delete that e-mail response that simply says "thanks" because it would destroy their CYA audit trail. Your storage budget looks like the national deficit, so finally management is asking why you need that much space. You want to give them a clean report showing who is using their mail file as a vast garbage dump.

Creating a Notes agent to loop through the mail directory, open each database, and extract the space used and percent free is trivial. But on a server with 1,000+ mail databases, dozens or hundreds of which are over a gig, it takes a while to run and bogs down the server. So what can you do?

Enter the Lotus C API. Notes itself doesn't use the same entry points you do when it works with databases; it uses the spiffy C API, which runs at a lower level. LotusScript code has to be interpreted, so there is some lag, and once it's interpreted it ends up calling the C API anyway. You can cut out the middle man and hit the C API directly.

In this case you only need three C API calls: NSFDbOpen, NSFDbClose and NSFDbSpaceUsage. Add in a NotesDbDirectory and a loop and you're good to go. The following code creates documents and puts them into a view whose first column is set to "# in view" and whose second column is the dbsize. It then walks the view to write each document's position in the view back to the record.

(Declarations)
Declare Function NSFDbOpen Lib "nnotes.dll" Alias "NSFDbOpen" (Byval dbName As String, hDb As Long) As Integer
Declare Function NSFDbClose Lib "nnotes.dll" Alias "NSFDbClose" (hDb As Long) As Integer
Declare Function NSFDbSpaceUsage Lib "nnotes.dll" Alias "NSFDbSpaceUsage" (ByVal hDB As Long, retAllocatedBytes As Long, retFreeBytes As Long) As Integer

Sub Initialize
    Dim s As New NotesSession
    Dim rdoc As NotesDocument
    Dim doc As NotesDocument
    Dim mfile As String
    Dim dbdir As New NotesDbDirectory("server/domain")
    Dim db As NotesDatabase
    Dim thisDb As NotesDatabase
    Dim view As NotesView
    Dim nvec As NotesViewEntryCollection
    Dim eOne As NotesViewEntry
    Dim eTwo As NotesViewEntry
    Dim dbHandle As Long
    Dim usedBytes As Long
    Dim freeBytes As Long

    'Using NotesDbDirectory gives us a handle to the database
    'and limited information about it. The rest of the
    'information, such as PercentUsed, won't be populated until
    'db.Open is called, which we don't want to do because
    'that's what drags the server down. Instead we'll combine
    'information from the closed database and some Notes C API
    'calls to get the specific information we want.

    Set db = dbdir.GetFirstDatabase(DATABASE)

    While Not db Is Nothing
        mfile = db.FilePath
        'Only get databases in the mail subdirectory
        If LCase$(Left$(mfile, 4)) = "mail" Then
            'Get a handle to the database. The NotesDatabase object
            'has a LotusScript handle; we need a C API handle.
            dbHandle = 0
            If NSFDbOpen("server/domain!!" + mfile, dbHandle) = 0 Then
                'Peek inside and get the used bytes and free bytes
                Call NSFDbSpaceUsage(dbHandle, usedBytes, freeBytes)
                'We have what we need, so close the C API handle to prevent a handle leak
                Call NSFDbClose(dbHandle)

                Set rdoc = New NotesDocument(s.CurrentDatabase)
                rdoc.form = "EmailRecord"
                rdoc.dbFilename = mfile
                rdoc.title = db.Title
                rdoc.mailfile = db.FilePath
                rdoc.dbsize = db.Size
                'Guard against empty files before computing the percentage
                If db.Size > 0 Then rdoc.pctused = Round((usedBytes / db.Size) * 100, 2)
                rdoc.server = db.Server
                'For a mail file the database title is the user's name
                rdoc.Username = db.Title
                Call rdoc.Save(True, False)
            End If
        End If

        Set db = dbdir.GetNextDatabase
    Wend

    'Next we walk the all-docs view and write each user's ranking to their document
    Set thisDb = s.CurrentDatabase
    Set view = thisDb.GetView("AllDocs")

    Set nvec = view.AllEntries
    Set eOne = nvec.GetFirstEntry
    Do Until eOne Is Nothing
        Set doc = eOne.Document
        Print "On doc " + eOne.GetPosition(".")
        doc.Ranking = eOne.GetPosition(".")
        Call doc.Save(True, False)
        Set eTwo = eOne
        Set eOne = nvec.GetNextEntry(eTwo)
    Loop
End Sub

This LotusScript was converted to HTML using the ls2html routine,
provided by Julian Robichaux at nsftools.com.


[UPDATED 11/5/2009 9:25 AM to include the NSFDbClose code to prevent a memory leak.]

15 comments:

  1. Hi Charles, This is a great example of using the API to make a simple task run even faster! However, you need one more API function: NSFDbClose(). Without this you are leaking a database handle and on a large server will eventually start throwing 'out of handles' type errors.

    Also, wouldn't it be easier to just loop through the documents in the catalog?

  2. Craig, you're right. I had been working on this post for a while and didn't notice that I'd posted an older version of the code. I'll update that shortly.

    On this server the catalog is gigantic. Opening that was a significant hit itself, and a NotesDbDirectory was a much faster option.

  3. Oh, I just remembered the Catalog just has documents describing the databases. You would still have to open them, which is what I was trying to avoid. NotesDbDirectory will give you a not-fully-populated database object already.

    By using the NotesDatabase from NotesDbDirectory in conjunction with the database handle from NSFDbOpen you can sort of dance around actually opening the database, which is why this runs so quickly.

  4. The catalog will also have the information about database size (DbSize), free space (DbPercentUsed), number of documents (DbNumDocuments), etc. It seems to me that the cost of opening the catalog and walking the view reading the summary info would be a lot faster than opening every database individually. NSFDbOpen will still take a hit, performance-wise, if it encounters a large database, but not as severe as the LS version.

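    [Ed: the catalog approach described above might be sketched in LotusScript as follows. This is untested; the "($ReplicaID)" view name and the PathName and DbTitle item names are assumptions and may differ across catalog versions, though DbSize and DbPercentUsed are named in the comment.]

    ```lotusscript
    'Read size info from catalog.nsf without touching the mail files themselves.
    'Each catalog document summarizes one database on the server.
    Dim s As New NotesSession
    Dim cat As New NotesDatabase("server/domain", "catalog.nsf")
    Dim view As NotesView
    Dim doc As NotesDocument

    Set view = cat.GetView("($ReplicaID)")
    Set doc = view.GetFirstDocument
    While Not doc Is Nothing
        'Only report on databases in the mail subdirectory
        If LCase$(Left$(doc.PathName(0), 4)) = "mail" Then
            Print doc.DbTitle(0) + ": " + CStr(doc.DbSize(0)) + " bytes, " _
                + CStr(doc.DbPercentUsed(0)) + "% used"
        End If
        Set doc = view.GetNextDocument(doc)
    Wend
    ```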
  5. I completely forgot that was in the catalog because I've never had any luck getting useful information out of it. The numbers were never accurate when I looked at them. Maybe it has improved over the years.

  6. There definitely could be a problem with accuracy, no doubt. There are too many shops that have turned off the catalog for one reason or another and end up missing out on an extremely valuable resource. The other common misunderstanding is surrounding the 'Do not list in catalog' database property. These databases are still in the catalog; the documents just get excluded from the view via the selection formula. A script like yours would also need to take that into account and use 'AllDocuments' or something similar. An FT search could also be used to just report dbs over a threshold. Heck, you could even just create the view in the catalog itself and not worry about the agent... but I'm getting carried away.
    Just to backtrack a minute, any script should double-check the last-updated date for the database and the individual documents and post warnings when they are out of date, or just a statement on the document: 'Valid as of this date'.

    All that being said, I still stand by my original comment that this is a good example of using the API to do something that otherwise would take longer than it needs to. The more tools you have in your tool box the better off you are.

  7. Charles, it surprises me that there would be much performance improvement. If I had to guess, I'd have said that I/O performance would swamp the LS interpretation overhead. Do you have any comparison figures on running the C API version versus a pure LS version?

  8. @kerr - The performance difference isn't really to do with the 'interpreted' aspect of LS. It has everything to do with the way the LS classes are implemented. When you open a document for example in LS, the LS class needs to load a lot more information in order to get the item list, parent database, etc members of the NotesDocument object setup. When you open a document via the API, none of this happens and you only get a handle to the document. If you need the rest of the info the LS class provides, you make other API calls to set that up. Because of this, in my experience, an API program can run circles around a LS version that does the exact same thing.

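    [Ed: one way to put numbers on this question is a quick harness using LotusScript's Timer function. This is an untested sketch; "mail/jdoe.nsf" is a made-up path, and it assumes the NSFDbOpen/NSFDbClose Declares from the post are in scope.]

    ```lotusscript
    'Rough timing sketch: open the same database 100 times via the LS
    'classes, then 100 times via the bare C API, and compare elapsed time.
    Dim s As New NotesSession
    Dim db As NotesDatabase
    Dim dbHandle As Long
    Dim start As Single
    Dim i As Integer

    start = Timer
    For i = 1 To 100
        'GetDatabase opens the database and populates the full LS object
        Set db = s.GetDatabase("server/domain", "mail/jdoe.nsf")
    Next
    Print "LS open x100: " + CStr(Timer - start) + " seconds"

    start = Timer
    For i = 1 To 100
        dbHandle = 0
        If NSFDbOpen("server/domain!!mail/jdoe.nsf", dbHandle) = 0 Then
            Call NSFDbClose(dbHandle)
        End If
    Next
    Print "C API open x100: " + CStr(Timer - start) + " seconds"
    ```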
  9. @craig, Interesting. I thought the LS classes were pretty lightweight wrappers that delegated most of the work down to the C API as required. Certainly if they are doing a huge chunk of extra work then that would explain it. Still, I'd love to see some numbers. I might have to try to run some benchmarks.

  10. Presumably, this approach ties you, hand and foot, to Windows? (You have declared DLLs here).

    Is it even possible to use the C API on other platforms?

  11. @Mike - this example is WIN32 only, though you can easily change the Declares to support whatever platform you need. For example, AIX would be libnotes.a, LINUX/Solaris would be libnotes.so, etc.

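    [Ed: following that note, the Declares could be switched per platform with LotusScript's %If compiler directives. An untested sketch; only the nnotes.dll branch matches the post's code, and the platform constants and library names should be double-checked for your environment.]

    ```lotusscript
    'Pick the Notes shared library at compile time per platform.
    %If WIN32
    Declare Function NSFDbOpen Lib "nnotes.dll" Alias "NSFDbOpen" (Byval dbName As String, hDb As Long) As Integer
    Declare Function NSFDbClose Lib "nnotes.dll" Alias "NSFDbClose" (hDb As Long) As Integer
    %ElseIf LINUX
    Declare Function NSFDbOpen Lib "libnotes.so" Alias "NSFDbOpen" (Byval dbName As String, hDb As Long) As Integer
    Declare Function NSFDbClose Lib "libnotes.so" Alias "NSFDbClose" (hDb As Long) As Integer
    %ElseIf AIX
    Declare Function NSFDbOpen Lib "libnotes.a" Alias "NSFDbOpen" (Byval dbName As String, hDb As Long) As Integer
    Declare Function NSFDbClose Lib "libnotes.a" Alias "NSFDbClose" (hDb As Long) As Integer
    %End If
    ```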
  12. Another tip:

    If we expect that the database size may grow beyond 4GB (quite likely when dealing with space hogs), a better function to use would be NSFDbSpaceUsageScaled( ByVal hDB As Long, retAllocatedBytes As Long, retFreeBytes As Long, retGranularity As Long ).

    The return results (retAllocatedBytes and retFreeBytes) are multiplied by the size of the chunks (retGranularity) to get the actual values.

    But, I haven't tested any of this and could be way off base.

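    [Ed: in LotusScript, the scaled variant described above might look like this. An untested sketch; the Declare simply mirrors the signature given in the comment.]

    ```lotusscript
    Declare Function NSFDbSpaceUsageScaled Lib "nnotes.dll" Alias "NSFDbSpaceUsageScaled" _
    (Byval hDb As Long, retAllocatedBytes As Long, retFreeBytes As Long, retGranularity As Long) As Integer

    'Then, in place of the NSFDbSpaceUsage call:
    Dim allocUnits As Long, freeUnits As Long, granularity As Long
    Dim usedBytes As Double, freeBytes As Double

    If NSFDbSpaceUsageScaled(dbHandle, allocUnits, freeUnits, granularity) = 0 Then
        'Multiply the unit counts by the chunk size; use Doubles so
        'results past 4GB don't overflow a 32-bit Long.
        usedBytes = CDbl(allocUnits) * CDbl(granularity)
        freeBytes = CDbl(freeUnits) * CDbl(granularity)
    End If
    ```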
  13. Wow, I leave for vacation and the discussion continues! Thanks for keeping up, Craig. :-)

    @Kerr - This was something I helped a friend come up with because opening every database was taking far too long. He was killing it after an hour. This technique completes in under 10 minutes.

    @Grant - You are correct about this only working with databases up to 2GB. You can use Currency instead of Long to get precision up to 838 terabytes. I read about NSFDbSpaceUsageScaled but it seemed like a lot of indirection to me for no real gain.

  14. Charles, those are certainly impressive numbers. I'd love to know what was going on under the hood that would make that much difference. Obviously the LS version is wasting a huge amount of time somewhere. Just one of those corner cases, I suppose.

  15. Kerr - We never did any performance monitoring or methodical statistics gathering so I can only make some assumptions and guesses. I think that the NotesDatabase.Open call forces a behind-the-scenes load of the index for the default view or folder. In the case of a mail database that is the ($Inbox) folder, and when the user has 10,000 messages in his Inbox that index is very large.

    Calling NSFDbOpen instead only creates a handle but doesn't force the default view to load.
