How are orphan gsrvr.exe processes created?

I just inherited a server running the ArcSDE application server 10.1 against an Oracle 11g database. I'm finding that many connections are not dropped properly and linger as orphans. Eventually the number of Oracle connections maxes out and new users can't connect.

sdemon -o info -I users

will show only a handful of valid connections, but Windows Task Manager shows many hundreds of gsrvr.exe processes, and Oracle's v$session view also shows many hundreds of connections, most with a status of "killed". The only way to actually remove these sessions is to kill the corresponding gsrvr.exe process in Task Manager. This clears out v$session and allows new users to connect.

So my question is: how are these orphans generated, and how do I prevent that from happening? Also, is there an automated way to kill them instead of manually going through Windows Task Manager?

Esri has two mechanisms for making database connections embedded in the ArcSDE 'C' API. The first (original) protocol uses an application server (giomgr) process, usually running on the database server, which accepts network connections and then bequeaths each connection to a child (gsrvr) process that manages database interaction on behalf of the client. While efficient in terms of network load, the application server paradigm has its faults:

  • It increases CPU and RAM utilization on the database server
  • It increases licensing cost if not run on the ArcGIS Server host (and CPU load when run on the AGS server)
  • It's subject to transient events that cause the service process to hang, creating orphan sessions

ArcSDE has always had the ability to kill running sessions (using sdemon -o kill), and has the option of enabling TCPKEEPALIVE (which is a misleading name, since its purpose is to locate network sessions which have died, mostly by increasing traffic [making the connection "chatty"], so that quiet sessions can be killed), but orphan sessions can still accumulate over time. Restarting the application server clears out accumulated sessions, but it also kills all active sessions, so it's not an option in all environments.
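As for automating the cleanup you are currently doing by hand in Task Manager, here is a minimal Python sketch of the comparison you described: list the gsrvr.exe processes with the Windows tasklist command, list the sessions giomgr still knows about with sdemon -o info -I users, and treat the difference as orphans. The function names are my own, and the sdemon parsing is an assumption: I am assuming each session row starts with a numeric ID that matches the gsrvr.exe process ID, so check that against the actual output on your server before trusting it.

```python
import csv
import io
import subprocess


def parse_tasklist_pids(csv_text):
    """Return the PIDs found in `tasklist /FO CSV /NH` output."""
    pids = set()
    for row in csv.reader(io.StringIO(csv_text)):
        # Rows look like: "gsrvr.exe","1234","Services","0","12,345 K"
        if len(row) >= 2 and row[1].strip().isdigit():
            pids.add(int(row[1]))
    return pids


def parse_sdemon_pids(text):
    """Return session IDs from `sdemon -o info -I users` output.

    ASSUMPTION: each session row begins with a numeric ID equal to the
    gsrvr.exe PID, and header/separator lines begin with a non-digit.
    """
    pids = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            pids.add(int(fields[0]))
    return pids


def find_orphans(tasklist_csv, sdemon_text):
    """PIDs that are running as gsrvr.exe but unknown to giomgr."""
    running = parse_tasklist_pids(tasklist_csv)
    registered = parse_sdemon_pids(sdemon_text)
    return sorted(running - registered)


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def kill_orphans(dry_run=True):
    # Snapshot processes first, so a session that connects between the
    # two calls is never in the candidate list.
    tasklist_csv = run(["tasklist", "/FI", "IMAGENAME eq gsrvr.exe",
                        "/FO", "CSV", "/NH"])
    sdemon_text = run(["sdemon", "-o", "info", "-I", "users"])
    if not parse_sdemon_pids(sdemon_text):
        # Safety net: if nothing was parsed, the format assumption may be
        # wrong, and every gsrvr.exe would look like an orphan.
        raise RuntimeError("no sessions parsed from sdemon; refusing to kill")
    orphans = find_orphans(tasklist_csv, sdemon_text)
    for pid in orphans:
        if dry_run:
            print("would kill gsrvr.exe PID", pid)
        else:
            subprocess.run(["taskkill", "/PID", str(pid), "/F"], check=True)
    return orphans
```

Call kill_orphans() with the default dry_run=True for a while and compare its list with what you would have killed manually; only switch to dry_run=False (for example, from a scheduled task) once the two agree. This treats the symptom, not the cause.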

The newer (and now default) Direct Connect protocol uses the same exact code as the gsrvr, but bundles it as a DLL, to be run as a separate thread in the client application, shifting the database load to the clients (which are much more powerful than when Direct Connect was originally released). The drawbacks to using Direct Connect are:

  • Administrative privileges are needed for the 'SDE' user to terminate Direct Connect sessions
  • Older releases were not as flexible in heterogeneous geodatabase operation (binary client not matching the geodatabase instance)
  • The clients must each have a viable database client install

Most database clients are relatively small, but until recently, the required Oracle Client needed at least 500 MB. Fortunately, that is no longer the case: the "Oracle Instant Client" install is bundled within recent ArcGIS Desktop and Server installs, and weighs in closer to 60 MB, with a deployed size under 150 MB. It also doesn't require a "Setup.exe" install, so the client doesn't need administrative rights for deployment on Windows operating systems.

Much like the "Doctor, Doctor" joke ("Doctor, doctor! It hurts when I do this"), the solution to application server hangups is, "Don't do that." The situations which create orphan application service processes are not easily solved, but using Direct Connect makes the issue moot (albeit with a more complicated "kill" solution for otherwise operational clients).

Even though you're using an older ArcGIS release, you should also be aware that application server use was deprecated at ArcGIS 10.2, and is not available at ArcGIS 10.3 (though application server client connections are still possible to pre-10.3 geodatabases), so moving in the direction Esri has been advocating for the last five major releases is probably a good move.