Sometimes you'll have problems with your AOLserver and it can be difficult to track down exactly what's going wrong. Here are some common problems, along with techniques for tracking them down.
Crashes or spontaneous restarts when using Oracle
Make sure the StackSize parameter in [ns/parameters] is set to something large (like 100000). This is really easy to miss! Check this out before burning a lot of time doing other things.
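As a rough sketch, the relevant part of nsd.ini might look like the fragment below. Only the section name, the StackSize parameter, and the example value come from the text above; the comment line assumes the usual ini-style `;` comment syntax, so drop it if your version complains.

```ini
[ns/parameters]
; Assumed example value: anything comfortably large should do.
StackSize=100000
```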
Crashes with POSTed data
AOLserver 2.3.2 will crash if a POST is made to a URL that doesn't have a registered handler.
Crashes with CGI
Sometimes AOLserver will crash if a CGI request ends with a trailing slash.
Database-related problems
- If your Oracle driver is version 1.0 or later, set debug=on in [ns/db/driver/drivername]. This will show a trace of most OCI calls being made.
- We're not sure whether there's an impedance mismatch between different versions of Oracle and the Oracle driver (e.g. the driver was compiled and linked against 8.0.4 but used with 8.1). If you're having random database problems, try compiling and linking the driver against the Oracle version you're using.
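For the driver tracing mentioned in the first item above, the section might look something like this sketch. Here drivername is a placeholder for whatever your Oracle driver is called in nsd.ini, and the comment line assumes ini-style `;` comments.

```ini
[ns/db/driver/drivername]
; Placeholder section name; tracing needs Oracle driver 1.0 or later.
debug=on
```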
General
- Try to narrow the scope of the problem. It'll help if one URL or a particular sequence of actions can reproduce the problem.
- If the server is hung, do a kill -3 to drop a core file.
- Look in the sitename-error.log for anything interesting happening around the time of the crash. Search for 'starting' to easily find server restarts.
- Try commenting out any C-based modules that didn't come with AOLserver to see whether it's an AOLserver problem or one caused by a module.
- Grab MarkD (markd@arsdigita.com, AIM Handle is BorkWare) to take a look. If he's not available then forge ahead.
- Set CatchExceptions=on in [ns/parameters]. This will prevent bad signals (like SIGSEGV and SIGBUS) from being caught and allow a core file to be dropped for later investigation.
- Set Debug=on in [ns/parameters] and [ns/server/server-name]. This will spew lots of stuff to the sitename-error.log. There may be interesting information before server restart lines.
- Set Verbose=on in [ns/parameters] and [ns/server/server-name] (in addition to Debug being set). This will spew lots more stuff to the sitename-error.log. There may be interesting information before server restart lines.
- (This is an undocumented feature in AOLserver 2.3.3.) Connect to the Ad-Hoc Tcl scripting window (or execute in a Tcl page):
ns_modlogcontrol set_threshold conn dev
This will spew Yet Even More Stuff to the sitename-error.log.
Running AOLserver in a debugger
This works best if you're using AOLserver 2.3.3.
- Acquire a debugger for your platform (gdb for Solaris, gdb/Wildebeest for the HP)
- Make a copy of the naughty site's nsd.ini and modify it so the server can run without being root (e.g. set port numbers higher than 1024 and point the log files to someplace you can write). I don't like doing a lot of work like this as root.
- Run the debugger:
% gdb bin/nsd
- Run AOLserver in the special 'child process only' mode:
(gdb) run -N -kfc ./site-name.ini
- Exercise the server to reproduce the problem. You hopefully will get a nice "Segmentation Fault" message. Get a backtrace by typing where, and hopefully the problem will be obvious.
Solaris
Solaris has a cool command called "truss", which shows all system calls being made by a program. If you get an indecipherable message from AOLserver, sometimes looking at the system calls and their return codes can point to the problem.
To run with truss, AOLserver can't work normally and create the child process that does the actual page serving. You'll need to run with the -N 'child process only' mode:
truss bin/nsd -N -kfc ./site-name.ini