Automating CPU Load Monitoring for Oracle EBS on Solaris
Automating CPU
Load Monitoring for Oracle EBS on Solaris
The Challenge:
High CPU Load in Oracle EBS
The Solution: A
Robust Bash Script
Addressing the
“Failed to Generate prstat Report” Error
Automating CPU Load Monitoring for Oracle EBS on
Solaris
As an Oracle E-Business Suite (EBS) DBA, managing high database server
load is a critical task to ensure system performance and stability in a 24x7
environment. High CPU usage can stem from resource-intensive Oracle processes,
concurrent managers, or user sessions, and identifying the culprits is key to
optimizing performance. In this post, we’ll explore a Bash script that
automates the monitoring of top CPU-consuming processes on a Solaris server,
maps their process IDs (PIDs) to Oracle session IDs (SIDs), retrieves session
details and SQL text, and sends the results in a professional HTML-formatted
email. This solution is designed for Oracle EBS DBAs to analyze load patterns
and troubleshoot performance issues efficiently.
The Challenge: High CPU Load in Oracle EBS
In an Oracle EBS environment, high CPU load can lead to slow response
times, timeouts, or degraded user experience. To address this, DBAs need to:
·
Identify the top CPU-consuming processes
during load spikes.
·
Map OS process IDs (PIDs) to Oracle session
IDs (SIDs) to pinpoint database sessions.
·
Retrieve session details (e.g., username,
program, module) and SQL text to investigate the root cause.
·
Automate this process to capture data during
high-load events and study patterns over time.
Our goal is to create a script that runs on a Solaris server, uses the
ps command (inspired by the alias topcu='ps -e -o pcpu,pid,user,tty,args | sort
-n -k 1 -r | head'), queries the Oracle database, and sends an HTML email with
a detailed report attached using mailx and uuencode. The script also addresses
a common issue: the “failed to generate prstat report” error from a previous
prstat-based version, ensuring reliability.
The Solution: A Robust Bash Script
The script below automates CPU load monitoring, focusing on high-load
events and providing actionable insights for Oracle EBS DBAs. It checks the
server’s load average, captures top processes, maps them to Oracle sessions,
and delivers results in an HTML email with a text attachment.
Script Overview
·
Load Check: Uses uptime to compare the 5-minute load
average against a threshold (1.2x CPU cores).
·
Top Processes: Captures the top 10 CPU-consuming processes
using ps, similar to the topcu alias.
·
Oracle Session Mapping: Queries v$session,
v$process, and v$sql to map PIDs to SIDs and retrieve session details and SQL
text.
·
HTML Email: Formats results in a clean HTML table for
easy reading.
·
Attachment: Includes a detailed text report via
uuencode.
·
Error Handling: Logs errors to diagnose issues like the
previous prstat failure.
The Script
bash
#!/bin/sh
# Set PATH for cron
PATH=/usr/bin:/usr/sbin:$PATH
# Check commands
for cmd in ps mailx uuencode uptime bc psrinfo sqlplus; do
if ! command -v "$cmd"
>/dev/null; then
echo "Error: $cmd not found" >&2
exit 1
fi
done
# Parameters
EMAIL_TO="dba_team@example.com"
EMAIL_SUBJECT="Solaris EBS CPU Load Analysis: $(date +"%Y-%m-%d %H:%M:%S")"
REPORT_FILE="/tmp/cpu_report_$$.txt"
LOG_FILE="/tmp/cpu_report_$$.log"
DB_USER="apps" #
Replace with your Oracle user
DB_PASS="apps_password" # Replace with password or use secure method
DB_SID="EBS" #
Replace with your Oracle SID
# Get CPU core count
CPU_CORES=$(psrinfo
-v | grep "on-line" | wc -l | awk '{print $1}')
LOAD_THRESHOLD=$(echo "$CPU_CORES * 1.2"
| bc) # 1.2x cores
# Log start
echo "Starting CPU load analysis at $(date)"
> "${LOG_FILE}"
echo "CPU Cores: $CPU_CORES, Load Threshold: $LOAD_THRESHOLD" >> "${LOG_FILE}"
# Check load average
LOAD_AVG=$(uptime
| awk '{print $(NF-2)}' | tr -d ',')
echo "Current Load Average: $LOAD_AVG"
>> "${LOG_FILE}"
if [ "$(echo "$LOAD_AVG <= $LOAD_THRESHOLD"
| bc)" -eq 1 ]; then
echo "Load ($LOAD_AVG) below threshold ($LOAD_THRESHOLD). Skipping report." >> "${LOG_FILE}"
exit 0
fi
# Run ps (inspired by topcu alias)
echo "High load detected ($LOAD_AVG). Running ps..." >> "${LOG_FILE}"
if ! ps -e -o
pcpu,pid,user,tty,args | sort -n -k 1 -r | head > "${REPORT_FILE}"
2>>"${LOG_FILE}"; then
echo "Error: ps command failed. See ${LOG_FILE}."
>&2
exit 1
fi
# Check report
if [ ! -s "${REPORT_FILE}" ]; then
echo "Error: ps report empty. See ${LOG_FILE}."
>&2
echo "ps output empty. Check permissions or system
state." >> "${LOG_FILE}"
exit 1
fi
# Extract PIDs for Oracle processes
PIDS=$(awk
'/oracle/ {print $2}' "${REPORT_FILE}" | sort -u)
# Query Oracle session details
echo "Querying Oracle sessions..." >> "${LOG_FILE}"
SQL_OUTPUT=$(sqlplus
-s /nolog <<EOF
2>>"${LOG_FILE}"
CONNECT ${DB_USER}/${DB_PASS}@${DB_SID}
SET PAGESIZE 0 LINESIZE 500
SET HEADING OFF
SET FEEDBACK OFF
SELECT 'SPID:' || p.spid || ':SID:' || s.sid || ':SERIAL#:' ||
s.serial# || ':USER:' || NVL(s.username, 'N/A') || ':PROGRAM:' ||
NVL(s.program, 'N/A') || ':MODULE:' || NVL(s.module, 'N/A') || ':SQL_TEXT:' ||
NVL(q.sql_text, 'No SQL found')
FROM v\$session s
JOIN v\$process p ON s.paddr = p.addr
LEFT JOIN v\$sql q ON s.sql_id = q.sql_id
WHERE p.spid IN ($(echo "$PIDS" | tr '\n' ',' | sed
's/,$//'))
ORDER BY s.sid;
EXIT
EOF
)
# Check SQL output
if grep -q "ERROR" "${LOG_FILE}";
then
echo "Error: SQL query failed. See ${LOG_FILE}."
>&2
rm -f "${REPORT_FILE}"
exit 1
fi
# Create detailed report file
{
echo "Timestamp: $(date
+"%Y-%m-%d %H:%M:%S")"
echo "Load Average: $LOAD_AVG (Threshold: $LOAD_THRESHOLD, Cores: $CPU_CORES)"
echo ""
echo "Top 10 CPU-Consuming Processes (ps):"
cat "${REPORT_FILE}"
echo ""
echo "Oracle Session Details:"
if [ -z "$SQL_OUTPUT" ]; then
echo "No Oracle sessions found for top processes."
else
echo "$SQL_OUTPUT" | sed 's/:/\n
/g'
fi
} >
"${REPORT_FILE}.tmp" && mv "${REPORT_FILE}.tmp"
"${REPORT_FILE}"
# Create HTML email body
HTML_BODY="
<html>
<head>
<style>
body { font-family: Arial, sans-serif; }
table { border-collapse: collapse; width: 100%; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left;
}
th { background-color: #f2f2f2; }
tr:nth-child(even) { background-color: #f9f9f9; }
</style>
</head>
<body>
<h2>Oracle EBS Solaris CPU Load Report</h2>
<p><strong>Timestamp:</strong> $(date +"%Y-%m-%d
%H:%M:%S")</p>
<p><strong>Load Average:</strong> $LOAD_AVG (Threshold: $LOAD_THRESHOLD,
Cores: $CPU_CORES)</p>
<h3>Top CPU-Consuming Processes</h3>
<table>
<tr><th>CPU%</th><th>PID</th><th>User</th><th>TTY</th><th>Process</th><th>SID</th><th>Serial#</th><th>DB
User</th><th>Program</th><th>Module</th><th>SQL
Text</th></tr>
"
# Parse ps and SQL output for HTML
awk '/oracle/ {print $1, $2, $3, $4, $5}' "${REPORT_FILE}" | while read -r cpu pid user tty proc; do
SQL_LINE=$(echo "$SQL_OUTPUT"
| grep "SPID:$pid:")
if [ -n "$SQL_LINE" ]; then
SID=$(echo "$SQL_LINE"
| cut -d: -f4)
SERIAL=$(echo "$SQL_LINE"
| cut -d: -f6)
DB_USER=$(echo "$SQL_LINE"
| cut -d: -f8)
PROGRAM=$(echo "$SQL_LINE"
| cut -d: -f10)
MODULE=$(echo "$SQL_LINE"
| cut -d: -f12)
SQL_TEXT=$(echo "$SQL_LINE"
| cut -d: -f14- | sed 's/"/"/g'
| cut -c1-100)
HTML_BODY="$HTML_BODY<tr><td>$cpu</td><td>$pid</td><td>$user</td><td>$tty</td><td>$proc</td><td>$SID</td><td>$SERIAL</td><td>$DB_USER</td><td>$PROGRAM</td><td>$MODULE</td><td>$SQL_TEXT</td></tr>"
else
HTML_BODY="$HTML_BODY<tr><td>$cpu</td><td>$pid</td><td>$user</td><td>$tty</td><td>$proc</td><td>N/A</td><td>N/A</td><td>N/A</td><td>N/A</td><td>N/A</td><td>No
SQL found</td></tr>"
fi
done
HTML_BODY="$HTML_BODY
</table>
<p>Attached is a detailed report ('cpu_report.txt') for
further analysis. Check Oracle sessions or concurrent managers causing
spikes.</p>
<p>Cheers,<br>EBS Monitor Team</p>
</body>
</html>"
# Send email
echo "Sending email..." >> "${LOG_FILE}"
if ! (echo "Content-Type:
text/html"; echo "$HTML_BODY"; uuencode "${REPORT_FILE}"
cpu_report.txt) | mailx -s "${EMAIL_SUBJECT}"
"${EMAIL_TO}" >>"${LOG_FILE}"
2>&1; then
echo "Error: Email failed. See ${LOG_FILE}."
>&2
rm -f "${REPORT_FILE}"
exit 1
fi
# Clean up
rm -f "${REPORT_FILE}"
echo "Report sent successfully!" >> "${LOG_FILE}"
How It Works
1.
Load Monitoring:
a. Checks the
5-minute load average using uptime.
b. Compares it
against a threshold (1.2x the number of CPU cores, determined via psrinfo).
c. Skips reporting
if the load is below the threshold, focusing on high-load events.
2.
Top CPU Processes:
a. Uses ps -e -o
pcpu,pid,user,tty,args | sort -n -k 1 -r | head to capture the top 10
processes, inspired by the topcu alias.
b. Filters for
Oracle processes (/oracle/) to extract relevant PIDs.
3.
Oracle Session Details:
a. Queries
v$session, v$process, and v$sql using sqlplus to map PIDs to SIDs and retrieve
session details (SID, Serial#, User, Program, Module) and SQL text.
b. Handles cases
where no Oracle sessions match the PIDs.
4.
HTML Email:
a. Formats results
in a professional HTML table with columns: CPU%, PID, User, TTY, Process, SID,
Serial#, DB User, Program, Module, SQL Text.
b. Uses CSS for
readability (bordered table, alternating row colors).
c. Limits SQL text
to 100 characters in the email for brevity.
5.
Attached Report:
a. Saves full ps
output and session details to cpu_report.txt, attached via uuencode for
detailed analysis.
6.
Error Handling:
a. Logs all steps to
/tmp/cpu_report_*.log to diagnose issues.
b. Checks for
command failures, empty outputs, and SQL errors.
c. Addresses the
previous “failed to generate prstat report” error by using ps instead of prstat
and ensuring file creation checks.
Addressing the “Failed to Generate prstat
Report” Error
The original script failed because prstat output was empty or the report
file couldn’t be created. By switching to ps (aligned with your topcu alias),
we avoid prstat-specific issues (e.g., permissions, output format). The script
includes:
·
Checks for ps success and non-empty output.
·
Verification of /tmp write permissions.
·
Detailed logging to pinpoint failures.
Setup Instructions
1.
Configure Database Access:
a. Update DB_USER,
DB_PASS, and DB_SID with your Oracle EBS credentials (e.g., apps user).
b. For security,
store credentials in a file:
bash
echo 'DB_USER=apps' > $HOME/.db_creds
echo 'DB_PASS=your_password' >> $HOME/.db_creds
echo 'DB_SID=EBS' >> $HOME/.db_creds
. $HOME/.db_creds
2.
Source EBS Environment (if needed):
bash
. $HOME/EBSapps.env RUN
3.
Run the Script:
bash
chmod
+x ebs_cpu_report.sh
./ebs_cpu_report.sh
4.
Schedule via Cron:
a. Run every 5
minutes to capture load spikes:
bash
*/5 *
* * * /path/to/ebs_cpu_report.sh
5.
Verify Output:
a. Check the log
file:
bash
cat
/tmp/cpu_report_*.log
b. Ensure emails are
received with the HTML table and attachment.
Debugging Issues
If the script fails (e.g., empty report or email issues):
·
Test ps:
bash
ps -e
-o pcpu,pid,user,tty,args | sort -n -k 1 -r | head
·
Test File Creation:
bash
echo "test" > /tmp/cpu_test_$$.txt
&& cat /tmp/cpu_test_$$.txt
·
Check Permissions:
bash
ls -ld
/tmp
df -h
/tmp
sudo
ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head
·
Test SQL:
bash
sqlplus
-s apps/apps_password@EBS <<EOF
SELECT 'SPID:' || p.spid || ':SID:' || s.sid FROM v\$session s,
v\$process p WHERE s.paddr = p.addr AND ROWNUM = 1;
EXIT
EOF
text
##
Analyzing Load Patterns
-
**Review HTML Emails**: The table highlights top Oracle processes and their
session details, making it easy to spot problematic SQL or EBS modules (e.g.,
concurrent managers).
-
**Save Attachments**: Store `cpu_report.txt` files in a shared directory for
historical analysis.
-
**Correlate with EBS**:
```sql
SELECT
r.request_id, r.program, r.actual_start_date, s.sid, s.sql_id
FROM
fnd_concurrent_requests r, v$session s
WHERE
r.oracle_session_id = s.sid
AND
r.actual_start_date >= SYSDATE - 1
ORDER
BY r.actual_start_date;
·
Identify Trends: Look for recurring high-CPU processes
(e.g., specific SQL or programs) during load spikes to optimize EBS
performance.
Conclusion
This script empowers Oracle EBS DBAs to monitor CPU load, identify
resource-intensive processes, and investigate Oracle sessions causing spikes,
all in a professional HTML email format. By using ps instead of prstat, it
avoids previous errors and aligns with your topcu alias. The attached detailed
report supports in-depth analysis, helping you maintain a stable Oracle EBS
environment.
If you encounter issues or need customizations (e.g., additional session
details, EBS-specific filters), share the log file (/tmp/cpu_report_*.log), ps
output, or Solaris version (uname -a), and I’ll provide tailored support. Happy
monitoring!
Comments
Post a Comment