Automating CPU Load Monitoring for Oracle EBS on Solaris

 

Automating CPU Load Monitoring for Oracle EBS on Solaris. 1

The Challenge: High CPU Load in Oracle EBS. 1

The Solution: A Robust Bash Script. 2

Script Overview.. 2

The Script. 2

How It Works. 6

Addressing the “Failed to Generate prstat Report” Error. 7

Setup Instructions. 7

Debugging Issues. 8

Conclusion. 9

 

Automating CPU Load Monitoring for Oracle EBS on Solaris

As an Oracle E-Business Suite (EBS) DBA, managing high database server load is a critical task to ensure system performance and stability in a 24x7 environment. High CPU usage can stem from resource-intensive Oracle processes, concurrent managers, or user sessions, and identifying the culprits is key to optimizing performance. In this post, we’ll explore a Bash script that automates the monitoring of top CPU-consuming processes on a Solaris server, maps their process IDs (PIDs) to Oracle session IDs (SIDs), retrieves session details and SQL text, and sends the results in a professional HTML-formatted email. This solution is designed for Oracle EBS DBAs to analyze load patterns and troubleshoot performance issues efficiently.

The Challenge: High CPU Load in Oracle EBS

In an Oracle EBS environment, high CPU load can lead to slow response times, timeouts, or degraded user experience. To address this, DBAs need to:

·         Identify the top CPU-consuming processes during load spikes.

·         Map OS process IDs (PIDs) to Oracle session IDs (SIDs) to pinpoint database sessions.

·         Retrieve session details (e.g., username, program, module) and SQL text to investigate the root cause.

·         Automate this process to capture data during high-load events and study patterns over time.

Our goal is to create a script that runs on a Solaris server, uses the ps command (inspired by the alias topcu='ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head'), queries the Oracle database, and sends an HTML email with a detailed report attached using mailx and uuencode. The script also addresses a common issue: the “failed to generate prstat report” error from a previous prstat-based version, ensuring reliability.

The Solution: A Robust Bash Script

The script below automates CPU load monitoring, focusing on high-load events and providing actionable insights for Oracle EBS DBAs. It checks the server’s load average, captures top processes, maps them to Oracle sessions, and delivers results in an HTML email with a text attachment.

Script Overview

·         Load Check: Uses uptime to compare the 5-minute load average against a threshold (1.2x CPU cores).

·         Top Processes: Captures the top 10 CPU-consuming processes using ps, similar to the topcu alias.

·         Oracle Session Mapping: Queries v$session, v$process, and v$sql to map PIDs to SIDs and retrieve session details and SQL text.

·         HTML Email: Formats results in a clean HTML table for easy reading.

·         Attachment: Includes a detailed text report via uuencode.

·         Error Handling: Logs errors to diagnose issues like the previous prstat failure.

The Script

bash

#!/bin/sh

# Set PATH for cron

PATH=/usr/bin:/usr/sbin:$PATH

# Check commands

for cmd in ps mailx uuencode uptime bc psrinfo sqlplus; do

if ! command -v "$cmd" >/dev/null; then

echo "Error: $cmd not found" >&2

exit 1

fi

done

# Parameters

EMAIL_TO="dba_team@example.com"

EMAIL_SUBJECT="Solaris EBS CPU Load Analysis: $(date +"%Y-%m-%d %H:%M:%S")"

REPORT_FILE="/tmp/cpu_report_$$.txt"

LOG_FILE="/tmp/cpu_report_$$.log"

DB_USER="apps" # Replace with your Oracle user

DB_PASS="apps_password" # Replace with password or use secure method

DB_SID="EBS" # Replace with your Oracle SID

# Get CPU core count

CPU_CORES=$(psrinfo -v | grep "on-line" | wc -l | awk '{print $1}')

LOAD_THRESHOLD=$(echo "$CPU_CORES * 1.2" | bc) # 1.2x cores

# Log start

echo "Starting CPU load analysis at $(date)" > "${LOG_FILE}"

echo "CPU Cores: $CPU_CORES, Load Threshold: $LOAD_THRESHOLD" >> "${LOG_FILE}"

# Check load average

LOAD_AVG=$(uptime | awk '{print $(NF-2)}' | tr -d ',')

echo "Current Load Average: $LOAD_AVG" >> "${LOG_FILE}"

if [ "$(echo "$LOAD_AVG <= $LOAD_THRESHOLD" | bc)" -eq 1 ]; then

echo "Load ($LOAD_AVG) below threshold ($LOAD_THRESHOLD). Skipping report." >> "${LOG_FILE}"

exit 0

fi

# Run ps (inspired by topcu alias)

echo "High load detected ($LOAD_AVG). Running ps..." >> "${LOG_FILE}"

if ! ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head > "${REPORT_FILE}" 2>>"${LOG_FILE}"; then

echo "Error: ps command failed. See ${LOG_FILE}." >&2

exit 1

fi

# Check report

if [ ! -s "${REPORT_FILE}" ]; then

echo "Error: ps report empty. See ${LOG_FILE}." >&2

echo "ps output empty. Check permissions or system state." >> "${LOG_FILE}"

exit 1

fi

# Extract PIDs for Oracle processes

PIDS=$(awk '/oracle/ {print $2}' "${REPORT_FILE}" | sort -u)

# Query Oracle session details

echo "Querying Oracle sessions..." >> "${LOG_FILE}"

SQL_OUTPUT=$(sqlplus -s /nolog <<EOF 2>>"${LOG_FILE}"

CONNECT ${DB_USER}/${DB_PASS}@${DB_SID}

SET PAGESIZE 0 LINESIZE 500

SET HEADING OFF

SET FEEDBACK OFF

SELECT 'SPID:' || p.spid || ':SID:' || s.sid || ':SERIAL#:' || s.serial# || ':USER:' || NVL(s.username, 'N/A') || ':PROGRAM:' || NVL(s.program, 'N/A') || ':MODULE:' || NVL(s.module, 'N/A') || ':SQL_TEXT:' || NVL(q.sql_text, 'No SQL found')

FROM v\$session s

JOIN v\$process p ON s.paddr = p.addr

LEFT JOIN v\$sql q ON s.sql_id = q.sql_id

WHERE p.spid IN ($(echo "$PIDS" | tr '\n' ',' | sed 's/,$//'))

ORDER BY s.sid;

EXIT

EOF

)

# Check SQL output

if grep -q "ERROR" "${LOG_FILE}"; then

echo "Error: SQL query failed. See ${LOG_FILE}." >&2

rm -f "${REPORT_FILE}"

exit 1

fi

# Create detailed report file

{

echo "Timestamp: $(date +"%Y-%m-%d %H:%M:%S")"

echo "Load Average: $LOAD_AVG (Threshold: $LOAD_THRESHOLD, Cores: $CPU_CORES)"

echo ""

echo "Top 10 CPU-Consuming Processes (ps):"

cat "${REPORT_FILE}"

echo ""

echo "Oracle Session Details:"

if [ -z "$SQL_OUTPUT" ]; then

echo "No Oracle sessions found for top processes."

else

echo "$SQL_OUTPUT" | sed 's/:/\n /g'

fi

} > "${REPORT_FILE}.tmp" && mv "${REPORT_FILE}.tmp" "${REPORT_FILE}"

# Create HTML email body

HTML_BODY="

<html>

<head>

<style>

body { font-family: Arial, sans-serif; }

table { border-collapse: collapse; width: 100%; }

th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }

th { background-color: #f2f2f2; }

tr:nth-child(even) { background-color: #f9f9f9; }

</style>

</head>

<body>

<h2>Oracle EBS Solaris CPU Load Report</h2>

<p><strong>Timestamp:</strong> $(date +"%Y-%m-%d %H:%M:%S")</p>

<p><strong>Load Average:</strong> $LOAD_AVG (Threshold: $LOAD_THRESHOLD, Cores: $CPU_CORES)</p>

<h3>Top CPU-Consuming Processes</h3>

<table>

<tr><th>CPU%</th><th>PID</th><th>User</th><th>TTY</th><th>Process</th><th>SID</th><th>Serial#</th><th>DB User</th><th>Program</th><th>Module</th><th>SQL Text</th></tr>

"

# Parse ps and SQL output for HTML

awk '/oracle/ {print $1, $2, $3, $4, $5}' "${REPORT_FILE}" | while read -r cpu pid user tty proc; do

SQL_LINE=$(echo "$SQL_OUTPUT" | grep "SPID:$pid:")

if [ -n "$SQL_LINE" ]; then

SID=$(echo "$SQL_LINE" | cut -d: -f4)

SERIAL=$(echo "$SQL_LINE" | cut -d: -f6)

DB_USER=$(echo "$SQL_LINE" | cut -d: -f8)

PROGRAM=$(echo "$SQL_LINE" | cut -d: -f10)

MODULE=$(echo "$SQL_LINE" | cut -d: -f12)

SQL_TEXT=$(echo "$SQL_LINE" | cut -d: -f14- | sed 's/"/&quot;/g' | cut -c1-100)

HTML_BODY="$HTML_BODY<tr><td>$cpu</td><td>$pid</td><td>$user</td><td>$tty</td><td>$proc</td><td>$SID</td><td>$SERIAL</td><td>$DB_USER</td><td>$PROGRAM</td><td>$MODULE</td><td>$SQL_TEXT</td></tr>"

else

HTML_BODY="$HTML_BODY<tr><td>$cpu</td><td>$pid</td><td>$user</td><td>$tty</td><td>$proc</td><td>N/A</td><td>N/A</td><td>N/A</td><td>N/A</td><td>N/A</td><td>No SQL found</td></tr>"

fi

done

HTML_BODY="$HTML_BODY

</table>

<p>Attached is a detailed report ('cpu_report.txt') for further analysis. Check Oracle sessions or concurrent managers causing spikes.</p>

<p>Cheers,<br>EBS Monitor Team</p>

</body>

</html>"

# Send email

echo "Sending email..." >> "${LOG_FILE}"

if ! (echo "Content-Type: text/html"; echo "$HTML_BODY"; uuencode "${REPORT_FILE}" cpu_report.txt) | mailx -s "${EMAIL_SUBJECT}" "${EMAIL_TO}" >>"${LOG_FILE}" 2>&1; then

echo "Error: Email failed. See ${LOG_FILE}." >&2

rm -f "${REPORT_FILE}"

exit 1

fi

# Clean up

rm -f "${REPORT_FILE}"

echo "Report sent successfully!" >> "${LOG_FILE}"

How It Works

1.      Load Monitoring:

a.      Checks the 5-minute load average using uptime.

b.     Compares it against a threshold (1.2x the number of CPU cores, determined via psrinfo).

c.      Skips reporting if the load is below the threshold, focusing on high-load events.

2.      Top CPU Processes:

a.      Uses ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head to capture the top 10 processes, inspired by the topcu alias.

b.     Filters for Oracle processes (/oracle/) to extract relevant PIDs.

3.      Oracle Session Details:

a.      Queries v$session, v$process, and v$sql using sqlplus to map PIDs to SIDs and retrieve session details (SID, Serial#, User, Program, Module) and SQL text.

b.     Handles cases where no Oracle sessions match the PIDs.

4.      HTML Email:

a.      Formats results in a professional HTML table with columns: CPU%, PID, User, TTY, Process, SID, Serial#, DB User, Program, Module, SQL Text.

b.     Uses CSS for readability (bordered table, alternating row colors).

c.      Limits SQL text to 100 characters in the email for brevity.

5.      Attached Report:

a.      Saves full ps output and session details to cpu_report.txt, attached via uuencode for detailed analysis.

6.      Error Handling:

a.      Logs all steps to /tmp/cpu_report_*.log to diagnose issues.

b.     Checks for command failures, empty outputs, and SQL errors.

c.      Addresses the previous “failed to generate prstat report” error by using ps instead of prstat and ensuring file creation checks.

Addressing the “Failed to Generate prstat Report” Error

The original script failed because prstat output was empty or the report file couldn’t be created. By switching to ps (aligned with your topcu alias), we avoid prstat-specific issues (e.g., permissions, output format). The script includes:

·         Checks for ps success and non-empty output.

·         Verification of /tmp write permissions.

·         Detailed logging to pinpoint failures.

Setup Instructions

1.      Configure Database Access:

a.      Update DB_USER, DB_PASS, and DB_SID with your Oracle EBS credentials (e.g., apps user).

b.     For security, store credentials in a file:

bash

echo 'DB_USER=apps' > $HOME/.db_creds

echo 'DB_PASS=your_password' >> $HOME/.db_creds

echo 'DB_SID=EBS' >> $HOME/.db_creds

. $HOME/.db_creds

2.      Source EBS Environment (if needed):

bash

. $HOME/EBSapps.env RUN

3.      Run the Script:

bash

chmod +x ebs_cpu_report.sh

./ebs_cpu_report.sh

4.      Schedule via Cron:

a.      Run every 5 minutes to capture load spikes:

bash

*/5 * * * * /path/to/ebs_cpu_report.sh

5.      Verify Output:

a.      Check the log file:

bash

cat /tmp/cpu_report_*.log

b.     Ensure emails are received with the HTML table and attachment.

Debugging Issues

If the script fails (e.g., empty report or email issues):

·         Test ps:

bash

ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head

·         Test File Creation:

bash

echo "test" > /tmp/cpu_test_$$.txt && cat /tmp/cpu_test_$$.txt

·         Check Permissions:

bash

ls -ld /tmp

df -h /tmp

sudo ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head

·         Test SQL:

bash

sqlplus -s apps/apps_password@EBS <<EOF

SELECT 'SPID:' || p.spid || ':SID:' || s.sid FROM v\$session s, v\$process p WHERE s.paddr = p.addr AND ROWNUM = 1;

EXIT

EOF

text

## Analyzing Load Patterns

- **Review HTML Emails**: The table highlights top Oracle processes and their session details, making it easy to spot problematic SQL or EBS modules (e.g., concurrent managers).

- **Save Attachments**: Store `cpu_report.txt` files in a shared directory for historical analysis.

- **Correlate with EBS**:

```sql

SELECT r.request_id, r.program, r.actual_start_date, s.sid, s.sql_id

FROM fnd_concurrent_requests r, v$session s

WHERE r.oracle_session_id = s.sid

AND r.actual_start_date >= SYSDATE - 1

ORDER BY r.actual_start_date;

·         Identify Trends: Look for recurring high-CPU processes (e.g., specific SQL or programs) during load spikes to optimize EBS performance.

Conclusion

This script empowers Oracle EBS DBAs to monitor CPU load, identify resource-intensive processes, and investigate Oracle sessions causing spikes, all in a professional HTML email format. By using ps instead of prstat, it avoids previous errors and aligns with your topcu alias. The attached detailed report supports in-depth analysis, helping you maintain a stable Oracle EBS environment.

If you encounter issues or need customizations (e.g., additional session details, EBS-specific filters), share the log file (/tmp/cpu_report_*.log), ps output, or Solaris version (uname -a), and I’ll provide tailored support. Happy monitoring!

 

Comments

Popular posts from this blog

From Manual to Masterful: 25 Essential Prompts for Leading a Winning Oracle DBA Automation Project

scan