Question 1
Difficulty: easy
How do you approach monitoring and maintaining the health of an Oracle database in production?
Sample answer
I treat database health as a mix of prevention, visibility, and fast response. My first step is to make sure I have solid monitoring in place for the usual pressure points: CPU, memory, I/O latency, wait events, tablespace usage, redo generation, backup status, and session activity. I like to build a baseline so I can spot drift before it becomes an outage. On a daily basis, I review AWR/ASH reports, alert logs, failed jobs, and any growth trends that could affect capacity. I also check for execution plan regressions, blocking sessions, and long-running queries that may need attention. If I see a pattern, I document the root cause and adjust thresholds or tuning settings so it does not repeat. In production, I try to be proactive rather than reactive, because the best DBA work is often the problem the users never notice.
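As an illustration of the baseline idea in this answer, here is a minimal sketch of flagging drift in a single metric. It is not a monitoring product: the sample values, the function name, and the three-standard-deviation threshold are assumptions made for the example, and in practice the readings would come from whatever monitoring history is available.

```python
from statistics import mean, stdev


def drift_from_baseline(baseline, current, max_sigma=3.0):
    """Return (is_drift, z_score) for a current reading against baseline samples.

    baseline: historical readings for one metric (hypothetical data here).
    max_sigma: assumed alert threshold, in standard deviations.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A flat baseline: any change at all counts as drift.
        return (current != mu, 0.0 if current == mu else float("inf"))
    z = (current - mu) / sigma
    return (abs(z) > max_sigma, z)


# Hypothetical daily redo generation in GB over two weeks.
redo_gb = [41, 39, 40, 42, 38, 40, 41, 39, 40, 43, 41, 40, 39, 42]
is_drift, z = drift_from_baseline(redo_gb, 55)
print(is_drift, round(z, 1))
```

A fixed sigma threshold is only one possible choice. Metrics with strong daily or weekly cycles usually need a baseline taken from the same time window.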
Question 2
Difficulty: medium
Describe a time you had to resolve a performance issue in Oracle. What did you do?
Sample answer
In one environment, users reported that a core application had become slow during peak hours, but the slowdown was not constant. I started by comparing AWR reports from normal and peak periods, then used ASH data to confirm two things: the top wait event was direct path read, and a small group of SQL statements was consuming most of the resources. I checked execution plans and found that one query had started doing full table scans after data growth changed the optimizer’s estimates. Rather than guessing, I validated the stats, reviewed indexes, and tested a plan change in a lower environment. We added a missing composite index, refreshed statistics, and applied a SQL Plan Baseline to stabilize the execution path. After deployment, response times dropped significantly and the issue did not recur. What I learned from that case was to always follow evidence, not assumptions, when diagnosing performance problems.
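The ASH step in this answer comes down to counting samples. The sketch below assumes the samples have already been exported into plain dictionaries. The keys `sql_id` and `event` mirror ASH column names, but the data, the function name, and the use of "ON CPU" as a label for samples with no wait event are assumptions for illustration.

```python
from collections import Counter


def top_by_samples(ash_samples, key, n=3):
    """Rank values of `key` by their share of ASH samples.

    ash_samples: list of dicts with hypothetical keys 'sql_id' and 'event'.
    Returns a list of (value, percent_of_samples) tuples, largest first.
    """
    counts = Counter(sample[key] for sample in ash_samples)
    total = sum(counts.values())
    return [(value, round(100.0 * c / total, 1)) for value, c in counts.most_common(n)]


# Made-up samples from a peak-hour window.
samples = (
    [{"sql_id": "a1", "event": "direct path read"}] * 6
    + [{"sql_id": "b2", "event": "direct path read"}] * 2
    + [{"sql_id": "c3", "event": "ON CPU"}] * 2
)
print(top_by_samples(samples, "event"))
print(top_by_samples(samples, "sql_id", n=1))
```

Running the same ranking on a normal-hours window and a peak window makes the comparison concrete: the statements and events whose share changes most are the ones to investigate first.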
Question 3
Difficulty: medium
How do you manage backup and recovery strategies for Oracle databases?
Sample answer
I design backup and recovery around the business recovery target, not just around technical convenience. First I confirm the RPO and RTO with the application owners, because that tells me whether I need simple RMAN backups or a more resilient setup with archivelog shipping, standby databases, or flashback capabilities. For most critical systems, I use RMAN with regular full and incremental backups, archived redo log backups, and frequent restore validation. I never consider a backup reliable until I have tested recovery. I also make sure backup scripts handle errors cleanly, write clear logs, and alert us if a job fails. On the recovery side, I keep procedures documented for point-in-time recovery, tablespace recovery, control file loss, and accidental data changes. In practice, the best recovery plan is the one that has already been rehearsed. That way, if something goes wrong at 2 a.m., we are restoring calmly instead of improvising.
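One way to tie the RPO to alerting is to check the age of the newest successful backup of each type against a policy. The sketch below is a simplification: the backup type names, the policy values, and the function name are hypothetical, and it assumes the timestamps have already been collected from the backup job logs or catalog.

```python
from datetime import datetime, timedelta


def stale_backups(last_success, max_age, now):
    """Return the backup types whose newest successful run is older than allowed.

    last_success: dict of backup type -> datetime of last successful run.
    max_age: dict of backup type -> timedelta (assumed policy, derived from the RPO).
    """
    stale = []
    for kind, limit in max_age.items():
        finished = last_success.get(kind)
        # A backup type with no recorded success is treated as stale.
        if finished is None or now - finished > limit:
            stale.append(kind)
    return sorted(stale)


# Hypothetical policy: a 1-hour RPO drives the archived log backup interval.
now = datetime(2024, 1, 10, 2, 0)
policy = {
    "full": timedelta(days=7),
    "incremental": timedelta(days=1),
    "archivelog": timedelta(hours=1),
}
history = {
    "full": datetime(2024, 1, 7, 1, 0),
    "incremental": datetime(2024, 1, 9, 3, 0),
    "archivelog": datetime(2024, 1, 10, 0, 15),
}
print(stale_backups(history, policy, now))
```

A freshness check like this only shows that backups are being taken on schedule. It does not replace the restore tests described above.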
Question 4
Difficulty: hard
What steps would you take if an application suddenly started getting ORA-01555 snapshot too old errors?
Sample answer
When I see ORA-01555, I assume there is a mismatch between undo retention and the workload until I prove otherwise. My first step is to identify which queries are failing and whether the issue is happening during long-running reports, batch jobs, or peak OLTP activity. Then I check undo tablespace usage, retention settings, and whether the system is experiencing heavy DML that is overwriting undo too quickly. I also review execution plans, because inefficient queries can hold consistent read snapshots longer than necessary. If the issue is tied to a specific process, I look for ways to reduce transaction size, optimize the query, or schedule the job during a quieter window. If needed, I may increase undo tablespace size or adjust retention based on actual workload patterns. I avoid treating the symptom alone. The goal is to balance the undo configuration with the application behavior so the same error does not keep returning.
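The sizing decision in this answer can be grounded in a commonly used estimate: required undo space is roughly the retention in seconds, multiplied by the undo blocks generated per second, multiplied by the database block size. The sketch below only does that arithmetic. The input numbers are hypothetical, and it assumes the undo block rate was taken from a peak interval in V$UNDOSTAT (UNDOBLKS divided by the interval length in seconds).

```python
def undo_blocks_per_second(undoblks, interval_seconds):
    """Undo generation rate for one statistics interval."""
    return undoblks / interval_seconds


def required_undo_bytes(retention_seconds, undo_blocks_per_sec, block_size_bytes):
    """Rough undo space needed to honor a retention target at a given undo rate."""
    return retention_seconds * undo_blocks_per_sec * block_size_bytes


# Hypothetical peak interval: 12,000 undo blocks in 10 minutes, 8 KB blocks,
# and a retention target of one hour.
rate = undo_blocks_per_second(12000, 600)
needed = required_undo_bytes(3600, rate, 8192)
print(f"{needed / 1024 ** 2:.1f} MiB")  # prints "562.5 MiB"
```

The retention target itself should be at least as long as the longest query that must succeed. The estimate is a starting point for sizing, not a guarantee, which is why it is worth checking against real workload history.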
Question 5
Difficulty: medium
Tell me about a time you had to handle a database outage or critical incident.
Sample answer
During one incident, a production database became unavailable after storage latency spiked and several sessions began hanging. I immediately focused on stabilizing the system rather than trying to solve everything at once. I checked the alert log, active sessions, and system wait events to confirm that the database itself was not corrupted and that the issue was infrastructure-related. I then coordinated with the storage team to isolate the affected volume and reduce the load on the server. While that was happening, I communicated status updates to application support and business stakeholders so they knew we were working on it and had an estimated timeline. Once the storage problem was corrected, I verified instance health, checked for recovery requirements, and monitored the system closely during the return to normal workload. Afterward, I led a review to document the timeline and add earlier warning thresholds. I believe strong incident handling is about calm prioritization, good communication, and making sure the root cause is understood.
Question 6
Difficulty: medium
How do you handle Oracle patching and version upgrades with minimal downtime?
Sample answer
I approach patching and upgrades as change projects, not just technical tasks. I start by reviewing the release notes, known issues, compatibility requirements, and whether the patch affects database binaries, ASM, Grid Infrastructure, or client components. Then I build a detailed plan with rollback steps, validation checks, and a realistic downtime estimate. For critical systems, I prefer using a staging environment that mirrors production so I can test application behavior, performance, and startup procedures before the real change window. If the architecture supports it, I also look at options like Data Guard switchover or rolling maintenance to reduce downtime. During the execution window, I keep communication tight and verify each checkpoint before moving forward. After patching, I validate object status, listener connectivity, application login, and core transactions. I have found that most upgrade problems come from poor preparation, not the upgrade itself. Careful rehearsal and documentation make the process much safer and smoother.
Question 7
Difficulty: medium
How do you troubleshoot a slow Oracle query without immediately changing the application code?
Sample answer
I start by gathering evidence so I understand whether the problem is the SQL itself, the execution plan, or the environment around it. I look at the SQL ID, the plan history, row source statistics, and whether the query is new or has recently changed in performance. Then I compare estimated versus actual row counts to see if the optimizer is making a bad assumption. If needed, I check whether statistics are stale, whether bind variable peeking is influencing the plan, or whether an index is missing or not being used effectively. I also pay attention to system-level factors like CPU pressure, I/O waits, and blocking sessions, because a query may appear slow even if the SQL is fine. If I can improve performance through statistics, indexing, hints, or a plan baseline, I will do that first. I only recommend application changes when the database-side options have been exhausted and the evidence clearly points there.
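The estimated-versus-actual comparison in this answer can be sketched as a simple ratio check over plan steps. The step data, the dictionary keys, and the factor of 10 are assumptions for illustration. The sketch multiplies the estimate by the number of starts because row source statistics typically report actual rows cumulatively across executions of a step, while the estimate is per execution.

```python
def misestimated_steps(plan_steps, factor=10.0):
    """Return (id, operation, ratio) for steps whose row estimate is off by >= factor.

    plan_steps: list of dicts with hypothetical keys 'id', 'operation',
    'e_rows', 'a_rows', and optionally 'starts' (defaults to 1).
    """
    flagged = []
    for step in plan_steps:
        # Clamp to 1 so a zero-row step does not cause a division by zero.
        est = max(step["e_rows"] * step.get("starts", 1), 1)
        act = max(step["a_rows"], 1)
        ratio = max(est, act) / min(est, act)
        if ratio >= factor:
            flagged.append((step["id"], step["operation"], ratio))
    return flagged


# Made-up plan steps for one execution of a slow query.
plan = [
    {"id": 1, "operation": "HASH JOIN", "e_rows": 1000, "a_rows": 1200},
    {"id": 2, "operation": "TABLE ACCESS FULL", "e_rows": 500, "a_rows": 250000},
    {"id": 3, "operation": "INDEX RANGE SCAN", "e_rows": 40, "a_rows": 35},
]
print(misestimated_steps(plan))
```

The first step where the ratio jumps is usually the place to look for stale statistics, data skew, or a predicate the optimizer cannot estimate well.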
Question 8
Difficulty: easy
What is your process for managing users, roles, and database security in Oracle?
Sample answer
I treat database security as a structured process, not just a checklist. I start with least privilege, giving users only the access they need to perform their job. That means using roles where possible, separating application schemas from human accounts, and avoiding broad privileges like DBA unless there is a strong reason. I also review password policies, account lock settings, auditing requirements, and whether any sensitive data needs additional protection. For applications with regulated data, I pay attention to encryption at rest, network encryption, and auditing of privileged actions. I prefer documenting access requests and approvals so there is a clear trail for audits and internal reviews. When roles change or employees leave, I make sure access is removed quickly and cleanly. Security is not just about blocking threats; it is also about reducing operational risk. A well-controlled Oracle environment is easier to support, easier to audit, and much less likely to have surprises during a compliance review.
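A periodic least-privilege review can be as simple as comparing current role grants against the documented approvals. In the sketch below, the grantee names, the approval set, and the function name are hypothetical. It assumes the grants were exported as (grantee, granted role) pairs, for example from the data dictionary view that lists role grants.

```python
def unapproved_broad_grants(role_grants, approved, broad_roles=frozenset({"DBA"})):
    """Return broad role grants that have no documented approval.

    role_grants: iterable of (grantee, granted_role) pairs.
    approved: set of (grantee, granted_role) pairs with a recorded approval.
    broad_roles: roles considered too powerful to grant without review (assumed list).
    """
    return sorted(
        (grantee, role)
        for grantee, role in role_grants
        if role in broad_roles and (grantee, role) not in approved
    )


# Made-up grants: one approved DBA account, one leftover grant, one ordinary role.
grants = [
    ("OPS_ADMIN", "DBA"),
    ("JSMITH", "DBA"),
    ("APP_READER", "REPORTING_RO"),
]
approvals = {("OPS_ADMIN", "DBA")}
print(unapproved_broad_grants(grants, approvals))
```

The same comparison works for direct system privileges. The approval records serve as the reference and the database is checked against them.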
Question 9
Difficulty: easy
How do you work with developers and system administrators when database issues affect the business?
Sample answer
I try to make myself a translator between technical layers, because many database problems involve more than one team. When an issue comes up, I want developers, sysadmins, storage engineers, and sometimes network or middleware teams working from the same facts. I start by defining the symptom clearly and separating it from assumptions. Then I share relevant evidence in a way each team can use: SQL details for developers, wait events and system metrics for infrastructure teams, and business impact for leadership. I have found that tone matters a lot in these situations. If the conversation becomes defensive, people stop solving the issue. I focus on facts, urgency, and next steps. After the problem is fixed, I like to hold a short review so we can prevent repeat incidents and improve handoffs. Good collaboration has saved me more time than any tuning trick, because many issues are really coordination problems disguised as technical ones.
Question 10
Difficulty: easy
If you were given responsibility for a new Oracle environment, what would be your first priorities in the first 30 days?
Sample answer
My first 30 days would focus on understanding the environment, reducing risk, and building trust with the team. I would begin by learning the architecture: versions, patch levels, backup strategy, standby or replication setup, storage layout, security model, and the most important applications supported by the database. Then I would review monitoring, alerting, and recent incidents so I can spot any weak points. I would also validate that backups can actually be restored, because that is one of the biggest hidden risks in many environments. After that, I would look at capacity trends, top resource-consuming SQL, and any recurring performance complaints. I like to identify quick wins, but I avoid making major changes before I understand the system well. At the same time, I would meet with stakeholders to learn their priorities and pain points. The goal in the first month is not to prove I know everything immediately. It is to become useful quickly, safely, and with a clear plan.