Category Archives: Oracle

OMS Not Starting After A Reboot Of The Host

After a recent outage where our OMS host was bounced, we found that OMS wouldn’t start.  The WebTier started without issue, but OMS didn’t.

:oracle:/u01/app/oracle/middleware/oms/bin >./emctl start oms
Oracle Enterprise Manager Cloud Control 12c Release 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation.  All rights reserved.
Starting WebTier…
WebTier Successfully Started
Starting Oracle Management Server…
Oracle Management Server is Down

Checking the OMS log /u01/app/oracle/middleware/gc_inst/em/EMGC_OMS1.log only told me what I already knew:
2014-03-05 10:26:56,571 [main] DEBUG oms.StatusOMSCmd processStatusOMS.239 – console page status code is 404

I then checked the EMGC_OMS1.out log:  /u01/app/oracle/middleware/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/logs/EMGC_OMS1.out

This provided the following notice. Apparently the SYSMAN password had been changed in the repository, but not in the credential store (a quick way to confirm this is sketched after the log excerpt).

Repos details fetched from credstore
Fetched repository credentials from Credential Store
Invalid Connection Pool. ERROR = User credentials doesn’t match the existing ones
Failed to verify repository
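To confirm the mismatch, a quick manual test of the stored SYSMAN password against the repository can help. This is only a sketch; <repos_tns_alias> is a placeholder for your repository connect identifier:

sqlplus /nolog
SQL> connect sysman@<repos_tns_alias>
-- Enter the password OMS has been using. An ORA-01017 here confirms the
-- repository's SYSMAN password no longer matches what the credential store holds.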

We can update the repository connection details (including the SYSMAN password) that OMS keeps in the credential store by using emctl.

/u01/app/oracle/middleware/oms/bin/emctl config oms -store_repos_details -repos_host <host> -repos_port 1521 -repos_sid <SID> -repos_user SYSMAN -repos_pwd xxxxx

* Note: replace the xxxxx after -repos_pwd with the actual SYSMAN password for your system, and replace <host> and <SID> with your own values.

At this point I received an error message (which I didn't capture) indicating that the weblogic password was incorrect.  The next step was to change the weblogic password.

1.  cd /u01/app/oracle/middleware/gc_inst/user_projects/domains/GCDomain/bin
. ./setDomainEnv.sh    — this sets the environment variables

2.  cd  /u01/app/oracle/middleware/gc_inst/user_projects/domains/GCDomain/security

3.  cp DefaultAuthenticatorInit.ldift DefaultAuthenticatorInit.ldift.20140305  — this file will be changed

4.  java weblogic.security.utils.AdminAccount newAdmin newPassword .

* Note: there is a DOT (the current directory) at the end of the command line.  Ensure that JAVA_HOME and CLASSPATH are set correctly.

5.  cd /u01/app/oracle/middleware/gc_inst/user_projects/domains/GCDomain/servers/EMGC_ADMINSERVER

6.  mv data data_yyyymmdd  — rename the existing data directory aside (use the current date)

7.  cd security

8.  Modify the boot.properties file to have only these two lines:

password=newPassword

username=weblogic

9.  ./startEMServer.sh

Now I received:

<Server failed to bind to the configured Admin port. The port may already be used by another process.>

So apparently there were already processes running that were bound to the same port.  A ps -ef | grep weblogic showed me the processes.  I killed them and ran the start again, this time successfully (a sketch of the cleanup is below).
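A minimal sketch of that cleanup; the PID shown is hypothetical, so use whatever your ps output returns:

ps -ef | grep weblogic        # find the leftover WebLogic/OMS java processes
kill 12345                    # hypothetical PID from the ps output; repeat as needed
./startEMServer.sh            # retry the start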

Once the server came up, I returned to the emctl config oms command to store the SYSMAN password again, and this time it succeeded.

And lastly I started OMS:  ./emctl start oms

It took a while, and I was bombarded by alerts since OMS had been down for a few days, but it did in fact start up without issue.

Still need to investigate why the passwords appeared to have changed since the last startup.

File Creation Issue DB_FILE_NAME_CONVERT

I hastily (which will always get me into trouble) created a file through EM12c.  It was Saturday, right before I headed out for a much-needed shopping spree.  I figured EM would be the efficient route, but I failed to change the diskgroup location.  The default diskgroup just happened to not exist in the standby's DB_FILE_NAME_CONVERT.

DB_FILE_NAME_CONVERT is one of those magical parameters that cannot be changed online.  Because the new file's location had no mapping, Oracle created a placeholder in its place under the $ORACLE_HOME/dbs directory named UNNAMED000036.  Since this is an ASM database that wasn't going to work; the file was never actually created, but an entry was made in the controlfile.
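For context, a sketch of what the mapping looks like; the diskgroup names are hypothetical, and the point is simply that a primary location without an entry has no translated path on the standby:

SQL> show parameter db_file_name_convert
-- e.g. db_file_name_convert = '+DATA_PRM', '+DATA_STBY'   (hypothetical diskgroups)
-- A file created on the primary in a location with no mapping here is recorded
-- on the standby as an UNNAMEDnnnnn entry under $ORACLE_HOME/dbs instead.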

The first step was to drop the UNNAMED000036 file.  Since this was a physical standby, I had to use the offline drop option:

ALTER DATABASE DATAFILE '/u01/app/oracle/product/11.2.0.3/dbs/UNNAMED000036' OFFLINE DROP;
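If you want to confirm the placeholder's file number and recorded name first, a quick check of v$datafile on the standby works (a minimal sketch):

SQL> select file#, name from v$datafile where name like '%UNNAMED%';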

With the datafile gone I then created a pfile from the spfile:

CREATE PFILE FROM SPFILE;

I modified DB_FILE_NAME_CONVERT within the pfile.  The apply process was already stopped, but I needed to shut down the database and start it to mount using the new pfile:

SHUTDOWN ABORT;

STARTUP NOMOUNT PFILE='/u01/app/oracle/product/11.2.0.3/dbs/initOracle.ora';

ALTER DATABASE MOUNT;

Then start the apply process:

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

We run in maximum performance mode; if you use real-time apply, you would restart it with:

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE;

I monitored the apply process.  Once it caught up, I switched back to using the spfile.
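A hedged sketch of both steps: watching the lag through v$dataguard_stats and then rebuilding the spfile from the edited pfile (the pfile path matches the example above):

SQL> select name, value from v$dataguard_stats where name in ('apply lag', 'transport lag');

SQL> create spfile from pfile='/u01/app/oracle/product/11.2.0.3/dbs/initOracle.ora';
SQL> shutdown immediate;
SQL> startup mount;
SQL> alter database recover managed standby database disconnect from session;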

Now to revisit the alert thresholds within EM12c so we get a heads-up on space issues before the weekend.

Backup Issue After Failover

We recently performed our first-ever failover.  I'll cover the actual steps of the failover, including those that bit us because the standbys were built only with protecting the data in mind, never with actually using those databases.  After I successfully failed the database over to the physical standby, I immediately started a level 0 backup.  The backup ran without incident until the archivelogs.  That's when I received the dreaded archivelog-not-found message.

The interesting thing about the message was the archivelog the backup balked on: it wasn't a log from the current primary but from the previous primary.  Actually, it was the most recent archivelog applied to the now-current primary back when it was the standby.  I decided the first step would be a crosscheck:

rman target / nocatalog

crosscheck archivelog all;

I noticed right away that it started with a directory from 2009 and slowly scrolled through about 70k archivelogs, including the most recent one from the previous primary.  The only archives found, of course, were those from the current primary.  No worries; it knows the files don't exist and marked them as such.  I started the archivelog backup again and it immediately failed for the same reason.  So this time I decided to run a delete expired and a delete obsolete.

delete expired archivelog all;

delete obsolete;

The backup once again scrolled through 70k-plus archivelogs and received the same error message that they were not found.  Odd, since these archivelogs carried the previous DBID and even showed they came from the previous primary when reviewing the v$archived_log view.
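A sketch of the kind of v$archived_log check I mean; the resetlogs information is what separates the old incarnation's logs from the current one's:

SQL> select thread#, sequence#, resetlogs_id, applied, status, name
       from v$archived_log
      order by resetlogs_id, thread#, sequence#;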

I found a note covering this on My Oracle Support.  I first set out to uncatalog each archive one by one, but I quickly discovered that process would take forever, even after I scripted the uncatalog.  I was hoping I could perform the uncatalog at the directory level; after all, it is possible to catalog a directory and have all of its archivelogs registered.  That was not the case.  I actually had to uncatalog all the archivelogs following the note (a sketch of that step is below), and then recatalog the archives.  When I recataloged, I did it by directory.
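A minimal sketch of the uncatalog step, assuming the bulk form of the command is what the note prescribes for your version:

RMAN> change archivelog all uncatalog;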

rman target / nocatalog   — we choose the daring life, using the controlfile instead of a recovery catalog

catalog start with '/u02/app/oracle/archives';

I restarted the backup and everyone is happy now.

RMAN DUPLICATE: Errors in krbm_getDupcopy

Spent the Christmas holiday rebuilding one of our physical standbys.  The process normally takes around 7 hours to complete.  This time around it pushed past 48 hours.  While investigating, I noticed messages in the alert log that I don't recall from previous rebuilds:

RMAN DUPLICATE: Errors in krbm_getDupCopy
Errors in file /u01/app/oracle/diag/rdbms/orcl/ORCL/trace/ORCL_ora_5426.trc:
ORA-19625: error identifying file +ORCL_DATA/orcl/datafile/users01.dbf
ORA-17503: ksfdopn:2 Failed to open file +orcl_DATA/ORCL/datafile/users01.dbf
ORA-15173: entry ‘users01.dbf’ does not exist in directory ‘datafile’

Oddly, the duplicate process appeared to still be running:

[trace]> ps -ef|grep dup
oracle    4506 31237  0 10:12 pts/1    00:00:00 /bin/sh ./run_duplicate_orcl.sh

We use a script run with nohup, writing to a log file, and a tail of that log file showed no errors:

[rebuild]> tail -f dup_orcl4.log

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting backup at 31-DEC-13
channel prmy1: starting datafile copy
input datafile file number=00013 name=/u01/app/oracle/oradata/PRIMARY/USERS01.DBF

An strace of the process showed it was in a wait:

[man]> strace -p 4506
Process 4506 attached – interrupt to quit
wait4(-1,

Since the process appeared to be working based on the current evidence, I turned to MOS as a last resort to understand the apparent error messages.  The answer appears in the following note: 1476641.1

This is another case of more information than necessary.  To summarize, it's simply saying that the datafile doesn't exist, so a full copy has to be done.  I always drop the database first, which drops the files, so I'm guessing I only noticed this warning because I was trying to determine why the process was taking longer.  Since the files will never exist when I perform this process, this is just noise and has no bearing on the slowness we were experiencing.

ORA-16224 Database Guard Is Enabled

Recently encountered this error after a rollout in production.  Several hundred procedures were invalid on the logical standby.  I was unable to recompile them and received an ORA-16224 Database Guard is enabled error.

I checked the GUARD_STATUS in v$database (a quick check is sketched below).  Usually this error is thrown when it is set to ALL; however, in this case it was set to STANDBY.
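A minimal version of that check:

SQL> select guard_status from v$database;

Since the guard was blocking the recompiles, I was interested in disabling it, which could be done with: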

alter database guard none;   <other options include STANDBY, which it was already set to, and ALL, which wouldn't have helped>

However, this can also be set at the session level:

alter session disable guard;

That turns it off for my session.  I whipped up a quick anonymous block to recompile all invalid procedures.

declare
  sSQL varchar2(4000);
begin
  for i in (select object_name
              from dba_objects
             where owner = 'MY_SCHEMA'
               and status = 'INVALID'
               and object_type = 'PROCEDURE')
  loop
    sSQL := 'alter procedure my_schema.' || i.object_name || ' compile';
    execute immediate sSQL;
  end loop;
end;
/

Once that executed successfully, I just turned the guard back on in my session:

alter session enable guard;

Had the end user attempt to run the report again.  Success.

Instantiate A Table On Logical Standby

Seems today is the day for logical standby issues.  Two tables hit an apply conflict, which we have suffered previously and for which we applied a patch.  It's interesting that we started receiving it again; however, I'm going to skip to how I resolved the immediate issue of syncing the objects on the logical standby and allowing SQL Apply to continue.  We will need a MOS SR for the other issue.

I started by trying DBMS_LOGSTDBY.INSTANTIATE_TABLE, which is nice and convenient since it uses a database link: no moving files around.

EXECUTE DBMS_LOGSTDBY.INSTANTIATE_TABLE(schema_name=>'MySchema', table_name=>'TABLE1', dblink=>'mydb');

An error was thrown about a referential constraint.  The problem: the parent table was out of sync as well.

EXECUTE DBMS_LOGSTDBY.INSTANTIATE_TABLE(schema_name=>'MySchema', table_name=>'PARENT1', dblink=>'mydb');

An error was thrown due to PARENT1 having a LONG datatype column.

So now I’m forced to move files around using datapump.  I already have datapump directories defined on both databases so I won’t cover that task.

These tables are not updated frequently, usually only through a weekly file load.

alter database stop logical standby apply;

expdp system dumpfile=logical.dmp directory=data_pump_dir tables=myschema.parent1,myschema.table1

Once completed I move the dumpfile over to the logical database server under the appropriate directory.

impdp system dumpfile=logical.dmp directory=data_pump_dir tables=myschema.parent1,myschema.table1

alter database start logical standby apply immediate;

Then I watched the SQL Apply process close the lag.
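A hedged sketch of how to watch that lag close, using the standard v$logstdby_progress view:

SQL> select applied_scn, latest_scn, applied_time, latest_time from v$logstdby_progress;
-- When APPLIED_SCN catches up to LATEST_SCN, SQL Apply is current.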

Instance Using SPFILE But File Doesn’t Exist

Recently I've been playing Tetris with my databases: duplicating databases and reconciling differences due to a change management system with significant gaps.  During one of these fun moments I ended up with a database using an SPFILE, but the SPFILE didn't actually exist on disk.  Simple fix:

create pfile from memory;

All was well, as it saved me from having to re-create the pfile from the alert.log.  The SPFILE can be created this way as well, provided you are currently running on a pfile:

create spfile from memory;

Every minute I can save with little tidbits such as the above is precious.

Relinking Grid Infrastructure Binaries

I recently ran into several issues while installing 11.2.0.2 Grid Infrastructure.  Most of these issues were due to the way the admins had designed the system, which caused several installs and uninstalls.  On the final install everything appeared to be in order until I tried to start ASMCA to add diskgroups.  I received an error message stating that an earlier version of ASM was running and that, in order to upgrade, I needed to start the earlier version's ASMCA.  I only ever had one version installed, so this was a faulty error message.

After reviewing the system I decided to relink the GI binaries:

As ROOT:

cd $GRID_HOME/crs/install

perl rootcrs.pl -unlock — this stops the clusterware and sets the permissions for root

As the Grid Infrastructure home owner:

$GRID_HOME/bin/relink

As ROOT:

cd $GRID_HOME/rdbms/install

./rootadd_rdbms.sh

cd $GRID_HOME/crs/install

perl rootcrs.pl -patch  — restarts the clusterware and sets permissions

There is a bug that you may encounter with rootcrs.pl.  I overcame the issue by running root.sh again from the Grid home as the root user.
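If you hit it, a minimal sketch of that workaround, assuming $GRID_HOME points at your Grid home:

As ROOT:

cd $GRID_HOME
./root.sh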

Once that completed, ASMCA started without issue.

Agent Unable to Communicate with OMS EM12c

My phone started going crazy with alerts from EM12c; apparently all of my agents had lost communication with the OMS.  I immediately started checking all of the logs on the OMS host; not finding anything of interest, I decided to try an upload from one of the hosts:

./emctl upload agent

WARN – Ping communication error
o.s.emSDK.agent.comm.exception.ConnectException [Failure connecting to https://HOST.domain.com:4889/empbs/upload , err host.domain.com]

./emctl pingOMS

Can’t find host host.domain.com

Interesting, but it makes sense; after all, the only agent still able to communicate with the OMS actually resides on the same host as the OMS.

ping host.domain.com

Same results

After an email exchange with the Linux administrators, it turned out some changes had occurred in DNS that needed to be reverted.  Once that was completed, the ping worked and all agents showed as up.

ORA-31634 Unable to Construct Unique Job Name by Default

It never fails: every time I have a quick request that I should be able to hammer out in a few minutes, something hampers it.  Today was just one of those days.  I needed to perform an expdp/impdp of a small schema between development and testing, and I received an error:

ORA-31634 Unable To Construct Unique Job Name By Default

I generally default the job name on Data Pump exports, especially if they are one-offs.  However, today that just didn't cooperate.  Apparently we had experienced some failures with other exports, and their master tables still exist.  There's a limit of 99 default job names.  An easy check:

select owner_name,state,job_name from dba_datapump_jobs;

If the number of records returned equals 99, then simply drop the leftover master tables, named SYS_EXPORT_SCHEMA_xx, replacing xx with the numbers displayed in the output above (see the sketch below).
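A hedged sketch of that cleanup; the owner and job number below are examples, so match them to your dba_datapump_jobs output, and only drop tables for jobs that are NOT RUNNING:

SQL> select owner_name, job_name, state from dba_datapump_jobs;
SQL> drop table system.sys_export_schema_01;   -- repeat for each leftover master table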

Restart the export.