EMCTL Script Empty

After several hours of trying to determine why when I ran ./emctl start oms absolute nothing happened.  I wasn’t even receiving an error message.  I set JAVA_HOME, OMS_HOME, ORACLE_HOME and yet nothing.  I checked ./emctl start agent and that worked without any issue, but not the emctl under the OMS home.  I was perplexed.  And then I saw metalink note How to Recreate the emctl Script for the Enterprise Manager OMS (Doc ID 1404623.1).  My file was indeed 0 bytes.  The note describes how to recreate the file.  I’ve since made a backup of the emctl for future reference.  Hopefully when I google “emctl start oms returns no information” this post will serve as a reminder.

Cluster Database and EM12c

Originally when we configured our cluster database on EM12c we used IP addresses simply because DNS wasn’t reliable in the sense setup.  Instead of fighting the battle we used IP addresses.  Which works provided you never change your IP addresses.  Three years later that’s exactly what we did.

During the configuration of the cluster database on EM12c, I realized I completely forgot how to make this happen.  So this is really a note to myself for the next time, (side note, DNS is internal and stable so we used the host names) I may have to re-configure the cluster database within EM12c.

The first step of course is to install the agent on all cluster host.  This is no different than any other host.

The second step adding the cluster database and high available is done through the manually add target.  Remember to select cluster database and not database instance.  The instances can’t be added until the cluster database.

After the configurations were complete, status remained pending for the cluster database and the targets all appeared down.  What I discovered within the configuration EM12c used the listener machine name as the host for connection string purposes, but that didn’t occur to me until I set specific connection string for both of the instances within the cluster database.  All that was required, change the listener machine name.

Why the listener machine name didn’t work?  It was the VIP fully qualified.  It should, but sadly the VIP fully qualified is not listed in the DNS nor the host file.  I don’t have access to either, and the length of time to have this resolved put me on a different path.  I changed the listener machine name to the host name.  Success.

I didn’t just leave it there, though I put in the proper work order to have the VIPs added to the DNS which is estimated at about 20 business days.  The work around provides us with EM working for the cluster database while we wait.

SQLSaturaday Orlando Take Aways

Primarily I work with Oracle which is more a circumstance of my jobs and the companies.  I’m always opened to all databases and systems.  While it’s much easier to stay current with Oracle since I have the the real life experiences daily the others provide a challenge.  One way I stay current on SQL Server is joining the PASS community, following some awesome SQL Server types on twitter and reading their blogs.  I also take advantage of their FREE SQLSaturdays in the local areas.  This past Saturday I spent the day immersed in SQL Sever in Orlando.

I really enjoy the SQL Saturdays they off so much information in one day.  I know that the local IOUGs usually host 1 day during the week that is very similar to the SQL Saturdays, but they tend to follow during the week and are more limited in their content.  Yes it does mean using my own personal time but if I want to stay current on any vendor product that is a sacrifice.  Plus its a few Saturdays out of the year depending on how many close by SQLSaturdays I desire to attend.

The content from this Saturdays event was varied and very well put together.  I enjoyed all the sessions.  I’m always performing comparisons in my mind as the content is presented between SQL Server and Oracle.  The session on Database Design Diasters was quite interesting.  The main take away:  Developers no matter of the platform make the same mistakes at least when using SQL Server and Oracle.

I found striking comparisons in two other sessions with regards to SQL Server vs Oracle.  Extend Events and assessing performance metrics.  SQL Server has come a very long way in this area but hands done Oracle does a much better job  collecting and displaying the metrics.  AWR, ASH, as well as the ability to dig into the tables underneath.  The wealth of data Oracle provides and ease of use wins every time.

Transaction logs session provided a unique comparison.  Logs provide the same functionality as the Redo Logs and the Undo Tablespace.  The manner in which the same concepts were strikingly uniquely implemented is worth it’s own blog post.  My immediately reaction is that Oracle does a better job in their implementation mainly because it’s automatically performing the backup — by archiving the log data to the archive logs.  But I’m hesitant because that might just a knee reaction and I want to explore the transaction logs for a deeper understanding — hence it’s own blog post.

I do recommend that any DBA take advantage of the community resources available to include any conferences, learning or networking opportunities.  As well as don’t limit yourself to only learning one vendor product within the database world.  Explore.  Be adventurous. This is how careers are taken to the next level.

Thank you to the SQLSaturday Orlando organizers and speakers.  It was a great experience.

OMS Not Starting After A Reboot Of The Host

After a recent outage where our OMS host was bounced, we found that OMS wouldn’t start.  The WebTier started without issue, but OMS didn’t.

:oracle:/u01/app/oracle/middleware/oms/bin >./emctl start oms
Oracle Enterprise Manager Cloud Control 12c Release 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation.  All rights reserved.
Starting WebTier…
WebTier Successfully Started
Starting Oracle Management Server…
Oracle Management Server is Down

Checking the OMS log: /u01/app/oracle/middlware/gc_inst/em/EMGC_OMS1.log only provided me what I already knew
2014-03-05 10:26:56,571 [main] DEBUG oms.StatusOMSCmd processStatusOMS.239 – console page status code is 404

I hen checked the EMGC_OMS1.out log:  /u01/app/oracle/middleware/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/logs/EMGC_OMS1.out

Which provided me with the following notice.  Apparently the sysman password was changed in the repository, but not in the credential store.

Repos details fetched from credstore
Fetched repository credentials from Credential Store
Invalid Connection Pool. ERROR = User credentials doesn’t match the existing ones
Failed to verify repository

We can change the sysman password in the repository by using emctl.

/u01/app/oracle/middlware/oms/bin/emctl config oms -store_repos_details -repos_host <host> -repos_port 1521 -repos_sid <SID> -repos_user SYSMAN -repos_pwd xxxxx

* Note replace the XXXXX after -repos_pwd with the actual password for your system, as well as the <host> with your host.

At this point I received an error message that I didn’t capture concerning the weblogic password being incorrect.  The next step is to change the weblogic password.

1.  /u01/app/oracle/middleware/gc_inst/user_projects/domains/GCDomain/bin
. setDomainEnv.sh    — this will set the environment variables

2.  cd  /u01/app/oracle/middleware/gc_inst/user_projects/domains/GCDomain/security

3.  cp DefaultAuthenticatorInit.ldift DefaultAuthenticatorInit.ldift.20140305  — this file will be changed

4.  java weblogic.security.utils.AdminAccount newAdmin newPassword .

* note there is a DOT at the end of the command line.  Ensure that the        JAVA_HOME and the CLASSPATH are set correctly

5.  cd /u01/app/oracle/middleware/gc_inst/user_projects/domains/GCDomain/servers/EMGC_ADMINSERVER

6.  mv data to data_yyyymmdd

7.  cd security

8.  Modify the boot.properties file to have only these two lines:

password=newPassword

username=weblogic

9.  ./startEMServer.sh

Now I receive

 <Server failed                   to bind to the configured Admin port. The port may already be used by another pr                  ocess.>

So there apparently are already processes running that our bound to the same port.  A ps -ef|grep weblogic shows me the processes.  I kill the processes and run the start again this time successful.

Once the server comes up return to the change password for the SYSMAN and the OMS and this time it should be

And lastly I start oms ./emctl start oms

It takes a while and I was bombarded by alerts since oms was down for a few days but it did in fact startup without issues.

Still need to investigate why the the passwords appeared to have changed since last startup.

File Creation Issue DB_FILE_NAME_CONVERT

I hastily (which will always get me into trouble) created a file through EM12c.  It was Saturday before I was headed out for a much needed shopping spree.  I figured EM would allow efficiency but I failed to change the diskgroup location.  The default diskgroup just happened to not exist in the standby’s DB_FILE_NAME_CONVERT.

The DB_FILE_NAME_CONVERT is one of those magical parameters changing online is forbidden.  Oracle chose to create a file in it’s place under the $ORACLE_HOME/dbs directory named UNNAMED000036.  Since this was an ASM database that wasn’t going to work, well the file never was created but an entry was made in the controlfile.

The first step was to drop the UNNAMED0000036 file.   Since this was a physical standby the I had to use the drop option:

ALTER DATABASE DATAFILE ‘/u01/app/oracle/product/11.2.0.3/dbs/UNNAMED000036’ offline drop;

With the datafile gone I then created a pfile from the spfile:

CREATE PFILE FROM SPFILE;

Modified the DB_FILE_NAME_CONVERT within the spfile.  The apply process was stopped but I needed to shutdown the database and start it to mount using the new pfile:

SHUTDOWN ABORT;

STARTUP NOMOUNT PFILE=’/u01/app/oracle/product/11.2.0.3/dbs/initOracle.ora’;

ALTER DATABASE MOUNT;

Then start the apply process:

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

We run with maximum performance, if needed you would restart the real time apply:

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE:

I monitored the apply process.  Once the apply process caught up I then switched to using the spfile.

Now to revisit the alert thresholds within EM12c so we have heads up on the space issues before the weekend.

 

Backup Issue After Failover

We recently performed our first ever failover.  I’ll cover the actual steps of the failover including those that bite us because the standbys were built only with the protecting the data but never really using those databases. After I successfully failed the database over to the physical standby I immediately started a level 0 backup.  The backup ran with incident until the archivelogs.  That’s when I received the dreaded archivelog not found message:

The interesting thing about the message was the archivelog that the backup balked on — it wasn’t a log from the current primary but the previous primary.  Actually it was the most recent archivelog that was applied to the now current primary when it was the standby.  I decided the first step would be a crosscheck:

rman target / nocatalog

crosscheck archivelog all;

I noticed right away it started with a directory 2009 and slowly scrolled through about 70k of archivelogs including the most recent on from the previous primary.  The only archives that were found of course were those from the current primary.  No worries it knows the files don’t exist and marked them as such.  I started the archivelog backup again and it immediately failed same reason.  So this time I decided to run a delete expired and delete obsolete.

delete archivelog expired;

delete archivelog obsolete;

The backup once again scrolled through 70k plus archive logs received the same error message that they were not found.  Odd since these archivelogs contained the previous DBID and even showed they were the previous primary when reviewing the v$archived_log view.

I found the following note on My Oracle Support.  I first set out to uncatalog one by one each archive.  But as I quickly discovered that process would take forever even after I scripted the uncataloged.  I was hoping I could perform the uncatalog at the directory level after all it is possible to catalog a directory and all the archivelogs are then registered.  That would not be the case.  I actually had to uncatalog all the archivelogs following the note.  And then recataloged the archives.  When I recataloged I did by the directory.

RMAN target / nocatalog   — we choose the daring life using the controlfile instead of recovery catalog

catalog start with ‘/u02/app/oracle/archives’;

I restarted the backup and everyone is happy now.

RMAN DUPLICATE: Errors in krbm_getDupcopy

Spent the Christmas holiday rebuilding one of our physical standbys.  The process normally takes around 7 hours to complete.  This time around it pushed 48 hours plus.  While investigating I noticed messages in the alert log that I don’t recall from previous rebuilds:

RMAN DUPLICATE: Errors in krbm_getDupCopy
Errors in file /u01/app/oracle/diag/rdbms/orcl/ORCL/trace/ORCL_ora_5426.trc:
ORA-19625: error identifying file +ORCL_DATA/orcl/datafile/users01.dbf
ORA-17503: ksfdopn:2 Failed to open file +orcl_DATA/ORCL/datafile/users01.dbf
ORA-15173: entry ‘users01.dbf’ does not exist in directory ‘datafile’

Oddly the duplicate process appeared still running:

[trace]> ps -ef|grep dup
oracle    4506 31237  0 10:12 pts/1    00:00:00 /bin/sh ./run_duplicate_orcl.sh

We utilize a script with nohup writing to a log file and a tail of the log file shows no errors:

[rebuild]> tail -f dup_orcl4.log

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting backup at 31-DEC-13
channel prmy1: starting datafile copy
input datafile file number=00013 name=/u01/app/oracle/oradata/PRIMARY/USERS01.DBF

An strace of the process shows it’s in wait:

[man]> strace -p 4506
Process 4506 attached – interrupt to quit
wait4(-1,

Since the process is working based on the current evidence, I turned to MOS as a last resort to understand the apparent error messages.  The answer appears in the following note:  1476641.1

This is another case of more information than is necessary.  To summarize it’s simply saying that the datafile doesn’t exist so a full copy has to be done.  I always drop the database which drops the files so I’m guessing I just noticed this warning message while trying to determine why the process is taking longer.  Since the files will never exist when I perform this process this is just noise and has no bearing on the slowness we are experiencing.