How to capture a dump when intermittent high CPU happens on Azure Web App

 

Previously, when an Azure Web App hit an intermittent high CPU issue, there was no easy way to capture dump files for further analysis, because:

1. The issue is intermittent. While you are monitoring the application, the high CPU may not occur at that moment, or at all within the current process lifetime.

2. When you run procdump from the Kudu site to capture the high CPU dump, procdump may exit before the issue happens if the Kudu console times out or refreshes, which in turn terminates the target process.

With the latest Crash Diagnoser (V1.1.0.0), we can now quickly configure and capture a high CPU dump. The steps are:

1. Open Azure Portal, navigate to App Services -> your Web App -> Tools -> Extensions.

2. Click Add, and choose Crash Diagnoser to install it.

Note: After the site extension is installed (this takes around 10 seconds), the portal restarts the app by default. If a restart would disrupt your business, do the installation outside business hours.

clip_image001_thumb[1]

3. Make sure “Always On” is enabled in the web app configuration, because Crash Diagnoser runs as a continuous web job. Refer to: http://blog.amitapple.com/post/73574681678/git-deploy-console-app/#.VpcHaPl96Ul

4. After installation, select it from the Installed Web App Extension blade and click Browse:

clip_image002_thumb[1]

5. Click the 2nd Chance Unhandled Exception tab and choose your target process name. The default options are w3wp, node, and php-cgi.

If you want to monitor another process, such as mywebjob.exe, type the process name without the extension (mywebjob) in the field manually.

6. Set the Monitor CPU field to “Yes”. Set the CPU Threshold (the CPU usage percentage that triggers the dump action; the percentage is across all cores) and the Duration (the number of seconds the CPU usage must stay at or above the threshold). The summary info on the page explains the details.

clip_image004_thumb[1]

If you want to set other parameters, click Advanced Settings.

7. Click the Start button. The status in the title changes:

clip_image005_thumb[1]

8. That’s all. If the target process matches the high CPU pattern, or crashes with an unhandled exception, a dump is generated immediately:

clip_image007_thumb[1]

9. Click the hyperlink of a file to download it. You can then analyze the dumps offline with DebugDiag.
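
The threshold-and-duration rule from step 6 can be illustrated with a small sketch. This is not Crash Diagnoser's actual implementation: the function name, the one-sample-per-second rate, and the sample values are all assumptions made for illustration.

```python
from typing import Iterable, Optional


def first_trigger_index(samples: Iterable[float],
                        threshold: float,
                        duration_seconds: int) -> Optional[int]:
    """Return the index of the sample at which a dump would be triggered.

    Assumes one CPU sample per second (a hypothetical sampling rate).
    The trigger fires once CPU usage has been at or above `threshold`
    for `duration_seconds` consecutive samples; returns None otherwise.
    """
    consecutive = 0
    for index, cpu_percent in enumerate(samples):
        if cpu_percent >= threshold:
            consecutive += 1
            if consecutive >= duration_seconds:
                return index
        else:
            # A single sample below the threshold resets the window.
            consecutive = 0
    return None


# Made-up samples: a short spike (2 samples) followed by a sustained one.
usage = [20, 95, 96, 30, 91, 92, 93, 40]
print(first_trigger_index(usage, threshold=90, duration_seconds=3))
```

With a rule like this, a short spike is ignored and only sustained load produces a dump, which is why a longer Duration reduces the number of unhelpful dump files.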

 

More Information Regarding Crash Diagnoser

===============================

How to capture intermittent exceptions on Azure Web APP

Troubleshoot Stack Overflow on Azure Web APP

Tips of Crash Diagnoser

 

Thanks.

Freist


How to use CrashDiag Site Extension to Capture Dump for Intermittent Exception issues or performance issues on Azure Web App

 

When a customer’s .NET, PHP, or Node process runs in the Azure Web App environment, it may crash intermittently due to code or performance issues. It is important to capture a crash dump automatically at the moment such a crash or exception happens, for further investigation.

This article introduces the new CrashDiag Site Extension, which makes it easy to capture the necessary data when an intermittent unhandled exception happens. It currently covers these scenarios:

Capture Dump for Exceptions

1. An unhandled exception happens

2. A specific first chance exception happens, based on a filtering parameter

3. Newly launched processes are monitored continuously

4. The total number of dump files and debugger process instances is configurable

5. Native apps, managed apps, and web jobs, 32-bit or 64-bit

6. Safe Stop at any time without terminating the target process

Capture Dump for Performance (slow, high CPU, high Memory)

1. Capture Hang Dump on Demand

To capture a dump for exceptions, follow the steps below:

1. Install

2. Configure (1st chance or 2nd chance exception handling)

3. Start

4. Stop

Install

1. Open Azure Portal, navigate to App Services -> your Web App -> Tools -> Extensions.

2. Click Add, and choose Crash Diagnoser to install it.

image

3. Make sure “Always On” is enabled in the web app configuration, because Crash Diagnoser runs as a continuous web job. Refer to: http://blog.amitapple.com/post/73574681678/git-deploy-console-app/#.VpcHaPl96Ul

Configure

1. After installation, select it from the Installed Web App Extension blade and click Browse:

image

2. Now we can configure it based on our troubleshooting scenarios (1st chance or 2nd chance exception handling):

2.1 The application crashed (terminated unexpectedly due to a 2nd chance unhandled exception)

2.1.1 Click the 2nd Chance Unhandled Exception tab and choose your target process name. The default options are w3wp, node, and php-cgi.

If you want to monitor another process, such as mywebjob.exe, type the process name without the extension (mywebjob) in the field manually.

clip_image0074_thumb1

2.2 The application didn’t crash, but it generates specific exceptions. We need to filter 1st chance exceptions and analyze them.

2.2.1 Click the 1st Chance Exception tab, and set the exception code and process name.

Besides the default exception code options, you can type any other specific code that occurred in the application. For example, if the application reported a FileNotFound exception, you can type FileNotFound or NotFound in the Exception Code field.

To monitor all 1st chance exceptions, type *

Note that this may generate dump files for exceptions other than the ones you expect.

image

3. Click the Advanced Settings tab to make sure the settings suit your case. The current defaults are suitable for common scenarios, and you can tune them as you wish.

Because a single dump can be hundreds of MB in size, you may want to set the Maximum dump file number to less than 10.

clip_image0114_thumb1
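
The Exception Code examples in 2.2.1 (FileNotFound or NotFound, and * for everything) suggest that the filter behaves like a partial-name match. The sketch below models that behavior. It is only an illustration: the function name is hypothetical, and treating the filter as a case-insensitive substring match is an assumption, not a documented rule of the extension.

```python
def matches_exception_filter(exception_name: str, filter_text: str) -> bool:
    """Hypothetical model of the Exception Code filter.

    "*" matches every 1st chance exception; anything else is treated as a
    case-insensitive substring of the exception name or code (an assumption).
    """
    filter_text = filter_text.strip()
    if filter_text == "*":
        return True
    if not filter_text:
        # An empty filter matches nothing in this sketch.
        return False
    return filter_text.lower() in exception_name.lower()


# Try a few filters against one exception name.
for candidate in ("FileNotFound", "NotFound", "*", "Timeout"):
    print(candidate, matches_exception_filter("System.IO.FileNotFoundException", candidate))
```

The broader the filter text, the more exceptions match, which is why * can produce many dumps that are unrelated to the issue being investigated.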

Start

1. After configuration, click the Start button to start monitoring. If your application is running, after around 10 seconds you will see the CrashDiag status change from “Stopped on Instance name” to “Monitoring Processes on Instance name”:

clip_image0134_thumb1

2. If the application crashes, *.dmp files are generated in the output path. In this sample, that is d:\home\logfiles\crashdiag. You can then open the Kudu portal and investigate the dump files further.

Stop

1. After troubleshooting, click SafeStop to stop the Crash Diagnoser. This action does not terminate the target process and can be executed at any time.

If the instance is already running CrashDiag and you click Start again, the portal reminds you to safe stop it first:

clip_image0153_thumb1

 

To capture a dump for performance issues, follow the steps below:

1. Click the Process tab, then click Load List to show the currently running processes.

2. Click FullDump. The dump is created and shown in the Dump File List section.

clip_image0173_thumb2

Question & Answer

1. After installing the site extension, will the Web App be restarted?

Yes. It is restarted because the package uses applicationhost.xdt. Without a restart, the route to CrashDiag cannot be found.

2. Can it analyze dumps automatically?

Not yet. Its current purpose is to capture intermittent exception data as dump files for root cause analysis.

3. If I set the target process to W3WP, will it monitor the Kudu site W3WP process as well?

No. It does not capture processes that are running for Kudu, DaaSConsole, or DaaSRunner.

4. If the site extension is updated, will it be auto-updated?

Yes. It is auto-updated. A popup message notifies you of this:

clip_image0193_thumb1

Your web app will not be restarted during the auto-update.

5. If I don’t know which kind of exception I’m facing, how do I configure it?

Check whether there are Process Terminated or Exit logs in D:\home\LogFiles\eventlog.xml from the time the issue happened. If there are, use the 2nd chance exception configuration described above.

Normally a 1st chance exception does not cause an application crash. If you need to capture a dump for one, be aware that the performance impact on the target process can be a little higher than with 2nd chance exception capture, because the debugger needs to filter all 1st chance exceptions that happen in the target process.

6. Will it monitor all Web Instances?

Through this UI, it will only monitor the current Instance in the current Kudu site (https://appname.scm.azurewebsites.net/crashdiag) instead of all instances.

The SafeStop option will stop all monitoring tasks on all instances.

7. How do I download the dump files?

Open the Kudu portal, navigate to the output folder, and download them. Alternatively, download them directly from the CrashDiag file list by clicking the file link.

image
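
For scripted downloads, the output folder can also be reached through Kudu's VFS REST API, which exposes d:\home under /api/vfs/. The sketch below only builds the listing URL and filters a listing for dump files; it makes no network call. The helper names and the sample listing are made up for illustration, and the listing fields used here (name, size, href) are an assumption about the API's JSON shape that you should verify against your own site.

```python
import json
from typing import Dict, List


def crashdiag_vfs_url(app_name: str) -> str:
    # Kudu's VFS API is rooted at d:\home, so this path corresponds to
    # d:\home\logfiles\crashdiag, the sample output folder used above.
    return f"https://{app_name}.scm.azurewebsites.net/api/vfs/LogFiles/crashdiag/"


def dump_files(listing_json: str) -> List[Dict]:
    """Keep only *.dmp entries from a VFS directory listing (a JSON array)."""
    entries = json.loads(listing_json)
    return [e for e in entries if e.get("name", "").lower().endswith(".dmp")]


# Made-up listing for illustration; a real one comes from an authenticated GET.
sample = json.dumps([
    {"name": "w3wp_1234.dmp", "size": 314572800,
     "href": crashdiag_vfs_url("appname") + "w3wp_1234.dmp"},
    {"name": "crashdiag.log", "size": 2048,
     "href": crashdiag_vfs_url("appname") + "crashdiag.log"},
])
for entry in dump_files(sample):
    print(entry["name"], entry["size"] // (1024 * 1024), "MB", entry["href"])
```

A real request to the SCM site requires the app's deployment credentials, so fetch the listing URL with an authenticated GET and pass the response body to dump_files.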

Thanks

-Freist


How to use CrashDiag to capture Stack Overflow exception dump in MVC Web APP on Azure

This article shows how to capture a real stack overflow exception that happened in an MVC web app on Azure. After reading it, you will find that the same method can be used for other web apps that have intermittent 1st chance exception issues.

The sample MVC web app comes from a small test project at https://github.com/freistli/Dev14_Net46_Mvc5/tree/StackOverflow. When you run it and click the Contact menu, the application exits immediately and shows this error on Azure Web App:

clip_image002_thumb1

In this sample, we know it is a fatal c00000fd exception (a native stack overflow exception). If, in a real production situation, you cannot retrieve the exception code from the event log or other logs, you can contact the Microsoft Support Team to see whether the exit code is available from the backend logging tables.

To investigate the exception further, we want to take the crash dump exactly when the exception happens. Now let’s use CrashDiag:

1. Install Crash Diagnoser site extension from the Azure Site Extension Gallery from Azure Portal. For detailed installation steps please refer to this article.

Make sure “Always On” is enabled in the web app configuration, because Crash Diagnoser runs as a continuous web job. Refer to: http://blog.amitapple.com/post/73574681678/git-deploy-console-app/#.VpcHaPl96Ul

2. Browse to the extension (https://yourtestapp.scm.azurewebsites.net/crashdiag). Click the 1st chance exception tab and configure the settings as follows:

image_thumb7

Note that the Managed Exception option is set to “No”, because this is a crash on the native exception code c00000fd.

The reason we don’t configure and capture a 2nd chance unhandled exception for the MVC app is that ASP.NET by default catches unhandled 1st chance exceptions in the HTTP context directly and exits. The process quits right after those “fatal” 1st chance exceptions without throwing a 2nd chance exception.

If you see the StackOverflowException exception code recorded in d:\home\logfiles\eventlog.xml, the exception is a managed exception:

<Data>w3wp.exe</Data>
<Data>IIS APPPOOL\TestWeb</Data>
<Data>StackOverflowException</Data>
<Data>Operation caused a stack overflow.
at testWeb._Default.Page_Load(Object sender, EventArgs e)
at System.Web.UI.Control.OnLoad(EventArgs e)
at System.Web.UI.Control.LoadRecursive()

In that case, set the exception code to “overflow” or “StackOverflowException”, and Managed Exception to “Yes”:

image_thumb5

3. Click the Start button. CrashDiag starts monitoring the w3wp process.

image_thumb9

4. Once the c00000fd exception happens, the dump is generated automatically. After the dump has been generated completely (the file size doesn’t change for 10 seconds), click the file name to download it.

image_thumb11
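
If you script the download, the “file size doesn’t change for 10 seconds” check from step 4 can be sketched as follows. This is a minimal illustration, not part of CrashDiag: the function name is hypothetical, and get_size stands for whatever way you have of reading the current dump size (a directory listing, for example).

```python
import time
from typing import Callable


def wait_until_stable(get_size: Callable[[], int],
                      stable_seconds: int = 10,
                      poll_interval: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until get_size() has returned the same value for stable_seconds.

    Returns the final size in bytes. The sleep function is injectable so
    the logic can be exercised without real waiting.
    """
    last_size = get_size()
    unchanged_for = 0.0
    while unchanged_for < stable_seconds:
        sleep(poll_interval)
        size = get_size()
        if size == last_size:
            unchanged_for += poll_interval
        else:
            # The dump is still being written; restart the timer.
            last_size = size
            unchanged_for = 0.0
    return last_size


# Simulated dump that grows three times and then stops (no real waiting).
sizes = iter([100, 200, 300] + [300] * 20)
print(wait_until_stable(lambda: next(sizes), sleep=lambda _: None))
```

Downloading before the size has settled risks getting a truncated dump that the debugger cannot open.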

Next, we can use a debugging tool (WinDbg or DebugDiag) to inspect the dump and see how the exception happened. The call stack in this sample dump is shown below, and it points accurately to the Foo() function that caused the process termination:

0:22:> kL

ChildEBP RetAddr

059ce610 067755f6 Dev14_Net46_Mvc5!Dev14_Net46_Mvc5.Controllers.HomeController.Foo()
059ce610 067755f6 Dev14_Net46_Mvc5!Dev14_Net46_Mvc5.Controllers.HomeController.Foo()+0x6
059ce610 067755f6 Dev14_Net46_Mvc5!Dev14_Net46_Mvc5.Controllers.HomeController.Foo()+0x6
059ce610 067755f6 Dev14_Net46_Mvc5!Dev14_Net46_Mvc5.Controllers.HomeController.Foo()+0x6
059ce610 067755f6 Dev14_Net46_Mvc5!Dev14_Net46_Mvc5.Controllers.HomeController.Foo()+0x6
059ce610 067755f6 Dev14_Net46_Mvc5!Dev14_Net46_Mvc5.Controllers.HomeController.Foo()+0x6
059ce610 067755f6 Dev14_Net46_Mvc5!Dev14_Net46_Mvc5.Controllers.HomeController.Foo()+0x6
// Foo() calls itself unconditionally, so the recursion never ends and the stack overflows.
private void Foo()
{
    Foo();
}
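
When a stack is thousands of frames deep, spotting the repeated frame by eye is tedious. The sketch below counts the frames in saved kL output and reports the most frequent one. It is an illustrative helper, not a WinDbg feature: the function name is made up, and the regular expression assumes the 32-bit two-column address layout shown above.

```python
import re
from collections import Counter
from typing import Optional, Tuple

# Two 8-digit hex columns (ChildEBP, RetAddr), then the symbol, then an
# optional +0x offset. This layout is assumed from the 32-bit sample above.
FRAME = re.compile(r"^[0-9a-fA-F]{8}\s+[0-9a-fA-F]{8}\s+(\S+?)(?:\+0x[0-9a-fA-F]+)?$")


def dominant_frame(stack_text: str) -> Optional[Tuple[str, int]]:
    """Return (symbol, count) for the most repeated frame in kL output."""
    symbols = []
    for line in stack_text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            symbols.append(match.group(1))
    if not symbols:
        return None
    return Counter(symbols).most_common(1)[0]


# Three frames copied from the sample dump above.
sample = """ChildEBP RetAddr
059ce610 067755f6 Dev14_Net46_Mvc5!Dev14_Net46_Mvc5.Controllers.HomeController.Foo()
059ce610 067755f6 Dev14_Net46_Mvc5!Dev14_Net46_Mvc5.Controllers.HomeController.Foo()+0x6
059ce610 067755f6 Dev14_Net46_Mvc5!Dev14_Net46_Mvc5.Controllers.HomeController.Foo()+0x6
"""
print(dominant_frame(sample))
```

A single symbol that accounts for nearly every frame, as Foo() does here, is the usual signature of unbounded recursion.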

Thanks

-Freist


Tips of using Crash Diagnoser on Azure Web App

Since Crash Diagnoser was released, I have been glad to see that it helps users and the support team collect the dump data they need for further analysis. If you haven’t used it before, you can read the previous posts here:

http://blogs.msdn.com/b/asiatech/archive/2015/12/29/use-crashdiag-site-extension-to-capture-dump-for-intermittent-exception-issues-on-microsoft-azure-web-app.aspx

http://blogs.msdn.com/b/asiatech/archive/2016/01/12/how-to-use-crashdiag-to-capture-stack-overflow-exception-dump-in-mvc-web-app-on-microsoft-azure.aspx

The problem that Crash Diagnoser aims to solve sits in an advanced troubleshooting area (debugging and exception capture). Based on users’ feedback and usage experience, I’d like to share some tips so that the extension can help you more easily during crash/hang investigations:

1. Ensure the web app Application Setting “Always On” is enabled (so your Web App service plan must be Basic or above).

clip_image001

The reason is that Crash Diagnoser runs as a continuous web job on the backend.

Without “Always On”, the SCM Kudu site is unloaded after 20 minutes of idle time, and the web job is stopped.

2. When your ASP.NET application terminates unexpectedly, you normally have to use 1st chance exception capture, because:

Since ASP.NET 2.0, ASP.NET by default handles all unhandled 1st chance exceptions thrown from the HTTP context and makes the process exit.

This means you will not see a 2nd chance exception unless the unhandled 1st chance exception happens outside the HTTP context.

3. If the exception code is a .NET exception (generally you will see its name in d:\home\logfiles\eventlog.xml, such as NullReferenceException, FileNotFoundException, TimeoutException, etc.), make sure “Managed Exception” is set to Yes when you configure the 1st chance exception:

clip_image003

4. After troubleshooting, clicking “Safe Stop” almost stops the Crash Diagnoser (it still wakes up every 60 seconds, but it does no work and then quits). To stop it completely, open the Web Job list under the Web App and stop it explicitly: in the WebJobs blade, right-click CrashHelper (the CrashDiagnoser package) and stop it.

After this operation, if you want to start Crash Diagnoser again through https://yourwebapp.scm.azurewebsites.net/crashdiag, start this job explicitly first.

clip_image005

5. PHP and Node.js exceptions are “managed” exceptions of their own runtimes and are not exposed as native exceptions raised through Kernel32!RaiseException. For those third-party “managed” exceptions, use the runtime’s verbose error logging for further diagnostics. If the crash or exception happens at the product level and there is no useful information in its logs, you can use Crash Diagnoser to capture the issue.
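
As a quick aid for tip 3, the choice between “Yes” and “No” for Managed Exception can be expressed as a rule of thumb. The sketch below is a heuristic written for illustration only: the function name is hypothetical, and it simply treats an 8-digit hex code such as c00000fd as native and anything else (for example, a .NET exception name from eventlog.xml) as managed.

```python
import re

# An 8-digit hex value, optionally prefixed with 0x, such as c00000fd.
NATIVE_CODE = re.compile(r"^(0x)?[0-9a-fA-F]{8}$")


def managed_exception_setting(exception_code: str) -> str:
    """Suggest the Managed Exception value for an exception code.

    Heuristic only: hex codes are treated as native exceptions ("No");
    exception names are treated as managed exceptions ("Yes").
    """
    return "No" if NATIVE_CODE.match(exception_code.strip()) else "Yes"


for code in ("c00000fd", "StackOverflowException", "NullReferenceException"):
    print(code, "->", managed_exception_setting(code))
```

The rule only covers the two cases described in these posts, so check the actual entry in eventlog.xml when a code does not clearly fall into either group.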

Thanks,

Freist
