<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>async profiler - Deployment Every Day</title>
	<atom:link href="http://deploymenteveryday.com/tag/async-profiler/feed/" rel="self" type="application/rss+xml" />
	<link>http://deploymenteveryday.com</link>
	<description>JVM (java/kotlin) performance, programming, frequent deployments, devops</description>
	<lastBuildDate>Wed, 27 Dec 2023 15:06:57 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Example of 40x speedup using async-profiler recording analysis (JFR file)</title>
		<link>http://deploymenteveryday.com/example-of-40x-speedup-using-async-profiler-recording-analysis-jfr-file/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=example-of-40x-speedup-using-async-profiler-recording-analysis-jfr-file</link>
					<comments>http://deploymenteveryday.com/example-of-40x-speedup-using-async-profiler-recording-analysis-jfr-file/#respond</comments>
		
		<dc:creator><![CDATA[Mikolaj Grzaslewicz]]></dc:creator>
		<pubDate>Wed, 27 Dec 2023 15:06:56 +0000</pubDate>
				<category><![CDATA[performance]]></category>
		<category><![CDATA[async profiler]]></category>
		<category><![CDATA[bottleneck analysis]]></category>
		<category><![CDATA[flamegraph]]></category>
		<category><![CDATA[JFR]]></category>
		<category><![CDATA[performance improvement]]></category>
		<guid isPermaLink="false">https://deploymenteveryday.com/?p=76</guid>

					<description><![CDATA[<p>40x performance improvement in this example is series of a few improvements combined. I was trying to check what can be improved during working on a small project. As usual, I run intellij profiler from time to time to see the hottest methods. And that was part of my development workflow aside from TDD. 2x [&#8230;]</p>
<p>The post <a href="http://deploymenteveryday.com/example-of-40x-speedup-using-async-profiler-recording-analysis-jfr-file/">Example of 40x speedup using async-profiler recording analysis (JFR file)</a> first appeared on <a href="http://deploymenteveryday.com">Deployment Every Day</a>.</p>]]></description>
										<content:encoded><![CDATA[<p><strong>The 40x performance improvement in this example is the combined result of a few smaller improvements (2x, 10x and 2x).</strong></p>



<p>I wanted to see what could be improved while working on a small project. As usual, I ran the IntelliJ profiler from time to time to see the hottest methods. That was part of my development workflow, alongside <code>TDD</code>.</p>



<h1 class="wp-block-heading">2x speedup &#8211; remove String.format() usage (technical optimization)</h1>



<p><a href="https://github.com/mgrzaslewicz/set-equalizer/commit/c68b0081335d25782417b95a7e3a78bb1c4239bc" target="_blank" rel="noopener" title="">git commit</a></p>



<p>Remove usage of <code>String.format("Move index %s from %s to %s", indexFrom, listFrom, listTo);</code></p>



<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="579" src="https://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_tostring_2022-03-31_11-49_1703190081490_0-1024x579.png" alt="" class="wp-image-81" srcset="http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_tostring_2022-03-31_11-49_1703190081490_0-1024x579.png 1024w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_tostring_2022-03-31_11-49_1703190081490_0-300x170.png 300w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_tostring_2022-03-31_11-49_1703190081490_0-768x434.png 768w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_tostring_2022-03-31_11-49_1703190081490_0.png 1207w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>It&#8217;s not surprising that it was slow, but discovering it is not obvious. Imagine you&#8217;re introduced to a new, big project &#8211; there is no way you would eyeball the code and say that this is a <code>bottleneck</code> (<code>hot method</code>).</p>
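


<p>A minimal sketch of one way to build the same message without <code>String.format()</code>, assuming the text is still needed on the hot path. The class and method names are hypothetical and not taken from the commit, which may simply have dropped the message. <code>String.format()</code> has to interpret the format string at run time, while plain concatenation does not.</p>



<pre class="wp-block-code"><code>import java.util.List;

class MoveDescription {
    // Original style: the format string is interpreted at run time on every call.
    static String withFormat(int indexFrom, Object listFrom, Object listTo) {
        return String.format("Move index %s from %s to %s", indexFrom, listFrom, listTo);
    }

    // Same text built with plain concatenation (hypothetical replacement).
    static String withConcat(int indexFrom, Object listFrom, Object listTo) {
        return "Move index " + indexFrom + " from " + listFrom + " to " + listTo;
    }

    public static void main(String[] args) {
        var listFrom = List.of(1, 2, 3);
        var listTo = List.of(4, 5);
        // prints: Move index 1 from [1, 2, 3] to [4, 5]
        System.out.println(withConcat(1, listFrom, listTo));
        // prints: true (both variants produce the same text)
        System.out.println(withFormat(1, listFrom, listTo).equals(withConcat(1, listFrom, listTo)));
    }
}</code></pre>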



<h2 class="wp-block-heading">10x speedup &#8211; replace streams with a for-each loop (technical optimization)</h2>



<p><a href="https://github.com/mgrzaslewicz/set-equalizer/commit/59a124aaf2e33f58695b76b3e5568dc6e27d10b3" target="_blank" rel="noopener" title="">git commit</a></p>



<p>Replace</p>



<p><code>return calculators.stream().mapToInt(c -&gt; c.calculate(listA, listB)).sum();</code></p>



<p>With</p>



<pre class="wp-block-code"><code>  int sum = 0;
  for (var calculator : calculators) {
      sum += calculator.calculate(listA, listB);
  }
  return sum;</code></pre>



<p>The bottleneck (hot method) found in the flamegraph:</p>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="807" src="https://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_10x_speedup_for_each_2022-03-31_11-49_1703190434993_0-1-1024x807.png" alt="" class="wp-image-84" srcset="http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_10x_speedup_for_each_2022-03-31_11-49_1703190434993_0-1-1024x807.png 1024w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_10x_speedup_for_each_2022-03-31_11-49_1703190434993_0-1-300x236.png 300w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_10x_speedup_for_each_2022-03-31_11-49_1703190434993_0-1-768x605.png 768w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_10x_speedup_for_each_2022-03-31_11-49_1703190434993_0-1.png 1233w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading">2x speedup &#8211; add sum caching (technical optimization)</h2>



<p><a href="https://github.com/mgrzaslewicz/set-equalizer/commit/8b3f2f431b23a9d7fa29501f6f010fcac5dc13e6">git commit</a></p>



<p>Replace</p>



<pre class="wp-block-code"><code>  private int sum(List&lt;Integer&gt; list) {
      int result = 0;
      for (var i : list) {
          result += i;
      }
      return result;
  }</code></pre>



<p>With a sum-caching <code>java.util.List</code> decorator, so that the caller can compute <code>Math.abs(listA.sum() - listB.sum())</code> without iterating over both lists every time</p>



<pre class="wp-block-code"><code>  public class SumCachingList implements SummingList {
      private final List&lt;Integer> decorated;

      public SumCachingList(List&lt;Integer> decorated) {
          this.decorated = decorated;
      }

      private int sum;
      private boolean sumCalculated = false;

      @Override
      public int sum() {
          if (!sumCalculated) {
              calculateSum();
              sumCalculated = true;
          }
          return sum;
      }

      private void calculateSum() {
          sum = 0; // start from zero so a recalculation after removeIf() does not add to a stale sum
          for (var i : decorated) {
              sum += i;
          }
      }

      @Override
      public boolean add(Integer integer) {
          sum = sum() + integer;
          return decorated.add(integer);
      }

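      @Override
      public boolean remove(Object o) {
          var removed = decorated.remove(o);
          // adjust only an already cached sum; otherwise sum() computes it later from the updated list
          if (removed &amp;&amp; sumCalculated) {
              sum -= (Integer) o;
          }
          return removed;
      }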

     @Override
     public boolean removeIf(Predicate&lt;? super Integer> filter) {
         var anyRemoved = decorated.removeIf(filter);
         if (anyRemoved) {
             sumCalculated = false;
         }
         return anyRemoved;
     }

      // ...
  }</code></pre>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="659" src="https://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_caching_flamegraph_2022-03-31_11-49_1703191218063_0-1024x659.png" alt="" class="wp-image-85" srcset="http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_caching_flamegraph_2022-03-31_11-49_1703191218063_0-1024x659.png 1024w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_caching_flamegraph_2022-03-31_11-49_1703191218063_0-300x193.png 300w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_caching_flamegraph_2022-03-31_11-49_1703191218063_0-768x494.png 768w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_caching_flamegraph_2022-03-31_11-49_1703191218063_0.png 1235w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="185" src="https://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_caching_methods_2022-03-31_11-49_1703191230949_0-1024x185.png" alt="" class="wp-image-86" srcset="http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_caching_methods_2022-03-31_11-49_1703191230949_0-1024x185.png 1024w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_caching_methods_2022-03-31_11-49_1703191230949_0-300x54.png 300w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_caching_methods_2022-03-31_11-49_1703191230949_0-768x138.png 768w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_caching_methods_2022-03-31_11-49_1703191230949_0-1536x277.png 1536w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_2x_speedup_caching_methods_2022-03-31_11-49_1703191230949_0.png 1575w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
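


<p>To illustrate the idea behind the decorator above, here is a small self-contained sketch. The <code>CachedSum</code> class and its <code>fullScans</code> counter are hypothetical and much simpler than the class from the commit. The sketch only shows that the list is iterated once and the sum is then maintained incrementally on every change.</p>



<pre class="wp-block-code"><code>import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the sum caching decorator.
class CachedSum {
    private final List&lt;Integer&gt; values;
    private int sum;
    int fullScans = 0; // how many times the whole list was iterated

    CachedSum(List&lt;Integer&gt; initial) {
        this.values = new ArrayList&lt;&gt;(initial);
        for (int value : values) {
            sum += value;
        }
        fullScans++;
    }

    void add(int value) {
        values.add(value);
        sum += value; // no iteration needed
    }

    boolean remove(Integer value) {
        boolean removed = values.remove(value); // List.remove(Object)
        if (removed) {
            sum -= value;
        }
        return removed;
    }

    int sum() {
        return sum;
    }

    public static void main(String[] args) {
        var list = new CachedSum(List.of(1, 2, 3));
        list.add(10);
        list.remove(Integer.valueOf(2));
        System.out.println(list.sum());     // prints 14
        System.out.println(list.fullScans); // prints 1
    }
}</code></pre>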



<p>After the above optimizations, the flamegraph looks like this:</p>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="249" src="https://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_after_optimisations_2022-03-31_11-49_1703191360299_0-1024x249.png" alt="" class="wp-image-87" srcset="http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_after_optimisations_2022-03-31_11-49_1703191360299_0-1024x249.png 1024w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_after_optimisations_2022-03-31_11-49_1703191360299_0-300x73.png 300w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_after_optimisations_2022-03-31_11-49_1703191360299_0-768x187.png 768w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_after_optimisations_2022-03-31_11-49_1703191360299_0-1536x374.png 1536w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_after_optimisations_2022-03-31_11-49_1703191360299_0.png 1577w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<h1 class="wp-block-heading">Big picture optimization vs. technical optimization</h1>



<p>OK, now you can see that bottlenecks can be discovered using an async-profiler recording. The example above is a rather easy one. For huge systems with tons of legacy code that is hard to maintain and change, it might be really hard.</p>



<p>Even discovering the slow parts of big systems is hard, but that&#8217;s a topic for another blog post.</p>



<p>Technical optimizations are often easier to find.</p>



<p><strong>Technical optimization</strong></p>



<p>You don&#8217;t necessarily have to understand what the application is doing. You can read straight from the flamegraph that adding a cache or optimizing loops is going to help.</p>



<p><strong>Big picture optimization</strong></p>



<p>You need to understand what your application is doing. You can optimize by organizing processing in a different way, e.g. by discovering that you don&#8217;t have to fetch and process some data in order to display a particular piece of the frontend.</p>



<h1 class="wp-block-heading">Run integration tests with profiler regularly</h1>



<p>Profiling should be part of your development process, just like running integration tests.</p>



<p>Ideally, the integration test scenario should be as similar as possible to the production scenario. This way you&#8217;d discover potential performance improvements or issues before the production deployment.</p>
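


<p>A minimal sketch of wrapping a test scenario in a profiling recording. Note that it does not use async-profiler: it swaps in the JDK&#8217;s built-in Flight Recorder API (<code>jdk.jfr.Recording</code>), only because that needs no extra dependency, and it also produces a <code>.jfr</code> file. The class name, the <code>busyWork</code> stand-in for a real integration test and the 10 ms sampling period are all assumptions.</p>



<pre class="wp-block-code"><code>import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;

class ProfiledScenario {
    // Runs the scenario while a JFR recording is active and dumps it to a temporary .jfr file.
    static Path recordWhile(Runnable scenario) {
        try (Recording recording = new Recording()) {
            Path output = Files.createTempFile("integration-test-", ".jfr");
            // Method sampling event; the 10 ms period is an arbitrary choice for this sketch.
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
            recording.start();
            scenario.run();
            recording.stop();
            recording.dump(output);
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stand-in for a real integration test scenario.
    static void busyWork() {
        double acc = 0;
        for (int i = 2_000_000; i != 0; i--) {
            acc += Math.sqrt(i);
        }
        if (Double.isNaN(acc)) {
            throw new IllegalStateException("unexpected result");
        }
    }

    public static void main(String[] args) {
        Path file = recordWhile(ProfiledScenario::busyWork);
        System.out.println("Recording written to " + file);
    }
}</code></pre>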



<h1 class="wp-block-heading">Know the difference between async profiler sampling modes: <code>Wall Clock</code> and <code>CPU</code> usage only</h1>



<h2 class="wp-block-heading">Wall Clock (Total Time)</h2>



<p>If you&#8217;re interested in real latency of your system, including</p>



<p>IO, e.g.</p>



<ul class="wp-block-list">
<li>connection polling</li>



<li>DB transaction locks</li>



<li>reads/writes</li>
</ul>



<p>synchronization, e.g.</p>



<ul class="wp-block-list">
<li>waiting for critical section access</li>



<li>waiting for tasks in a thread pool</li>
</ul>



<p>use <code>Wall Clock</code> sampling in async-profiler. This mode also collects samples from threads in the <code>SLEEPING</code> state.</p>



<p>Beware that it affects the performance of the measured process more than measuring only <code>ACTIVE</code> threads. Why? Because the measured JVM has more threads to iterate over at every sampling interval.</p>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="576" src="https://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_JFR_Total_time_2023-12-21_16-11-35_1703186529232_0-1024x576.png" alt="" class="wp-image-89" srcset="http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_JFR_Total_time_2023-12-21_16-11-35_1703186529232_0-1024x576.png 1024w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_JFR_Total_time_2023-12-21_16-11-35_1703186529232_0-300x169.png 300w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_JFR_Total_time_2023-12-21_16-11-35_1703186529232_0-768x432.png 768w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_JFR_Total_time_2023-12-21_16-11-35_1703186529232_0-800x450.png 800w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_JFR_Total_time_2023-12-21_16-11-35_1703186529232_0.png 1090w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading">CPU only</h2>



<p>In other words, sampling without <code>Wall Clock</code> mode: only JVM threads in the <code>ACTIVE</code> state are sampled.</p>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="573" src="https://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_JFR_CPU_Time_2022-03-31_11-49_1703186541469_0-1024x573.png" alt="" class="wp-image-90" srcset="http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_JFR_CPU_Time_2022-03-31_11-49_1703186541469_0-1024x573.png 1024w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_JFR_CPU_Time_2022-03-31_11-49_1703186541469_0-300x168.png 300w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_JFR_CPU_Time_2022-03-31_11-49_1703186541469_0-768x430.png 768w, http://deploymenteveryday.com/wp-content/uploads/2023/12/blog_post_bottleneck_analysis_JFR_CPU_Time_2022-03-31_11-49_1703186541469_0.png 1093w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading">In the examples above you can see the hot method <code>httpClient.newCall(request).execute()</code></h3>



<ul class="wp-block-list">
<li>Time spent measured with <code>CPU Time</code> is 120 ms</li>



<li>Time spent measured with <code>Wall Clock</code> (<code>Total Time</code>) is 9170 ms</li>
</ul>



<p>Would you optimize CPU time in this case? Probably not. You&#8217;d first focus on IO as it&#8217;s the bottleneck.</p>
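


<p>The gap between the two numbers can be reproduced without a profiler. The sketch below is only an illustration: it uses the standard <code>ThreadMXBean</code> to read the CPU time of the current thread, and a <code>Thread.sleep()</code> call as a hypothetical stand-in for a blocking HTTP request. On JVMs where thread CPU time measurement is unsupported or disabled, the CPU value will not be meaningful.</p>



<pre class="wp-block-code"><code>import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class CpuVsWallClock {
    // Returns { wall clock millis, CPU millis } spent by the current thread in the task.
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();
        task.run();
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return new long[] { wallNanos / 1_000_000, cpuNanos / 1_000_000 };
    }

    // Hypothetical stand-in for a blocking IO call: the thread waits and burns almost no CPU.
    static void blockingCall() {
        try {
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] result = measure(CpuVsWallClock::blockingCall);
        System.out.println("wall clock ms: " + result[0]);
        System.out.println("cpu ms: " + result[1]);
    }
}</code></pre>



<p>The wall clock value includes the time spent waiting, while the CPU value stays close to zero, which is the same pattern as in the two flamegraphs above.</p>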



<p class="has-luminous-vivid-amber-background-color has-background"><strong>What to optimize needs to be chosen case by case. To have this choice, you need to have both <code>CPU Time</code> and <code>Total Time</code> in the flamegraph &#8211; and that is available only when sampling in wall clock mode</strong>.</p>



<h1 class="wp-block-heading">Resources</h1>



<ul class="wp-block-list">
<li><em><a href="https://plv.colorado.edu/papers/mytkowicz-pldi10.pdf" title="Evaluating the Accuracy of Java Profilers">Evaluating the Accuracy of Java Profilers</a></em></li>



<li>Safepoint bias problem
<ul class="wp-block-list">
<li><a href="https://jpbempel.github.io/2022/06/22/debug-non-safepoints.html">https://jpbempel.github.io/2022/06/22/debug-non-safepoints.html</a></li>



<li><a href="https://psy-lob-saw.blogspot.com/2015/12/safepoints.html">https://psy-lob-saw.blogspot.com/2015/12/safepoints.html</a></li>
</ul>
</li>



<li><a href="https://github.com/mgrzaslewicz/set-equalizer" target="_blank" rel="noopener" title="">https://github.com/mgrzaslewicz/set-equalizer</a></li>
</ul>]]></content:encoded>
					
					<wfw:commentRss>http://deploymenteveryday.com/example-of-40x-speedup-using-async-profiler-recording-analysis-jfr-file/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
