Re: Inconsistent Zero copy performance

July 8, 2013, 6:27 am

I forgot to mention that I am using Windows 7 Home Premium (64-bit) and Catalyst 13.4.

↧

calling clBuildProgram in 2 applications at the same time

July 8, 2013, 6:33 am

Hi,

We have 2 applications. If both applications call "clBuildProgram" at the same time, one building a kernel for the GPU and the other building a different kernel for the CPU, my screen freezes, and after a few seconds I get an error and my video driver dies.

Is it intentional that 2 simultaneous "clBuildProgram" calls in 2 applications (".exe") are not supported?

Thanks

↧

CodeXL 1.1 no support for OpenGL 4.3 and compute shader?

July 8, 2013, 1:31 pm

I moved from an OpenGL/OpenCL setup to an OpenGL 4.3 setup with compute shaders. I use CodeXL 1.1.2885.0 on Windows 7 x86_64 with a GeForce GTS 450, and my compute shader program object fails with a program link error:

Compute info
------------
(0) : error C7006: no work group size specified

Running without CodeXL works fine. I think the debugger supports only OpenGL 3.2 at the moment, but is it possible to debug 4.3 context code with compute shaders?
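For reference, the C7006 message above concerns the work group size declaration that a GLSL compute shader carries in a `layout(...) in;` qualifier. Since the poster's shader links outside CodeXL, it presumably already has one; the sketch below only illustrates what the linker is looking for. The shader text, the 16x16 size, and the helper `declares_work_group_size` are hypothetical and not taken from the original post, and the check is a plain substring search rather than a GLSL parser.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical minimal compute shader, embedded as a C string.
   The layout qualifier on the second line declares the work group size. */
static const char *const example_compute_shader =
    "#version 430\n"
    "layout(local_size_x = 16, local_size_y = 16) in;\n"
    "void main() {\n"
    "}\n";

/* Returns 1 if the source text mentions a local_size_x declaration.
   This is a crude substring check for illustration only. */
static int declares_work_group_size(const char *glsl_source)
{
    assert(glsl_source != NULL);
    return strstr(glsl_source, "local_size_x") != NULL;
}
```

Dumping the source that actually reaches the driver when running under the debugger and passing it through a check like this could help tell whether the declaration is being lost along the way.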

↧

Semaphore handle leak when host threads are destroyed on Windows

July 8, 2013, 5:55 pm

If one creates a context in the application's main thread and then calls OpenCL functions that use this context from other threads (such as memory allocation/deallocation), the OpenCL runtime creates 3 semaphores for each thread but does not free them when the thread is destroyed.

Attached is a sample code illustrating the problem.

Instructions -- on a machine with an AMD GPU (we tested on HD 7970) running Windows 7 x64:

1. Compile the application

2. Run Windows Task Manager

3. Make sure that the Handles column is shown (View > Select Columns...)

Note: it seems that Task Manager on Windows 8 does not have this menu; use other monitoring tools on that OS.

4. Run the application. Hit ENTER several times. Note that every time ENTER is pressed, the number of Handles consumed increases by 3.

5. Use WinDbg from the Windows SDK to find out that the leaking handles represent semaphores and inspect the call stack within amdocl[64].dll

↧

Re: clBuildProgram crashes on 13.4 driver on a binary built with 13.6 beta driver

July 8, 2013, 6:01 pm

Is anything going to happen with this bug report?

↧

Re: OpenCL Maximum Buffer Size of Kernel Argument

July 8, 2013, 10:51 pm

Thanks void_ptr for clarifying. I took the kernel argument size for a different meaning.

Hi stuart,

Thanks for confirming the issue is justified now.

↧

Re: Inconsistent Zero copy performance

July 8, 2013, 10:56 pm

We tried running the sample here on Win7 with the 13.6 beta driver on a Trinity APU, and we do not see much variation. Can you try with the 13.6 beta driver? If you still see the inconsistencies, please forward the logs to us.

↧

Re: calling clBuildProgram in 2 applications at the same time

July 8, 2013, 11:20 pm

Is the issue reproducible using the APP SDK samples too? Can you give some steps on how you ran the two applications at the same time, so that the issue can be reproduced?

↧

Re: clBuildProgram crashes on 13.4 driver on a binary built with 13.6 beta driver

July 8, 2013, 11:54 pm

It has been reported. We will add a relevant note to the documentation if it is indeed found necessary. Thanks for reporting.

↧

Re: Doubts about the progress of AMD Catalyst Linux?

July 9, 2013, 12:01 am

Thank you for your feedback.

Regarding Blender support, AMD engineers are working on it. Please check http://devgurus.amd.com/message/1285984#1285984

↧

Re: Semaphore handle leak when host threads are destroyed on Windows

July 9, 2013, 12:04 am

Thanks for reporting it. I will let you know my findings.

↧

Re: clBuildProgram crashes on 13.4 driver on a binary built with 13.6 beta driver

July 9, 2013, 12:08 am

Thanks Himanshu.

↧

Re: Re: More pinned host-memory than device-memory capacity

July 9, 2013, 11:48 pm

Hi,

I recently wrote a test to overmap GPU memory, and it seems that more buffers can be allocated on the GPU than fit in GPU memory, probably by swapping out buffers already in GPU memory. Maybe you can try the code on your setup.

↧

Re: Re: More pinned host-memory than device-memory capacity

July 10, 2013, 1:12 am

So I tried this OverMappingBuffers on both NVIDIA and AMD. When the number of buffers is such that the device memory is overcommitted, NVIDIA fails with -4 relatively early (I suspect they don't do delayed allocation, so they realize they're out of GPU memory quite soon). On AMD, it starts chugging along, but then it fails with -5. This is running on a Cayman (HD6970) with 13.4 drivers. I really think buffers are never evicted from device memory in AMD's platform, until they are released.
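For readers decoding the two numbers above: in the standard OpenCL headers, -4 is `CL_MEM_OBJECT_ALLOCATION_FAILURE` and -5 is `CL_OUT_OF_RESOURCES`. A minimal sketch of a logging helper follows. The function names and the `SKETCH_` macros are hypothetical; the values are redefined locally only so the snippet compiles without `CL/cl.h`.

```c
#include <assert.h>
#include <stdio.h>

/* Values match the standard OpenCL definitions; redefined here (with a
   SKETCH_ prefix) only so this compiles without the OpenCL headers. */
#define SKETCH_CL_SUCCESS                          0
#define SKETCH_CL_MEM_OBJECT_ALLOCATION_FAILURE  (-4)
#define SKETCH_CL_OUT_OF_RESOURCES               (-5)

/* Hypothetical helper: map a status code to its symbolic name. */
static const char *cl_status_name(int status)
{
    switch (status) {
    case SKETCH_CL_SUCCESS:
        return "CL_SUCCESS";
    case SKETCH_CL_MEM_OBJECT_ALLOCATION_FAILURE:
        return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case SKETCH_CL_OUT_OF_RESOURCES:
        return "CL_OUT_OF_RESOURCES";
    default:
        return "unrecognized status";
    }
}

/* Hypothetical helper: write "NAME (code)" into buf for log output.
   Returns the value snprintf returns. */
static int describe_cl_status(int status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s (%d)", cl_status_name(status), status);
}
```

Logging the symbolic name next to the raw code makes it easier to compare where each vendor's runtime gives up when device memory is overcommitted.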

↧

[Suggestion] On nVidia's LightBoost Technology

July 10, 2013, 1:33 am

I was notified by your customer service staff that this was the best way to get in touch with AMD's Marketing and Engineering departments. I have what I feel are some worthwhile suggestions to improve existing products.

Let me preface this by saying that I have been a huge fan of AMD and ATI from the day I built my first PC (a K5 with a 3D Rage Pro).

Recently there has been a growing movement among enthusiasts and gamers to adopt nVidia GPUs solely to take advantage of LightBoost, a licensed strobing backlight technology. While normally used in conjunction with nVidia's active shutter 3D system, a hack can enable LightBoost without 3D enabled. Utilizing this strobing backlight with high refresh rates produces fluid motion on par with CRT monitors.

More information can be found here:

http://www.blurbusters.com/zero-motion-blur/lightboost/

BlurBusters.com is a site dedicated to reducing motion blur on LCDs and is maintained by Mark Rejhon. He has spent quite a bit of time evangelizing LightBoost to enthusiasts in various tech communities. I have to say that he has been quite persuasive and has even convinced me, an AMD fanboy, to switch to nVidia for my next graphics card just to take advantage of this technology. Of course, I don't want to do that. Luckily there is one working solution for AMD cards to have LightBoost enabled, but it requires a very ugly method involving hot-swapping a monitor from an nVidia card to an AMD card.

But this also makes it very clear that the monitor handles things without needing a persistent connection to the nVidia GPU. In fact, it seems the GPU simply sends a command to the monitor to turn LightBoost on or off. Some have speculated that this might be a simple DDC command.

And this is why I am eager to contact AMD's fine engineering staff. I think it would be quite beneficial to everyone if AMD added a simple option to enable LightBoost on monitors that support it. While this may aid nVidia in some respect due to monitor sales, you would at least eliminate one major advantage to owning an nVidia GPU.

If making your graphics cards compatible with an nVidia technology doesn't interest you, perhaps creating a competing technology would. It seems nVidia still hasn't caught wind of this trend because they have yet to offer an easy solution to enable LightBoost without 3D. You could be the first to market a solution geared towards fluid motion by utilizing the same strobing backlight technology. Plenty of enthusiasts and gamers would prefer having less motion blur on a high-response TN display compared to the higher color accuracy of an IPS. I am sure there is a market for it.

Anyway, I hope this was a bit helpful. I'd really appreciate feedback if this gets read.

↧

Re: clAmdFft - Multi-Device enqueueTransform Failure

July 10, 2013, 3:01 am

I cannot make it work either, even when following the instructions from the clAmdFft manual:

Currently, multi-device operation must be managed by the user. OpenCL contexts can be created that are associated with multiple devices, but clAmdFft only uses a single device from that context to transform the data. Multi-device operation can be managed by the user by creating multiple contexts, where each context contains a different device, and the user is responsible for scheduling and partitioning the work across multiple devices and contexts.

I get a slightly different error though:

OPENCL_V< CLFFT_INVALID_CONTEXT > (1201): clEnqueueNDRangeKernel failed

Single-GPU operations work fine, and the context is valid: it points to a single GPU, has its own command queue, and so on.

I have found a temporary workaround for what is essentially a show stopper: running the program 4 times, with each instance using a different single GPU.
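The manual passage quoted above leaves partitioning to the user. As a rough illustration of that part only, the sketch below splits a batch of independent transforms into one contiguous slice per device. The type `batch_slice` and the function `slice_for_device` are hypothetical names, not part of clAmdFft, and the sketch does not address the `CLFFT_INVALID_CONTEXT` failure itself.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous slice of a batch of transforms (hypothetical type). */
typedef struct {
    size_t first;  /* index of the first transform in the slice */
    size_t count;  /* number of transforms in the slice */
} batch_slice;

/* Split `batch` transforms across `devices` devices as evenly as possible.
   The first (batch % devices) devices each receive one extra transform. */
static batch_slice slice_for_device(size_t batch, size_t devices,
                                    size_t device_index)
{
    size_t base, extra;
    batch_slice s;

    assert(devices > 0 && device_index < devices);
    base = batch / devices;
    extra = batch % devices;

    s.count = base + (device_index < extra ? 1 : 0);
    s.first = device_index * base +
              (device_index < extra ? device_index : extra);
    return s;
}
```

Each slice would then be handed to its own context, command queue, and FFT plan, one per device, which mirrors the one-process-per-GPU workaround but inside a single program.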

↧

Which is the right space to ask about GPU assembly

July 10, 2013, 3:57 am

Is it this space, GPU Developer Tools, Graphics Programming, or some other one?

↧

Re: What's wrong with this file?

July 10, 2013, 4:33 am

No news again? The wait is becoming too long without any news. Yes, I have gotten smallux to work, but that is only me, and this render engine is feature-limited compared to V-Ray, Octane, Cycles, and all the other software where artists want acceleration with AMD cards. Some of us are getting completely discouraged; look at this (Amd/ati Opencl+ Blenderheads+blender Cycles=harmony | Facebook). Is it so hard to solve these issues?

Now AMD is talking about HSA for APUs and the next HD 9000 Volcanic Islands. Does that mean the Radeon 7XXX series will never get Cycles, V-Ray, and Octane working?

Why, AMD, why? We need a real answer.

↧

Re: Which is the right space to ask about GPU assembly

July 10, 2013, 5:21 am

Hi,

I think the OpenCL space is the right one.

↧

Re: clEnqueueWriteBuffer for part of array

July 10, 2013, 6:24 am

Meteorhead, sorry if I was ambiguous.

Himanshu Gautam, I have a 1-D array (vector), and you understood the question correctly. I am looking at subBuffer and think that it works perfectly for my application. Thanks.
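As a rough sketch of the arithmetic behind the sub-buffer approach mentioned above: a sub-buffer is described by a byte origin and a byte size, and the origin has to respect the device's base address alignment, which `CL_DEVICE_MEM_BASE_ADDR_ALIGN` reports in bits. The struct and function names below are hypothetical; `region_bytes` only mirrors the two fields of OpenCL's `cl_buffer_region` so the snippet compiles without `CL/cl.h`.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the origin/size fields of cl_buffer_region (both in bytes). */
typedef struct {
    size_t origin;
    size_t size;
} region_bytes;

/* Convert an element range of a 1-D array into a byte region. */
static region_bytes region_for_elements(size_t first_elem, size_t elem_count,
                                        size_t elem_size)
{
    region_bytes r;

    assert(elem_size > 0);
    r.origin = first_elem * elem_size;
    r.size = elem_count * elem_size;
    return r;
}

/* base_addr_align_bits is assumed to be the value queried with
   CL_DEVICE_MEM_BASE_ADDR_ALIGN, which is expressed in bits.
   Returns 1 if the origin is a multiple of that alignment in bytes. */
static int origin_is_aligned(size_t origin, size_t base_addr_align_bits)
{
    size_t align_bytes = base_addr_align_bits / 8;
    return align_bytes == 0 || origin % align_bytes == 0;
}
```

The same byte offset and size can also be passed directly as the offset and size arguments of `clEnqueueWriteBuffer` when only part of the array needs updating and no separate sub-buffer object is wanted.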

↧
