03-02-2025 10:01 AM
When the GPU was new (in January this year), the hotspot temp never exceeded 85C, the chip temp seldom exceeded 65C, and the fans never needed to spin above 1700 RPM.
But ever since I updated to the newest drivers released after the 50 series launch, my hotspot temp has been reaching a very concerning 98-99C and the chip temp is exceeding 70C, even with the fans at their 2000 RPM maximum.
I'm assuming this is caused by the new Nvidia drivers and not degraded thermal paste, since my GPU is only 2 months old and it happened all of a sudden; the temps didn't creep up gradually over time. Is anyone else experiencing this, and does anyone know how to resolve it?
03-13-2025 01:10 PM
I have an Asus TUF RTX 4080 Super; the card is 3 months old. Before, the hotspot temp was ~82C max at full load; now it hits ~95C. I think it probably started after the new drivers for the 50xx series, too. I haven't done anything to the card.
03-13-2025 06:59 PM
Btw, I found out that it's due to thermal paste pump-out, not a driver issue. Basically, the factory did a bad job of applying the thermal paste, which caused it to be "squeezed out" as the die and cold plate expanded unevenly under load.
A repaste fixed it.
03-13-2025 01:20 PM - edited 03-13-2025 01:21 PM
There's a reason NVIDIA opted not to expose the hotspot sensor on the 5000 series (in my opinion). These are huge dies, and consistent contact with the cold plate can be difficult to achieve and can get worse over time. I think users tend to look at it too often and become alarmed too quickly.
Remember, the hotspot is the hottest reading of many sensors on the die. Ideally, you want it below 85C. If it's above this, the card may benefit from a repaste.
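To make that concrete, here is a minimal sketch of how a hotspot figure relates to the individual sensor readings. It assumes you already have a list of per-sensor temperatures and a core temperature from some monitoring tool; the function name, the input format, and the sample numbers are hypothetical, and the 85C figure is just the rule of thumb from this post, not a vendor specification.

```python
# Hypothetical helper: the 85 C guideline is the rule of thumb quoted in this
# post, not an NVIDIA or ASUS specification.
HOTSPOT_GUIDELINE_C = 85.0


def summarize(sensor_temps_c, core_temp_c):
    """Return (hotspot, delta, over_guideline) for one sample.

    sensor_temps_c: per-sensor die temperatures in Celsius (assumed input).
    core_temp_c: the regular GPU/core temperature in Celsius.
    """
    if not sensor_temps_c:
        raise ValueError("need at least one sensor reading")
    hotspot = max(sensor_temps_c)      # hotspot = hottest of all sensors
    delta = hotspot - core_temp_c      # gap between hotspot and core temp
    return hotspot, delta, hotspot > HOTSPOT_GUIDELINE_C


# Made-up sample values, for illustration only.
hotspot, delta, over = summarize([70.0, 82.0, 98.5], core_temp_c=71.0)
print(f"hotspot={hotspot}C delta={delta}C over_guideline={over}")
```

A single hot sensor is enough to push the reported hotspot up, which is why it can sit well above the core temperature.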
03-13-2025 01:28 PM
Then all the more reason to show hot spot temperature instead of hiding it? So that we know there's a problem and can fix it?
03-13-2025 01:43 PM
That comes down to whether it's a "problem." GPUs have been around a lot longer than that kind of telemetry, and they'll continue to be around long after. Sure, it's nice to have, but I've witnessed first-hand the bewilderment and concern it can cause. A 10-15C delta can be expected in most cases.
03-13-2025 02:05 PM
Yes, but we have never had GPUs drawing nearly 600w of power till now. If hotspot temperature were not reported, I would never have known my hot spot was reaching >105, or that there's a >30C temp delta. Hiding it makes issues harder to diagnose or, worse, lets overheating go undiscovered and lead to irreparable damage.
03-13-2025 10:45 PM - edited 03-13-2025 10:47 PM
@iamdot wrote:
Yes, but we have never had GPUs drawing nearly 600w of power till now.
We certainly have 😁
@iamdot wrote:
If hotspot temperature were not reported, I would never have known my hot spot was reaching >105, or that there's a >30C temp delta. Hiding it makes issues harder to diagnose or, worse, lets overheating go undiscovered and lead to irreparable damage.
Never heard of this happening, because there are junction points in place. Whilst I too would prefer more telemetry rather than less, I think the reason for removing it is the opposite of what you think may happen: people RMA'ing unnecessarily. It's not uncommon for the hottest reading taken over multiple sensors to be 20C higher.
03-14-2025 05:27 AM
I'm not talking about data center or enterprise GPUs. Those are housed in temp controlled server rooms and overheating is not that big an issue. For consumer level stuff, 600w is a lot.
Brand new, my GPU's hot spot maxed out at 85. After barely 3 months, the hot spot was maxing out at 105, causing lag spikes in games as the GPU thermal throttles. Would you call that RMA'ing unnecessarily? The GPU is performing worse than it should because of defects, so I am well within my rights to RMA it.
I've also seen other posts where people have complained about this, so it seems to be a pretty common issue with Asus GPUs. And your response to them was that the thermal limit is 110. But you are not getting the point. The hotspot max does not exceed 110 because it's thermal throttling! The GPU performance is not what it should be. That certainly is a legitimate RMA reason if you ask me.