Redirect ETW logging to Syslog on Linux#5144
Redirect ETW logging to Syslog on Linux#5144daxian-dbw merged 21 commits intoPowerShell:masterfrom dantraMSFT:syslog
Conversation
|
Add "Blocked" to wait #5149 |
PaulHigin
left a comment
There was a problem hiding this comment.
I am not through with my review, but here are my comments so far.
There was a problem hiding this comment.
GetSysLogIdentity() or GetSysLogId()?
There was a problem hiding this comment.
It looks like this same path creation is done repeatedly in other methods so it can be done just once in the constructor and assigned to a readonly field. Also psHomeConfigDirectory should be readonly since it is set just once in the constructor.
There was a problem hiding this comment.
The entire class uses this pattern and; while i was tempted to change it; I think it's beyond the scope of this PR. I will mark the two fields as read-only, though.
There was a problem hiding this comment.
Minor comment. You can also do:
identity.Equals(LogDefaultValueName, System.StringComparison.OrdinalIgnoreCase) which returns a Boolean.
There was a problem hiding this comment.
Personal preference - I prefer having the two strings adjacent when reading the code.
There was a problem hiding this comment.
Prefer consistency over personal preference for others.
Personally, I pause on Compare and not on Equals, so it is a distraction.
There was a problem hiding this comment.
Well, that's two so I'll change it.
There was a problem hiding this comment.
We should be able to make this field be the base class, LogProvider, and remove these ifdefs.
There was a problem hiding this comment.
Sure, that would work but I think I'll defer that to #5180
There was a problem hiding this comment.
This is also defined in PropertyAccessor.cs. We should only define it once.
There was a problem hiding this comment.
Fixed - now only defined in PropertyAccessor
There was a problem hiding this comment.
Instead of creating a new StringBuilder instance for each method call, I think we could just have a class property _payload instance that is cleared each time.
There was a problem hiding this comment.
I'm pretty sure that will introduce race conditions.
There was a problem hiding this comment.
Not if the instance is thread local.
There was a problem hiding this comment.
I suspect this is more speculation than anything else. Consider that the string builder has to be cleared and realloc'd each time plus the TLS overhead, it's likely premature to optimize without measurement. Of course, if you have reasonable evidence that object allocation will be a significant part of the overhead; I'll move it to thread local.
There was a problem hiding this comment.
Garbage collection is not free - but you pay for it at random times, so it's hard to quantify.
The CLR thinks StringBuilder reuse is worthwhile, see here.
There was a problem hiding this comment.
It looks like a most of these LogProvider methods have a lot of commonality with the Windows implementation (PSEtwLogProvider) and they can be implemented in the base class, with WriteEvent being specific to the derived class. I know each class is complied separately for different platforms but it would make maintaining the code easier having most of the source in one place.
There was a problem hiding this comment.
This entire ETW logging stack needs to be refactored and after discussing it with Steve, I decided to defer.
There was a problem hiding this comment.
Ok. Can you please create an Issue to track the work?
There was a problem hiding this comment.
I'll move moved the common string building functions down into the base LogProvider class to reduce duplication.
There was a problem hiding this comment.
Shouldn't this field also be readonly?
|
Something is wrong. A lot merged changes are shown up in this PR. Can you please rebase? |
There was a problem hiding this comment.
Please add comment block for this constructor.
There was a problem hiding this comment.
This needs to have a Marshal.FreeHGlobal() to free up the unmanaged allocation. But you can use the [MarshalAs(UnmanagedType.LPStr)] attribute on the NativeMethods.OpenLog() parameter, to get automatic marshaling.
There was a problem hiding this comment.
No, this string needs to remain for the lifetime of the process. I added a comment.
There was a problem hiding this comment.
I don't think this is necessary. You can initialize _resourceManager field where it is defined or in the static constructor. Also the field can be readonly.
There was a problem hiding this comment.
This is the pattern used everywhere else. i believe the notion is to defer loading resources until first use.
There was a problem hiding this comment.
Since Native_OpenLog takes a const char* it shouldn't need special marshaling. But I assume you do this above because it keeps a reference to the ident parameter string? If so can you update the comment to make this clear and as to why FreeHGlobal is never called (presumably because the allocation needs to exist for the life of the process).
There was a problem hiding this comment.
I assume this is copied directly from Windows PowerShell manifest. I see that it has "PSWorkflow" keywords. Since we don't support workflows they can be removed at some point.
There was a problem hiding this comment.
Yes, we'll need to scrub all entries that are no longer referenced.
There was a problem hiding this comment.
Please do the scrub before this is merged - no sense in taking something that never should have been there in the first place.
There was a problem hiding this comment.
I'll do the scrub as part of the Windows manifest work; due next.
|
@PaulHigin I believe I've address all of your feedback. |
There was a problem hiding this comment.
As long as this is generated, maybe generate a switch statement instead of a dictionary.
Switch dispatch can be very fast (jump table) or a quick binary search, and the other benefit is we won't allocate a bunch of objects that might never be used.
There was a problem hiding this comment.
Quoted from "https://stackoverflow.com/questions/44905/c-sharp-switch-statement-limitations-why/48259#48259"
At the end of the day, a C# switch against an integer expression on a modern system is a sub-microsecond operation, and not normally worth worrying about.
There was a problem hiding this comment.
Fair enough, I'll regenerate the code to use an out parameter for the parameter count and avoid the object allocation.
There was a problem hiding this comment.
The formatting differs significantly, please follow the style of the rest of the code base. Specifically, '(', '||', and ')' should not be on their own line.
There was a problem hiding this comment.
This could go in System.Management.Automation.Utils.Separators - it helps reuse to have a common place arrays like this.
There was a problem hiding this comment.
I don't see any inherit benefit in sharing this constant. It simply introduces unnecessary/unintended coupling. ON the other hand, if there's an existing pattern that this should follow; I'm happy to look at it.
There was a problem hiding this comment.
It's more of a meta benefit - folks create arrays like this all over the place, sometimes in a loop.
People copy patterns, so if you follow the pattern, others will too. If you don't, they might follow your pattern - and in the new use case, it might be a bad choice.
There was a problem hiding this comment.
I've left it encapsulated for now; I consider it a private implementation detail until we have a use case for sharing it.
tools/ResxGen/ResxGen.psm1
Outdated
There was a problem hiding this comment.
$[EventMessage] looks like a bug.
tools/ResxGen/ResxGen.psm1
Outdated
There was a problem hiding this comment.
Calling Contains isn't really necessary - Replace won't create a new string if there's nothing to replace, and Replace has to search the string again, so it just does more work.
There was a problem hiding this comment.
And I think this whole loop would be easier to read/understand if it was something like this (basically inline the replacements since they aren't used elsewhere.)
$message = $message -replace '%t', "`t" `
-replace '%n', "`n" `
-replace '%r', "`r" `
-replace '%space', ' ' `
-replace '%.', '.' `
-replace '%%', "%"
tools/ResxGen/ResxGen.psm1
Outdated
There was a problem hiding this comment.
We've moved to UTF8 with no BOM in this repo.
There was a problem hiding this comment.
What does that mean in practice?
There was a problem hiding this comment.
It's probably fine to just use ASCII when generating this file.
There was a problem hiding this comment.
Are you referring to the code or resx file? ASCII would preclude localization of the resx file; wouldn't it?
There was a problem hiding this comment.
That's just a shortcut assuming that we don't have characters requiring UTF8 encoding, which we don't.
This would only preclude localization if we generated the resx file from a localized manifest. As we're not even localizing error messages, I think we're a long way off from localizing the event log, and when that happens, it might not depend on generating new resx files - so I wouldn't worry about that for now.
There was a problem hiding this comment.
Fair enough, I'll switch to ascii.
There was a problem hiding this comment.
Please do the scrub before this is merged - no sense in taking something that never should have been there in the first place.
There was a problem hiding this comment.
This description does not describe the guid in the example.
There was a problem hiding this comment.
FYI - storing the hash code isn't necessary, it's highly optimized - stored in the string so it's only computed once. It's probably slower to store it yourself (not verified though).
There was a problem hiding this comment.
Where are you seeing this implementation? When I review GetHashCode in the coreclr code base, I don't see any caching of the hashcode.
There was a problem hiding this comment.
It's not exposed via managed code - it's a low level implementation detail because string needs to be highly optimized.
There was a problem hiding this comment.
Does it also optimize away the function call?
There was a problem hiding this comment.
After reading about String.GetHashCode, I'm going to leave the implementation as is. There's a good argument for generating a hash using an algorithm that is repeatable and less likely to collide. As it stands, it's not hard to get two hash codes for the same commit id.
I'll add a comment to that effect.
There was a problem hiding this comment.
Why do we need the hash code for git commit id? Is it used as some sort of an identification for a logging?
There was a problem hiding this comment.
The hash code is used in-lieu of logging the long commit id with every log message. Instead, I log an initial entry containing the full commit id and the associated hash. Subsequent entries contain on the hash. SysLogProvider has a comment block at the beginning that illustrates its usage.
There was a problem hiding this comment.
I suppose you can ignore my comment about caching - it's not clear to me that caching is really happening.
If you feel like digging into details, you could start here.
There was a problem hiding this comment.
Why keep another reference here? It's not changing in PSVersionInfo.
There was a problem hiding this comment.
You're right, I'll remove it.
There was a problem hiding this comment.
We avoid unnecessary lines in this code base, this should be 1 or 2 lines instead of 6.
There was a problem hiding this comment.
Formatting - ( etc. should not be on a line by itself.
There was a problem hiding this comment.
We could reuse a single instance of StringBuilder to avoid extra allocations.
There was a problem hiding this comment.
Not if the instance is thread local.
There was a problem hiding this comment.
With so many appends, it feels like this StringBuilder should preallocate space to avoid unnecessary reallocations - just guess or take a typical result and just use that.
There was a problem hiding this comment.
Good point, I'll do an estimate and use it.
|
NOTE: This PR is dependent on #5310 as well as republishing the nugget package for libpsl-native. |
|
#5310 has been merged. Need to update the libpsl nuget package to include the binary with the new changes. |
|
The new libpsl nuget package has been published to powershell-core feed. |
Explicity test for the manifest file.
…onditional compilation. Update powershell provider guid references
Update to enable future support for customizing the syslog ident and facility.
Add flags to PSChannel and PSKeyword for reliable parsing
…nfig value. Experiment with Local0 facility.
* ResxGen - change code generator to use a switch statement in GetMessage instead of a dictionary of objects * Use thread local shared StringBuilder in SysLogProvider and PSSysLogProvider. * Moved common string building methods down into LogProvider from the derived PSEtwLogProvider and PSSyslogProvider classes.
…ventIdResourceName
|
@daxian-dbw I've verified the package on Ubuntu 14.04, Rasberry PI, and Mac 10.12. My 16.04 system is currently hosed. |
NOTE: Tests are pending...
This PR is divided into the following areas:
1: Add/Rename the PowerShell/Windows ETW manifest to the repo and change both the provider id (guid) and name. See #4939.
The manifest is at tools/resxgen/PowerShell-Core-Instrumentation.man
2: Generate a resx file containing the string resources needed for logging ETW events to syslog.
This is accomplished by tools\resxgen\resxgen.psm1 and resxgen.ps1. The tool generates two files
NOTE: The EventResource.cs class generated by resgen is explicitly ignored in the csproj file; it is not used.
3: Native shims for the syslog apis to enable pInvoke.
The shim implementations are libpsl-native\src\nativesyslog.cpp. Native_SysLog, Native_OpenLog, and Native_CloseLog are the target pInvoke.
NOTE: See #5149.
4: SMA\utils\tracing\PSSysLogProvider.cs
Implements the abstract LogProvider class and is the syslog equivalent of PSEtwLogProvider. The class contains a number of logical methods for logging lifecycle, health, and normal events.
5: SMA\utils\tracing\SysLogProvider.cs
This is the Syslog equivalent of ETW's log provider class and implements a Log method versus ETW's EventWrite. It is also responsible for resolving event ids to resource names and peforming the Syslog call. There is a large comment block in the class XML doc that describes the types of log output it produces.
6: SMA\utils\tracing\PSEtwLog.cs
PowerShell's current implementation is tightly coupled to this class; with code calling it directly for all events. To simplify integration of syslog, I updated the class to create an instance of PSSysLogProvider on Linux and removed the Linux-specific stub file.
7: SMA\engine\PropertyAccessor.cs
This class provides a wrapper around PowerShellProperties.json and has been extended to have Unix-specific assessors for configuring logging. Note that the file is expected to be in the $PSHOME directory to ensure SxS.
Currently, there are four configuration properties:
LogIdentity - the string identifier for the source application. This defaults to 'powershell' and can be configured to enable distinguishing between side-by-side installations.
LogLevel - configures the tracing level (log level). Informational is the defauilt.
LogChannels - used to enable operational and analytic logging. Operational is the default.
LogKeywords - used to configure enabling tracing by keyword. All keywords other than UseAlwaysAnalytic are enabled by default.
NOTE: This will likely change. PowerShell sometimes confuses the analytic channel with this keyword and sends logging to the wrong channel. Once this is cleared up, UseAlwaysAnalytic and UseAlwaysOperational keywords will likely be removed.
Additional Notes:
1: The current implementation writes directly to syslog and writing to a separate log file is still pending.
2: The generated class and resx are not part of the build; instead, it is expected that Resxgen should be run when events are added to the manifest. To fully automate the process, resxgen will need to be updated to generate the other dependent enums such as PSChannel, PSEventId, PSTask, PSOpcode, etc. You will see parsing logic in resxgen.psm1 to prepare for that but it is not enabled at this point.
4: Documentation is pending that documents the format of the syslog output as well as associated configuration.
5: As mentioned at the start, tests are pending