I need to write a script that parses our logs and reports the amount of daily activity per user on our website. Unfortunately I'm still learning the very basics, so please bear with me. Below is an example snippet from a log to give you an idea of what each entry looks like (the important parts I want extracted are bolded; they are just the date and the username):
When somebody is on our site and performs some activity (clicking through different pages, etc.), an entry like the one above is written to the log for each bit of activity. A single log can cover several different days, depending on the activity (logs rotate based on size).
So far I've got this:
This gives me a list of two columns, with the number of instances (and hence user activity) paired with the username. Now I need to associate these with the date, so that for any given day it outputs the username, the activity count, and the date, and writes that to a .csv file. I'm open to any method really. I "think" it shouldn't be too difficult to modify what I have already, but then again I'm new to this and not really sure how to do it right.
Last edited by Don Cragun; 05-27-2015 at 01:30 AM..
Reason: Get rid of FONT, COLOR, and SIZE tags; add CODE tags.
Assuming that you want the output sorted by increasing alphanumeric username as the primary sort key and increasing date as the secondary sort key, the following seems to work (assuming your one-line sample is representative of the actual format of your data) without the need for two awk scripts and without the need for uniq:
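A minimal sketch of such a single-awk approach follows. It is not the code posted in this thread. It assumes a made-up log layout (date in the first field, username in a `user=` token), and `count_activity` is a hypothetical helper name; both would need adjusting to the real log format.

```shell
# Hypothetical log layout (an assumption, not the poster's real format):
#   2015-05-26 10:44:03,123 INFO user=alice GET /home
# Field 1 is the date; the username follows a "user=" tag.
count_activity() {
    awk '
    {
        user = ""
        # Find the token carrying the username.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^user=/) { user = substr($i, 6); break }
        if (user != "")
            count[user "," $1]++      # key: "username,date"
    }
    END {
        for (key in count)
            print key "," count[key]  # username,date,hits
    }' "$@" | LC_ALL=C sort -t, -k1,1 -k2,2
}
```

Something like `count_activity access.log > activity.csv` (file names made up) would then write one `username,date,count` row per user per day, sorted by username first and date second. Sorting by date this way only works if the date is written year-first, as in the assumed layout.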
Thanks, yes I would like the output sorted by increasing alphanumeric username as the primary sort key, and increasing date & time as the secondary (I didn't find out until today that they want the time in addition to the date, but it appears right after the date in the logs).
Unfortunately the script you wrote didn't quite work for my actual logs; however, that might be because of the snippet I included. I should've included a better sample from the logs, showing that the date and time line up in the same columns in each entry, so that might make it easier?
I don't know if maybe there is an easier way to do it, or whether awk is the way to go, so if anybody has a better suggestion I'm certainly open to it.
---------- Post updated at 11:37 AM ---------- Previous update was at 10:44 AM ----------
Also, each entry actually is four lines long when you cat the log, rather than just a single line as it appears when I paste it into here.
Try an adaptation of Don Cragun's fine proposal:
As there is no sample of your four-line log entries, I can't help with that part; you'll need to experiment with three getlines (including error handling) to compose a $0 that you can work on with the above.
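As a rough sketch of the three-getline idea, assuming every entry really does span exactly four physical lines (`join_four_lines` is a made-up helper name):

```shell
# Glue each group of four physical lines into one record.
join_four_lines() {
    awk '
    {
        record = $0
        # Pull in the next three physical lines; stop on EOF or read error.
        for (n = 1; n <= 3; n++) {
            if ((getline line) <= 0) {
                print "incomplete entry at input line " NR > "/dev/stderr"
                exit 1
            }
            record = record " " line
        }
        $0 = record    # re-split the joined text into fields
        print
    }' "$@"
}
```

After `$0 = record`, the usual `$1`, `$2`, ... and `NF` refer to the joined entry, so the counting logic can go where the `print` is.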
Yes. When counting fields in awk, there is a huge difference between:
and:
When you want help from a computer scientist, details about input file format are crucial!
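One common way field counts drift apart is whitespace that gets collapsed when a line is pasted. A small illustration with a made-up line (not taken from the real logs):

```shell
line='May 26  2015 alice'    # two spaces between "26" and "2015"

# Default splitting: a run of blanks counts as one separator.
printf '%s\n' "$line" | awk '{ print NF }'            # prints 4

# Bracketed single space: every space separates, so an empty field appears.
printf '%s\n' "$line" | awk -F'[ ]' '{ print NF }'    # prints 5
```

With the default separator the username is `$4`; with `-F'[ ]'` it is `$5`, which is why an exact sample matters.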
I understood the reason for counting the number of log entries per day for a given user. But, I must be missing the point of counting the number of log entries related to a user on the same date and time. Do you really have multiple log entries for a given user being created in the same millisecond?
Are you really saying that each of the lines shown in your latest sample contains four newline characters? Or (making lots of wild assumptions) are you saying that four lines on your screen are used for each line when you cat the log because each line is somewhere between 360 and 480 characters long (assuming a 120 character line length on your screen) and your terminal is wrapping the output onto 4 screen lines? (Your two sample lines have 430 and 447 characters, respectively, including the terminating newline characters.) Show us the output from the command:
or, if that command fails saying that the -n option isn't recognized:
so we have a better chance of understanding your log file format.
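One way to check for yourself whether a long entry is a single line or several (a sketch, not necessarily the command requested above; `line_lengths` is a made-up helper name):

```shell
# Print the length of every physical line, to tell real newlines
# apart from terminal wrapping.
line_lengths() {
    awk '{ print NR ": " length($0) " characters" }' "$@"
}
```

If a file holding one complete entry produces a single output line with a length of several hundred characters, the entry is one line and the terminal is just wrapping it. The lengths reported here do not include the terminating newline.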
Quote:
When you want help from a computer scientist, details about input file format are crucial!
Sorry about that. After your initial response I realized that what I had omitted as useless data was in fact important for this task.
Quote:
I understood the reason for counting the number of log entries per day for a given user. But, I must be missing the point of counting the number of log entries related to a user on the same date and time.
Sorry for the confusion on this too. The request was sent to me in an email from a person who doesn't understand how we log information and wasn't particularly clear; he's trying to gather info on our customers' usage patterns on our site. Now that I stop and think about it, activity per customer per day should be sufficient. Later on they may want timestamps to see if there are times of day when customers are more active, but I'm not worried about that for now.
Quote:
Or (making lots of wild assumptions) are you saying that four lines on your screen are used for each line when you cat the log because each line is somewhere between 360 and 480 characters long (assuming a 120 character line length on your screen) and your terminal is wrapping the output onto 4 screen lines?
You are correct, if I do a
on a sample file containing a complete entry it is in fact a single line.
So, with the new details about your input file format (and using some of RudiC's suggestions to optimize string handling), does:
do what you're trying to do?