Opt out of global data surveillance programs like PRISM, XKeyscore and Tempora. https://prism-break.org
Loads of resources for many platforms. Worth a bookmark!
MEMEX: a good idea, but it mostly depends on who will use it, and how...

PART II: FULL TEXT OF ANNOUNCEMENT
I. FUNDING OPPORTUNITY DESCRIPTION
The Defense Advanced Research Projects Agency (DARPA) is soliciting proposals for innovative research to maintain technological superiority in the area of content indexing and web search on the Internet. Proposed research should investigate approaches that enable revolutionary advances in science, devices, or systems. Specifically excluded is research that primarily results in evolutionary improvements to the existing state of practice.(...)
Overview
Today's web search is limited by a one-size-fits-all approach offered by web-scale commercial providers. They provide a centralized search, which has limitations in the scope of what gets indexed and the richness of available details. For example, common practice misses information in the deep web and ignores shared content across pages. Today's largely manual search process does not save sessions or allow sharing, requires nearly exact input with one at a time entry, and doesn't organize or aggregate results beyond a list of links.
The Memex program envisions a new paradigm, where one can quickly and thoroughly organize a subset of the Internet relevant to one’s interests.(...)
Technical Area 1: Domain-Specific Indexing

Erm... Maltego?
Crawling should also be robust to automated counter-crawling measures, crawler bans based on robot behavior, human detection, paywalls and member-only areas, forms, dynamic and non-HTML content, etc.
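For contrast, this is what a well-behaved crawler does today: it parses robots.txt and honors it before fetching anything. A minimal sketch using Python's standard urllib.robotparser (the rules and URLs below are hypothetical, for illustration only):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, parsed from a string instead of
# being fetched over HTTP.
rules = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A polite crawler checks can_fetch() before requesting a URL.
print(rp.can_fetch("MemexBot", "http://example.com/index.html"))  # allowed
print(rp.can_fetch("MemexBot", "http://example.com/private/x"))   # disallowed
```

The announcement is asking for crawlers that deliberately get past exactly this kind of opt-out, which is worth keeping in mind when reading the rest of the BAA.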
Information extraction may include normalization of heterogeneous data, natural language processing for translation and entity extraction and disambiguation, image analysis for object recognition, coreference resolution, extraction of multimedia (e.g., pdf, flash, video, image), relevance determination, etc.
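"Normalization of heterogeneous data" is the unglamorous part of any such pipeline. A minimal illustrative sketch (not from the announcement; the format list and function name are my own) folding scraped date strings into one canonical ISO-8601 form:

```python
from datetime import datetime

# Hypothetical set of input formats seen across scraped pages.
FORMATS = ("%d/%m/%Y", "%B %d, %Y", "%Y-%m-%d")

def normalize_date(raw: str) -> str:
    """Try each known format in turn and return an ISO-8601 date string."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

print(normalize_date("11/12/2013"))         # -> 2013-12-11
print(normalize_date("December 11, 2013"))  # -> 2013-12-11
```

Scale that idea up to entities, images, and multimedia and you get an idea of the extraction work TA1 is describing.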
Technical Area 2: Domain-Specific Search
Technical Area 2 includes the creation of a configurable domain-specific interface into web content. The domain-specific interface may include: conceptually aggregated results, e.g., a person; conceptually connected content, e.g., links for shared attributes; task relevant facets, e.g., key locations, entity movement; implicit collaboration for enriched content; explicit collaboration with shared tags; recommendations based on user model and augmented index, etc.
Also, TA2 performers will work with TA1 performers on the design of a query language for directing crawlers and information extraction algorithms. A language to specify the domain, including both crawling as well as interface capability, may include concepts, sets of keywords, time delimitations, area delimitations, IP ranges, computational budgets, semi-automated feedback, iterative methods, data models, etc.

Technical Area 3: Applications

"Human Trafficking". Cool, no one can argue that it's not a legit one. I personally think the use case is relevant. Another one is money laundering, because it is very often bound to human trafficking and slavery. I suppose there are many other use cases that would benefit from such a project.
Human Trafficking, especially for the commercial sex trade, is a line of business with significant web presence to attract customers and is relevant to many types of military, law enforcement, and intelligence investigations. The use of forums, chats, advertisements, job postings, hidden services, etc., continues to enable a growing industry of modern slavery. An index curated for the counter-trafficking domain, including labor and sex trafficking, along with configurable interfaces for search and analysis, will enable a new opportunity for military, law enforcement, legal, and intelligence actions to be taken against trafficking enterprises.
Other application domains will be considered during the life of the program, possibly including indexing and interfaces for found data, missing persons, counterfeit goods, etc.
Since technology development will be guided by end-users with operational support expertise, DARPA will engage elements of the DoD and other agencies to develop use cases and operational concepts around Human Trafficking and other domains.
Others? Circumstances? A bit vague...
2. Foreign Participation
Non-U.S. organizations and/or individuals may participate to the extent that such participants comply with any necessary nondisclosure agreements, security regulations, export control laws, and other governing statutes applicable under the circumstances.
D. Other Eligibility Requirements
1. Ability to Support Classified Development, Integration and Transition
While the program itself is unclassified, interactions with end-users and potential transition partners will require Technical Area 3 performers to have access to classified information. Therefore, at the time of award, all prime proposers to Technical Area 3 must have (at a minimum) Secret facility clearance and have personnel under their Commercial and Government Entity (CAGE) code with a Secret clearance that are eligible for TS/SCI. Technical Area 3 proposers must provide their CAGE code and security point(s) of contact in their proposals.
(...)
Proposers for Technical Areas 1 and 2 are not required to have security clearances.
If I am running a web server and configured it not to be indexed (robots.txt), it is because I deliberately chose to do so! If I am a member of a private or invitation-only forum, it is maybe because I am trying to dissociate my private and professional life. There are many other genuine examples I could mention.
- python-dateutil
- capinfos
- Update the CAP_FOLDER global variable to reflect your own setup
- CAP_FOLDER is the place where tcpdump capture files are stored.
CAP_FOLDER example (no trailing slash, please):
CAP_FOLDER = '/var/log/pcap'
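Judging from the example output further down, capture files appear to live under dated subfolders of CAP_FOLDER. A minimal sketch of how such a lookup path could be built (the YYYY/MM/DD layout and the day_folder helper are assumptions for illustration, not the script's actual code):

```python
import os
from datetime import datetime

CAP_FOLDER = '/var/log/pcap'  # no trailing slash

def day_folder(ts: datetime) -> str:
    """Return the folder holding captures for a given day,
    assuming a CAP_FOLDER/YYYY/MM/DD layout."""
    return os.path.join(CAP_FOLDER, ts.strftime('%Y/%m/%d'))

print(day_folder(datetime(2013, 12, 11, 14, 40)))
# /var/log/pcap/2013/12/11
```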
The -v argument switch (-v output in blue)
raskal$ ./digpcap.py -v -f "Dec. 11, 2013 14:40:00"

As you can see, the -v switch produces extra output that is not suitable for further piping and processing. Use it to check that the script is not messing around.
Last example: searching for capture files stored between November 4 at 11:30am and November 5 at 3:46:50am (no -v switch this time).
raskal$ ./digpcap.py -f "Nov. 4, 2013 11:30" -t "Nov. 5, 2013 3:46:50"

Note: at midnight, a cron job restarts the tcpdump process, so the dump##### numbering starts afresh.
Without -v you can further process each file one by one; your imagination is the limit. (Note: no need to enter the time in HH:MM:SS format, HH is enough.)
raskal$ for f in `./digpcap.py -f "Aug. 23, 2012 14" -t "Aug. 24, 2012 14"`; do echo "Look Ma, got " $f; done
Look Ma, got 2012/08/23/ipv4.pcap
Look Ma, got 2012/08/23/ipv6.pcap
Look Ma, got 2012/08/24/fragment.pcap
Look Ma, got 2012/08/24/link.pcap
The script... digpcap.py (version 0.3)